
MLOps Engineer

Posted about 2 months ago · Inactive


💎 Seniority level: Middle, 3+ years

🔍 Industry: Healthcare

🏢 Company: Paradigm Health · 👥 51-100 · 💰 Private, almost 4 years ago · Hospital, Health Care

🗣️ Languages: English

⏳ Experience: 3+ years

Requirements:
  • 3+ years of experience working as an engineer, data scientist, or in a similar role, contributing to production-level infrastructure and tools.
  • Hands-on experience in designing, building, and maintaining data pipelines, understanding how to achieve scalability, reliability, and maintainability.
  • Excellent communication skills, with the ability to distill complex technical concepts to their essential elements for both technical and non-technical audiences, and to convey them with enthusiasm.
  • Comfortable with ambiguity and capable of thriving in a mission-driven, fast-paced startup environment where innovation and adaptability are key.
Responsibilities:
  • Support the development of new product features and ML solutions, collecting technical requirements, evaluating emerging technologies, and, when necessary, implementing minimal internal frameworks and libraries that codify our task domain in composable units.
  • Develop and maintain CI/CD pipelines, ensuring reliable integration and deployment of machine learning models, shared libraries, and data processing tasks.
  • Implement and manage cloud-based solutions on AWS that support scalable data storage, compute resources, and orchestration of data science tasks.
  • Ensure compliance with data security, governance, and regulatory standards in collaboration with Data Engineering, Information Security and Developer Operations teams.
  • Monitor and troubleshoot platform-related issues, optimizing for data quality, throughput, service availability, resource allocation, and cost.
  • Support and train data scientists and machine learning engineers on platform tools and best practices, fostering collaboration and continuous learning.

Related Jobs


🏢 Company: Weekday AI · 👥 1-10 · 💰 almost 4 years ago · E-Commerce, Fashion

Posted 1 day ago
🔥 MLOps Engineer
Posted 8 days ago

🧭 Full-Time

💸 50,000–90,000 PHP per month

🔍 Healthcare

🏢 Company: Theoria Medical · 👥 1001-5000 · Electronic Health Record (EHR), Hospital, Health Care, Home Health Care

Requirements:
  • 2+ years of experience in DevOps, Site Reliability Engineering, or a related role.
  • Proficiency in cloud platforms (e.g., AWS, Azure, GCP), containerization (e.g., Docker, Kubernetes), and infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Strong scripting and programming skills in languages such as Python, Bash, or Go.
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Proficiency in CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI).
  • Strong experience with version control systems (e.g., Git).
  • Excellent problem-solving and troubleshooting abilities, with a proactive approach to identifying and resolving issues.
  • Strong verbal and written communication skills, with the ability to work collaboratively with cross-functional teams.
  • Ability to thrive in a fast-paced, dynamic environment and manage multiple priorities effectively.
Responsibilities:
  • Design, implement, and manage scalable and reliable infrastructure solutions to support Theoria’s applications and services.
  • Develop and maintain automation scripts and tools to streamline deployment, monitoring, and infrastructure management processes.
  • Implement and maintain comprehensive monitoring and alerting systems to ensure the health and performance of production environments (see the metrics sketch after this list).
  • Respond to and resolve production incidents, performing root cause analysis and implementing preventive measures to avoid recurrence.
  • Design, implement, and optimize continuous integration and continuous deployment (CI/CD) pipelines to ensure fast and reliable software delivery.
  • Work closely with development, QA, and operations teams to ensure seamless integration and deployment of applications.
  • Ensure the security and compliance of infrastructure and applications by implementing best practices and conducting regular audits.
  • Monitor and analyze system performance and capacity, and plan for future scalability and growth.
  • Create and maintain detailed documentation of infrastructure, processes, and procedures to ensure knowledge sharing and continuity.
  • Identify areas for improvement in the deployment and operations processes, and drive initiatives to enhance efficiency and reliability.
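
To make the monitoring-and-alerting responsibility above concrete, here is a minimal, generic sketch of exposing custom service metrics for Prometheus using the Python prometheus_client library. The metric names, labels, and port are illustrative placeholders, not Theoria's actual setup.

```python
# Minimal sketch: exposing custom service metrics for Prometheus to scrape.
# Metric names, labels, and the port are illustrative placeholders.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS_TOTAL = Counter("app_requests_total", "Total requests handled", ["status"])
QUEUE_DEPTH = Gauge("app_queue_depth", "Jobs currently waiting in the queue")


def handle_request() -> None:
    """Simulate a unit of work and record its outcome."""
    status = "ok" if random.random() > 0.05 else "error"
    REQUESTS_TOTAL.labels(status=status).inc()


if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at http://localhost:8000/metrics
    while True:
        QUEUE_DEPTH.set(random.randint(0, 20))
        handle_request()
        time.sleep(1)
```

Alerting thresholds on these metrics would then live in Prometheus or Alertmanager rules rather than in application code.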

📍 India, Ukraine, United Arab Emirates, Saudi Arabia, Poland

🏢 Company: Xenon7

Requirements:
  • 3+ years in a similar role, with proven expertise in Databricks and AWS, and preferably some exposure to Azure.
  • Strong background in MLOps, DevOps, and cloud (desirable if in a similar industry)
  • Knowledge of AWS AI Services
Responsibilities:
  • Manage and optimize Databricks environments, ensuring high availability, performance, and security.
  • Implement and maintain Databricks on serverless architectures, ensuring seamless CI/CD pipelines and robust integration with AWS services.
  • Develop and enforce best practices for machine learning lifecycle management using Databricks.
  • Collaborate with data scientists and developers to automate and streamline our AI model development.
  • Leverage a broad range of AWS services and maintain familiarity with Azure to ensure cross-compatibility and optimal performance of our platforms.
  • Manage Kubernetes namespace-level operations within AWS EKS, including application deployment and environment configuration.

AWS, Python, AWS EKS, Cloud Computing, Kubernetes, Machine Learning, Azure, Serverless, CI/CD, DevOps

Posted 8 days ago

📍 Canada

Requirements:
  • Proven track record in designing and implementing cost-effective and scalable ML inference systems.
  • Hands-on experience with leading deep learning frameworks such as TensorFlow, Keras, or Spark MLlib.
  • Solid foundation in machine learning algorithms, natural language processing, and statistical modeling.
  • Strong grasp of fundamental computer science concepts including algorithms, distributed systems, data structures, and database management.
  • Ability to tackle complex challenges and devise effective solutions, using critical thinking to approach problems from various angles and propose innovative ones.
  • Experience working effectively in a remote setting, with strong written and verbal communication skills; able to collaborate with team members and stakeholders to ensure a clear understanding of technical requirements and project goals.
  • Proven experience in the Apache Hadoop ecosystem (Oozie, Pig, Hive, MapReduce).
  • Expertise in public cloud services, particularly in GCP and Vertex AI.
  • Proven expertise in applying model optimization techniques (distillation, quantization, hardware acceleration) to production environments (see the quantization sketch after this listing's bullets).
  • Proficiency and recent experience in Java are required (must-have).
  • In-depth understanding of LLM architectures, parameter scaling, and deployment trade-offs.
Responsibilities:
  • Architect and optimize our existing data infrastructure to support cutting-edge machine learning and deep learning models.
  • Collaborate closely with cross-functional teams to translate business objectives into robust engineering solutions.
  • Own the end-to-end development and operation of high-performance, cost-effective inference systems for a diverse range of models, including state-of-the-art LLMs.
  • Provide technical leadership and mentorship to foster a high-performing engineering team.
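
As an illustration of the model-optimization requirement in this listing, here is a minimal, generic sketch of post-training quantization using the TensorFlow Lite converter. The SavedModel path and output filename are placeholders; a real pipeline would also benchmark accuracy and latency before and after conversion.

```python
# Minimal sketch: post-training dynamic-range quantization with TensorFlow Lite.
# "my_saved_model/" is a placeholder path for an exported SavedModel.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables dynamic-range quantization
tflite_model = converter.convert()

# The quantized model is typically smaller and faster on CPU; accuracy should
# be re-validated on a held-out set before it replaces the original in production.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```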

Python, Apache Hadoop, GCP, Java, Keras, Kubernetes, Machine Learning, MLFlow, Algorithms, Data Structures, Spark, Tensorflow, CI/CD, Linux, DevOps

Posted 8 days ago
🔥 MLOps Engineer
Posted 10 days ago

📍 Canada (British Columbia and Ontario), UK (London), India (Gujarat, Maharashtra, and Bengaluru)

🧭 Full-Time

🔍 Software Development

🏢 Company: Loopio Inc.

Requirements:
  • 2+ years of experience working in ML operations, ML engineering, or related infrastructure roles.
  • Comfort working with AWS (or similar cloud environments), Docker, and Kubernetes.
  • Strong Python development skills, with a solid understanding of software engineering practices (testing, logging, version control, code review).
  • Experience with tools such as MLflow, SageMaker, TensorFlow Serving, or TorchServe.
Responsibilities:
  • Build and maintain robust ML pipelines for training, evaluation, and deployment (a minimal MLflow tracking sketch follows this list).
  • Package and deploy models into production environments using tools like Docker, Kubernetes, and SageMaker.
  • Help implement systems to monitor model health in production, detect drift, and log predictions.
  • Work within our CI/CD systems to support model validation, promotion, and rollback.
  • Partner with ML Engineers and Data Scientists to bring ML systems into production.
  • Partner with Infra and DevOps teams to understand their tooling deeply in order to help implement ML systems and related cloud architecture.
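
To illustrate the kind of pipeline instrumentation implied above, here is a minimal sketch of logging a training run and its artifacts with MLflow. The experiment name, model, and metric are hypothetical placeholders; registering the model and deploying it (e.g., to SageMaker) would be separate follow-up steps.

```python
# Minimal sketch: tracking a training run with MLflow.
# The experiment name, model, and metric are hypothetical placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("demo-classifier")

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    # Logging the model as an artifact; registering it in a model registry and
    # promoting or rolling back versions would be handled by a separate CI/CD step.
    mlflow.sklearn.log_model(model, artifact_path="model")
```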

AWS, Docker, Python, SQL, Cloud Computing, Git, Kubeflow, Kubernetes, Machine Learning, MLFlow, Airflow, gRPC, REST API, Tensorflow, CI/CD, RESTful APIs, Linux, Software Engineering


📍 UK, Poland

🔍 Fintech

🏢 Company: Cleo · 👥 501-1000 · E-Commerce, Retail, Fashion, Jewelry

Requirements:
  • Strong knowledge of data system design; ability to break down problems and propose effective solutions.
  • Proficiency in Python, with a strong understanding of software engineering best practices (testing, automation, code quality).
  • Experience with containerisation and orchestration (Docker and Kubernetes).
  • Infrastructure as Code (Terraform or similar).
  • Experience with at least one distributed data-processing framework (Spark, Flink, Kafka, etc.).
  • Familiarity with different storage solutions (e.g., OLTP, OLAP, NoSQL, object storage) and their trade-offs.
  • Product mindset and ability to link technical decisions to business impact.
  • Excellent cross-functional communication—able to partner with data scientists, software engineers, and product managers.
Responsibilities:
  • Collaborate closely with product teams to implement robust, scalable data pipelines and ML workflows (a minimal PySpark batch-pipeline sketch follows this list).
  • Guide teams in adopting best practices around data engineering, infrastructure management, and MLOps.
  • Surface practical insights from product teams to inform improvements in our internal Data Platform.
  • Contribute actively to enhancing our data and ML infrastructure—focusing on usability, efficiency, reliability, and cost-effectiveness.
  • Mentor and support engineers and data scientists in data engineering and MLOps best practices.
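
To make the data-pipeline responsibility above concrete, here is a minimal sketch of a batch feature pipeline using PySpark. The input path, column names, and aggregations are illustrative placeholders rather than Cleo's actual schema.

```python
# Minimal sketch: a batch feature pipeline in PySpark.
# Input path, columns ("user_id", "amount"), and output path are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-pipeline").getOrCreate()

events = spark.read.parquet("s3://example-bucket/events/")

user_features = (
    events
    .filter(F.col("amount") > 0)
    .groupBy("user_id")
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_spend"),
    )
)

# Overwriting the output keeps the job idempotent when re-run by a scheduler.
user_features.write.mode("overwrite").parquet("s3://example-bucket/features/user/")

spark.stop()
```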

Docker, Python, SQL, Kubernetes, Machine Learning, Data engineering, Spark, CI/CD, RESTful APIs, Terraform

Posted 15 days ago
🔥 MLOps Engineer
Posted 21 days ago

📍 Germany

🧭 Full-Time

🔍 Software Development

🏢 Company: Cognigy · 👥 101-250 · 💰 $100,000,000 Series C, 12 months ago · IT Infrastructure, Sales Automation, Artificial Intelligence (AI), IT Management, SaaS, Generative AI, Information Technology, Small and Medium Businesses, Chatbot, Software

Requirements:
  • Hands-on experience running production ML or LLM workloads in Kubernetes
  • Familiarity with distributed ML frameworks such as KubeRay, Ray Serve, or similar
  • Deep understanding of Kubernetes internals, especially GPU scheduling, autoscaling, and multi-tenant environments
  • Proficiency with CI/CD systems for ML models, and versioned deployment strategies
  • Strong experience with cloud platforms (AWS, GCP, or Azure), networking, and security best practices
  • Skilled in monitoring and observability for ML workloads (e.g., Prometheus, Grafana)
  • Passion for automation, performance tuning, and cost optimization for LLM workloads
Responsibilities:
  • Build & Operate LLM Infrastructure – Design and maintain scalable LLM-serving systems using Kubernetes and KubeRay (see the serving sketch after this list).
  • Automate & Optimize – Automate deployments, rollbacks, and scaling of LLMs while optimizing resource usage and performance.
  • Enhance Observability – Ensure robust monitoring, logging, and alerting for LLM operations (Prometheus, Grafana, etc.).
  • Support AI Teams – Empower ML and product engineers with self-service pipelines and scalable infrastructure.
  • Prioritize Security – Enforce secure deployments, compliance practices, and robust incident response strategies.
  • Improve Documentation – Create and maintain technical documentation to streamline knowledge sharing and onboarding.
  • Drive Innovation – Evaluate, adopt, and integrate the latest MLOps and LLM-serving technologies.
  • Reduce SRE Toil – Eliminate repetitive tasks and improve operational efficiency across the platform.
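
As a rough illustration of the LLM-serving work described above, the following is a minimal Ray Serve sketch (Ray Serve is the serving layer commonly run on KubeRay-managed clusters). The model, replica count, and GPU settings are placeholders chosen so the example runs on CPU; a production deployment would add batching, autoscaling policies, and the observability hooks mentioned above.

```python
# Minimal sketch: serving a text-generation model with Ray Serve.
# Model name, replica count, and GPU settings are illustrative placeholders.
from ray import serve
from starlette.requests import Request
from transformers import pipeline


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 0})
class Generator:
    def __init__(self) -> None:
        # A small placeholder model so the sketch runs without a GPU.
        self.pipe = pipeline("text-generation", model="distilgpt2")

    async def __call__(self, request: Request) -> dict:
        prompt = (await request.json())["prompt"]
        text = self.pipe(prompt, max_new_tokens=32)[0]["generated_text"]
        return {"completion": text}


app = Generator.bind()

if __name__ == "__main__":
    serve.run(app)  # HTTP endpoint at http://127.0.0.1:8000/
    input("Serving; press Enter to shut down.\n")
```

In a KubeRay setup, the same application would typically be declared in a RayService manifest, with Prometheus and Grafana providing the monitoring described above.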
🔥 MLOps Engineer
Posted 30 days ago

🔍 HealthTech

🏢 Company: Idoven

Requirements:
  • A strong passion for building robust and scalable ML platforms.
  • A solid understanding of optimization techniques, multithreading, and distributed system concepts.
  • A firm foundation in computer science principles, including data structures, algorithms, and algorithm complexity analysis.
  • Familiarity with machine learning frameworks such as TensorFlow or PyTorch.
  • Experience with experiment tracking and model management tools (e.g., MLflow, TensorBoard).
  • Experience with containerization technologies (Docker, Kubernetes) and version control systems (e.g., GitHub).
  • Excellent problem-solving, communication, and collaboration skills.
  • Ability to work both independently and as part of a team.
Responsibilities:
  • Develop and maintain the tools and infrastructure that support our ML model training, experimentation, and deployment workflows.
  • Develop systems for efficient access to and management of large datasets.
  • Create solutions for optimizing GPU utilization and resource allocation.
  • Integrate and maintain experiment tracking and monitoring tools (e.g., MLflow, TensorBoard).
  • Develop and implement the processes for deploying ML models to production environments.
  • Collaborate closely with MLOps Engineers to understand their needs and provide effective solutions.
  • Troubleshoot and resolve issues related to the ML platform.
  • Stay current with the latest advancements in ML platform technologies and best practices.

🧭 Full-Time

🔍 Software Development

🏢 Company: Tripadvisor · 👥 1001-5000 · 💰 $300,000,000 Post-IPO Equity, about 4 years ago · 🫂 Last layoff over 1 year ago · Internet, Hospitality, Information Services, E-Commerce, Restaurants, Vacation Rental, Hotel, Travel, Social Media

Requirements:
  • At least 5 years’ experience in commercial software development.
  • Experience with Python, Java, Docker, Kubernetes, Argo, Spark, and AWS cloud services is a plus.
  • Exposure to machine learning practices is a plus.
Responsibilities:
  • Develop across our evolving technology stack: we’re using Python, Java, Kubernetes, Apache Spark, Postgres, ArgoCD, Argo Workflow, Seldon, MLFlow and more.
  • Take responsibility for all aspects of software engineering, from design to implementation, QA and maintenance.
  • Collaborate closely with data science teams to define feature specifications and develop high quality deliverables for our customers.
Posted about 1 month ago
