MLOps Engineer

Intetics
Technology
Ukraine, Armenia, Georgia, Moldova, Turkey
Full-Time, Middle
Salary not disclosed

Job Details

Required Skills
AWS, Docker, Kubeflow, Kubernetes, MLflow, Grafana, Prometheus, Terraform, Databricks

Requirements

  • Strong hands-on experience with AWS architecture, including security best practices, IAM, networking, and cost optimization
  • Proficiency with Databricks: MLflow, Workflows, Feature Store, cluster management, Unity Catalog
  • Experience with cloud-managed ML platforms such as AWS SageMaker or Google Vertex AI
  • Expert knowledge of Terraform / Terragrunt for multi-cloud infrastructure provisioning and automation
  • Deep expertise in Kubernetes, including autoscaling, GPU workloads, networking policies, and cluster optimization
  • Practical experience with observability stacks such as Prometheus, Grafana, Loki, ELK
  • Strong understanding of GitOps workflows and CI/CD tools (e.g., ArgoCD, FluxCD)
  • Solid knowledge of Docker security, container hardening, and secure container orchestration
  • Advanced experience in MLOps practices for continuous training (CT), CI/CD for ML models, and automated deployment
  • Familiarity with ML pipeline orchestration tools such as Kubeflow or Argo Workflows
  • Experience with LLMOps, including tools such as Langfuse, Ollama, and vLLM, and with supporting large-scale inference
  • Ability to contribute to architecture design, set platform standards, and mentor MLOps or ML engineers

Responsibilities

  • Design and implement scalable, secure, and cost-efficient MLOps solutions leveraging AWS and Databricks
  • Automate ML deployment pipelines, reducing manual intervention and operational overhead
  • Collaborate closely with data scientists to ensure solutions align with established MLOps architecture, best practices, and platform standards
  • Integrate security controls and compliance requirements throughout the entire machine learning lifecycle
  • Own and manage incidents end-to-end, from detection and root cause analysis to prevention of future occurrences
  • Contribute to software system architecture and the design of platform-level components
  • Build and optimize ML training, retraining, and inference pipelines, ensuring reliability and scalability
  • Enhance observability with metrics, logging, tracing, and dashboards to ensure system visibility and performance
  • Drive best practices in infrastructure automation, CI/CD, and cloud resource management across ML teams