MLOps Engineer

Intetics
Technology

Ukraine, Armenia, Georgia, Moldova, Turkey
Full-Time
Middle

Salary not disclosed

Job Details

Required Skills: AWS, Docker, Kubeflow, Kubernetes, MLflow, Grafana, Prometheus, Terraform, Databricks

Requirements

Strong hands-on experience with AWS architecture, including security best practices, IAM, networking, and cost optimization
Proficiency with Databricks: MLflow, Workflows, Feature Store, cluster management, Unity Catalog
Experience with cloud-managed ML platforms such as AWS SageMaker or Google Vertex AI
Expert knowledge of Terraform / Terragrunt for multi-cloud infrastructure provisioning and automation
Deep expertise in Kubernetes, including autoscaling, GPU workloads, networking policies, and cluster optimization
Practical experience with observability stacks such as Prometheus, Grafana, Loki, and ELK
Strong understanding of GitOps workflows and CI/CD tools (e.g., ArgoCD, FluxCD)
Solid knowledge of Docker security, container hardening, and secure container orchestration
Advanced experience in MLOps practices for continuous training (CT), CI/CD for ML models, and automated deployment
Familiarity with ML pipeline orchestration tools such as Kubeflow or Argo Workflows
Experience with LLMOps, including tools such as Langfuse, Ollama, and vLLM, and with supporting large-scale inference
Ability to contribute to architecture design, set platform standards, and mentor MLOps or ML engineers

Responsibilities

Design and implement scalable, secure, and cost-efficient MLOps solutions leveraging AWS and Databricks
Automate ML deployment pipelines, reducing manual intervention and operational overhead
Collaborate closely with data scientists to ensure solutions align with established MLOps architecture, best practices, and platform standards
Integrate security controls and compliance requirements throughout the entire machine learning lifecycle
Own and manage incidents end-to-end, from root cause analysis to prevention of future occurrences
Contribute to software system architecture and the design of platform-level components
Build and optimize ML training, retraining, and inference pipelines, ensuring reliability and scalability
Enhance observability with metrics, logging, tracing, and dashboards to ensure system visibility and performance
Drive best practices in infrastructure automation, CI/CD, and cloud resource management across ML teams
