ML Ops Engineer

Pragmatike
Cloud Computing / AI
Fully remote (EMEA timezone)
Full-Time
Middle
Salary not disclosed

Job Details

Languages
Fluent English required
Experience
4+ years
Required Skills
Python, Terraform, Helm, MLOps, Distributed Systems

Requirements

  • 4+ years of experience in ML Ops, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
  • Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
  • Strong background in container orchestration and operating GPU-based workloads in production
  • Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
  • Proficiency in Python
  • Proficiency in infrastructure-as-code tools such as Terraform, Helm, or similar
  • Strong understanding of distributed systems, performance tuning, and production reliability engineering
  • Ability to effectively use AI coding assistants to accelerate development and debugging workflows
  • Ownership mindset with the ability to operate independently in a remote-first environment

Responsibilities

  • Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
  • Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
  • Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
  • Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
  • Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
  • Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
  • Define engineering best practices and contribute to platform scalability in a fast-moving startup environment