ML Ops Engineer
Pragmatike
Cloud Computing / AI
Fully remote (EMEA timezone), Full-Time, Mid-level
Salary not disclosed
Job Details
- Languages: Fluent English required
- Experience: 4+ years
- Required Skills: Python, Terraform, Helm, MLOps, Distributed Systems
Requirements
- 4+ years of experience in ML Ops, Platform Engineering, SRE, or similar infrastructure roles focused on ML systems
- Hands-on experience with model serving frameworks such as vLLM, TGI, Triton, or equivalent
- Strong background in container orchestration and operating GPU-based workloads in production
- Experience with MLOps tooling including model registries, experiment tracking, and automated deployment pipelines
- Proficiency in Python
- Proficiency in infrastructure-as-code tools such as Terraform, Helm, or similar
- Strong understanding of distributed systems, performance tuning, and production reliability engineering
- Ability to effectively use AI coding assistants to accelerate development and debugging workflows
- Ownership mindset with the ability to operate independently in a remote-first environment
Responsibilities
- Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
- Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
- Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
- Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
- Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
- Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
- Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
- Define engineering best practices and contribute to platform scalability in a fast-moving startup environment