Staff Machine Learning Engineer, AI Serving

US · Full-Time · Staff
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
AWS, Python, GCP, Kubernetes, Machine Learning, PyTorch, Go, Terraform, Distributed Systems

Requirements

  • 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems.
  • Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
  • Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
  • Strong programming skills in Python and/or Go, with experience in building scalable backend or ML systems.
  • Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
  • Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
  • Strong understanding of observability, monitoring, and performance tuning for real-time systems.
  • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
  • Strong ownership mindset with a focus on scalability, reliability, and developer experience.

Responsibilities

  • Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
  • Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
  • Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
  • Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
  • Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
  • Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
  • Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.