Staff Machine Learning Engineer, AI Serving

US · Full-Time · Staff
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
AWS, Python, GCP, Kubernetes, Machine Learning, PyTorch, Go, Terraform, Distributed Systems

Requirements

  • 7+ years of experience in Machine Learning Engineering, AI Platform Engineering, or large-scale distributed systems.
  • Strong experience operating and scaling Kubernetes-based infrastructure in production environments.
  • Deep knowledge of ML serving systems, inference pipelines, and production-grade AI deployment.
  • Strong programming skills in Python and/or Go, with experience in building scalable backend or ML systems.
  • Hands-on experience with modern ML/AI frameworks and tooling such as PyTorch, Triton, vLLM, or similar technologies.
  • Experience with cloud platforms (AWS, GCP) and infrastructure tooling such as Terraform or equivalent.
  • Strong understanding of observability, monitoring, and performance tuning for real-time systems.
  • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders.
  • Strong ownership mindset with a focus on scalability, reliability, and developer experience.

Responsibilities

  • Lead the design, development, and maintenance of a large-scale ML inference platform supporting low-latency, high-throughput model serving for search, ranking, and generative AI workloads.
  • Architect and implement GPU-based serving systems capable of handling millions of queries per second with strong reliability and performance guarantees.
  • Build and optimize end-to-end inference pipelines, including routing, caching, batching, and feature processing systems.
  • Develop and maintain model export frameworks to convert trained models into optimized formats for efficient GPU inference.
  • Design and improve observability systems for real-time monitoring of model performance, system health, and feature behavior.
  • Lead efforts in benchmarking, performance tuning, and scalability improvements across multi-cluster cloud environments.
  • Collaborate with cross-functional ML, infrastructure, and product teams to support production deployment of large-scale ML and LLM systems.