Staff Machine Learning Engineer, AI Serving
Remote - United States · Full-Time · Staff
Salary: 253,300 - 354,600 USD per year
Job Details
- Experience
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Required Skills
- AWS, Python, Kubernetes, Machine Learning, PyTorch, Go, Terraform
Requirements
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems such as Kubernetes at scale.
- Deep experience with cloud-based technologies for supporting an ML platform, including AWS, Google Cloud Storage, and infrastructure-as-code tools such as Terraform.
- Proficiency with common ML programming languages and frameworks, such as Python and Go.
- Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the genAI product development lifecycle.
- Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
- Strong proficiency in Python and deep experience with modern AI/ML frameworks (Triton, Dynamo, vLLM, PyTorch).
Responsibilities
- Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs supporting millions of QPS.
- Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale.
- Rapidly prototype and build a high-performance feature hydration and processing system as part of the inference stack, including routing, caching, and batching.
- Lead the development of a unified GPU model export framework that converts trained models into optimized GPU inference models.
- Build real-time ML observability to track feature and model performance.
- Operate LLM serving online at scale.
- Build an end-to-end inference performance benchmarking framework.
- Apply a deep understanding of multi-cluster compute environments and the network topology specific to ML inference use cases.