Staff Machine Learning Engineer, AI Serving

Remote - United States · Full-Time · Staff
Salary: 253,300 - 354,600 USD per year

Job Details

Experience
7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
Required Skills
AWS, Python, Kubernetes, Machine Learning, PyTorch, Go, Terraform

Requirements

  • Experience operating orchestration systems such as Kubernetes at scale.
  • Deep experience with cloud-based technologies for supporting an ML platform, including AWS, Google Cloud Storage, and infrastructure-as-code tools such as Terraform.
  • Proficiency in the common programming languages and frameworks of ML, such as Python and Go.
  • Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
  • Strong focus on scalability, reliability, performance, and ease of use. You are a tireless advocate for platform users and have a deep intuition for the genAI product development lifecycle.
  • Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
  • Strong proficiency in Python and deep experience with modern AI/ML serving frameworks (Triton, Dynamo, vLLM, PyTorch).

Responsibilities

  • Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs supporting millions of queries per second (QPS).
  • Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale.
  • Rapidly prototype and build a high-performance feature-hydration and processing system as part of the inference stack, including routing, caching, and batching.
  • Lead a unified GPU model export framework to support converting trained models into optimized GPU inference models.
  • Apply real-time ML observability to track feature and model performance.
  • Serve LLMs online at scale.
  • Build an end-to-end (E2E) inference performance benchmarking framework.
  • Design for multi-cluster compute environments and the network topologies specific to ML inference use cases.
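To give candidates a concrete sense of the batching work described above, here is a minimal, illustrative micro-batching sketch in pure Python. It is not this team's implementation: `predict_batch` is a hypothetical stand-in for a real GPU model call, and production stacks (Triton, vLLM) use far more sophisticated schedulers.

```python
import threading
import queue
import time


def predict_batch(inputs):
    # Hypothetical stand-in for a GPU model call: doubles each input.
    return [x * 2 for x in inputs]


class MicroBatcher:
    """Collects concurrent requests into batches before invoking the model.

    Batching amortizes per-call overhead on the accelerator: the worker
    waits briefly for more requests instead of running each one alone.
    """

    def __init__(self, max_batch=8, max_wait_s=0.005):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()
        self.worker = threading.Thread(target=self._loop, daemon=True)
        self.worker.start()

    def submit(self, x):
        # Each request carries an Event so the caller can block on its result.
        slot = {"input": x, "done": threading.Event(), "output": None}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self.requests.get()]  # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            # Gather more requests until the batch is full or the deadline passes.
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=timeout))
                except queue.Empty:
                    break
            outputs = predict_batch([slot["input"] for slot in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

Concurrent callers simply call `submit`; the worker thread decides how requests are grouped, which is the same separation of concerns a production router/batcher enforces.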