Staff Machine Learning Engineer, AI Serving
Remote - United States · Full-Time · Staff
Salary: 253,300 - 354,600 USD per year
Job Details
- Experience
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Required Skills
- AWS, Python, Kubernetes, Machine Learning, PyTorch, Go, Terraform
Requirements
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems such as Kubernetes at scale.
- Deep experience with cloud-based technologies for supporting an ML platform, including AWS, Google Cloud Storage, and infrastructure-as-code tools such as Terraform.
- Proficiency with common ML programming languages and frameworks, such as Python and Go.
- Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders.
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the genAI product development lifecycle.
- Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus.
- Strong proficiency in Python and deep experience with modern AI/ML frameworks (Triton, Dynamo, vLLM, PyTorch).
Responsibilities
- Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs supporting millions of QPS.
- Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale.
- Rapidly prototype and build a high-performance feature hydration and processing system as part of the inference stack, including routing, caching, and batching.
- Lead the development of a unified GPU model export framework that converts trained models into optimized GPU inference models.
- Build real-time ML observability to track feature and model performance.
- Operate LLM serving online at scale.
- Build an end-to-end inference performance benchmarking framework.
- Apply a deep understanding of multi-cluster compute environments and the network topology specific to ML inference use cases.