Model Serving Engineer
Fully remote, long-term position within the United States. Full-time, senior level.
Salary not disclosed
Job Details
- Experience: 6+ years
- Required Skills: Python, GPU Architecture, Kubernetes, C++, Go, Rust, Distributed Systems
Requirements
- Bachelor’s or Master’s degree in Computer Science or a related technical field.
- 6+ years of experience in distributed systems, infrastructure engineering, or ML platform engineering.
- Strong proficiency in Python and a systems programming language such as Go, Rust, or C++.
- Hands-on experience with large-scale model inference frameworks (e.g., vLLM, TensorRT-LLM, or similar).
- Strong understanding of GPU architecture, memory management, and performance optimization techniques.
- Experience with Kubernetes, cloud infrastructure, and autoscaling systems.
- Expertise in observability tools including metrics, logging, and distributed tracing.
- Strong background in performance engineering, low-latency systems, and capacity planning.
- Excellent communication, incident response, and cross-functional collaboration skills.
Responsibilities
- Design, build, and operate scalable model serving infrastructure for LLMs, vision models, and recommendation systems.
- Optimize inference performance using techniques such as continuous batching, caching, request multiplexing, and GPU memory optimization.
- Implement routing, rate limiting, and multi-tenant service policies to ensure reliability and fair resource usage across endpoints.
- Develop autoscaling, capacity planning, and load balancing systems to maintain performance under varying workloads.
- Build end-to-end observability systems, including metrics, logging, tracing, and performance monitoring for AI services.
- Collaborate with ML and product teams to support model deployment, rollout strategies, and production integration.
- Implement security, abuse detection, and API governance controls across model serving infrastructure.
- Support incident response, debugging, and continuous reliability improvements for production AI systems.