Model Serving Engineer

Fully remote, long-term position within the United States.
Full-Time
Senior
Salary not disclosed

Job Details

Experience
6+ years
Required Skills
Python, GPU Architecture, Kubernetes, C++, Go, Rust, Distributed Systems

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related technical field.
  • 6+ years of experience in distributed systems, infrastructure engineering, or ML platform engineering.
  • Strong proficiency in Python and a systems programming language such as Go, Rust, or C++.
  • Hands-on experience with large-scale model inference frameworks (e.g., vLLM, TensorRT-LLM, or similar).
  • Strong understanding of GPU architecture, memory management, and performance optimization techniques.
  • Experience with Kubernetes, cloud infrastructure, and autoscaling systems.
  • Expertise in observability tooling, including metrics, logging, and distributed tracing.
  • Strong background in performance engineering, low-latency systems, and capacity planning.
  • Excellent communication, incident response, and cross-functional collaboration skills.

Responsibilities

  • Design, build, and operate scalable model serving infrastructure for LLMs, vision models, and recommendation systems.
  • Optimize inference performance using techniques such as continuous batching, caching, request multiplexing, and GPU memory optimization.
  • Implement routing, rate limiting, and multi-tenant service policies to ensure reliability and fair resource usage across endpoints.
  • Develop autoscaling, capacity planning, and load balancing systems to maintain performance under varying workloads.
  • Build end-to-end observability systems, including metrics, logging, tracing, and performance monitoring for AI services.
  • Collaborate with ML and product teams to support model deployment, rollout strategies, and production integration.
  • Implement security, abuse detection, and API governance controls across model serving infrastructure.
  • Support incident response, debugging, and continuous reliability improvements for production AI systems.