Model Serving Engineer

Location: Fully remote, long-term position within the United States
Employment Type: Full-Time
Level: Senior

Salary not disclosed

Job Details

Experience: 6+ years
Required Skills: Python, FPGA Architecture, Kubernetes, C++, Go, Rust, Distributed Systems

Requirements

Bachelor’s or Master’s degree in Computer Science or a related technical field.
6+ years of experience in distributed systems, infrastructure engineering, or ML platform engineering.
Strong proficiency in Python and a systems programming language such as Go, Rust, or C++.
Hands-on experience with large-scale model inference frameworks (e.g., vLLM, TensorRT-LLM, or similar).
Strong understanding of GPU architecture, memory management, and performance optimization techniques.
Experience with Kubernetes, cloud infrastructure, and autoscaling systems.
Expertise in observability tools including metrics, logging, and distributed tracing.
Strong background in performance engineering, low-latency systems, and capacity planning.
Excellent communication, incident response, and cross-functional collaboration skills.

Responsibilities

Design, build, and operate scalable model serving infrastructure for LLMs, vision models, and recommendation systems.
Optimize inference performance using techniques such as continuous batching, caching, request multiplexing, and GPU memory optimization.
Implement routing, rate limiting, and multi-tenant service policies to ensure reliability and fair resource usage across endpoints.
Develop autoscaling, capacity planning, and load balancing systems to maintain performance under varying workloads.
Build end-to-end observability systems, including metrics, logging, tracing, and performance monitoring for AI services.
Collaborate with ML and product teams to support model deployment, rollout strategies, and production integration.
Implement security, abuse detection, and API governance controls across model serving infrastructure.
Support incident response, debugging, and continuous reliability improvements for production AI systems.
