Senior Machine Learning Engineer

Remote-first flexibility within the United States | Full-Time | Senior

Salary not disclosed

Job Details

Experience: 5–8+ years
Required Skills: Python, Cloud Computing, Machine Learning, LLM, Distributed Systems

5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
Strong Python skills and solid understanding of distributed systems and backend engineering principles
Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
Experience working in fast-paced, high-growth environments with evolving technical requirements
Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams

Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as time to first token (TTFT) and inter-token latency (ITL)
Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
Develop observability, monitoring, alerting, and debugging tools for production inference systems
Optimize inference performance through batching strategies, GPU utilization tuning, quantization, and hardware-aware system design
Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems