Senior Machine Learning Engineer
Remote-first flexibility within the United States
Full-Time
Senior
Salary not disclosed
Job Details
- Experience: 5–8+ years
- Required Skills: Python, Cloud Computing, Machine Learning, LLM, Distributed Systems
Requirements
- 5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
- Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
- Strong Python skills and solid understanding of distributed systems and backend engineering principles
- Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
- Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
- Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
- Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
- Experience working in fast-paced, high-growth environments with evolving technical requirements
- Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams
Responsibilities
- Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
- Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
- Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as time to first token (TTFT) and inter-token latency (ITL)
- Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
- Develop observability, monitoring, alerting, and debugging tools for production inference systems
- Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
- Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
- Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems