Senior Machine Learning Engineer

Remote-first flexibility within the United States | Full-Time | Senior
Salary not disclosed

Job Details

Experience
5–8+ years
Required Skills
Python, Cloud Computing, Machine Learning, LLM, Distributed Systems

Requirements

  • 5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
  • Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
  • Strong Python skills and solid understanding of distributed systems and backend engineering principles
  • Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
  • Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
  • Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
  • Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
  • Experience working in fast-paced, high-growth environments with evolving technical requirements
  • Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams

Responsibilities

  • Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
  • Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
  • Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as time to first token (TTFT) and inter-token latency (ITL)
  • Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
  • Develop observability, monitoring, alerting, and debugging tools for production inference systems
  • Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
  • Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
  • Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems