Senior ML Operations (MLOps) Engineer
Based in United States, Full-Time, Senior
Salary: Competitive compensation with meaningful equity participation
Job Details
- Experience: 5+ years
- Required Skills: AWS, Python, PyTorch, TensorFlow, CI/CD, MLOps, Distributed Systems
Requirements
- 5+ years of software engineering experience with a focus on ML infrastructure, distributed systems, or large-scale data processing
- Strong proficiency in Python and ML frameworks such as PyTorch, TensorFlow, or equivalent
- Hands-on experience with MLOps workflows, including model training pipelines, orchestration, and CI/CD deployment systems
- Proven track record of deploying ML models into production at scale with monitoring and feedback systems
- Strong experience with cloud platforms (AWS preferred), including services for compute, storage, and observability
- Familiarity with distributed systems, streaming data, and large-scale data processing architectures
- Strong understanding of system performance optimization, including latency, cost, and scalability trade-offs
- Experience working in cross-functional teams in fast-paced, product-driven environments
- Strong communication skills and the ability to collaborate effectively in remote settings
Responsibilities
- Design, build, and maintain scalable ML infrastructure, including data pipelines, training workflows, and model deployment systems
- Own end-to-end ML lifecycle operations, ensuring reliable delivery of models into production environments at scale
- Develop and optimize CI/CD pipelines for machine learning workflows, enabling rapid and safe iteration
- Implement monitoring, telemetry, and feedback loops for ML models running across large-scale device fleets
- Collaborate with R&D, firmware, backend, and data teams to ensure seamless integration of ML inference systems
- Build tooling, microservices, and frameworks to improve experimentation, data processing, and deployment efficiency
- Optimize compute, storage, and infrastructure costs while maintaining high performance and reliability
- Ensure strong observability and system health across all ML production services