Senior Machine Learning Systems Engineer, Ads ML Experience Platform
Based in the United States | Full-Time | Senior
Salary: Competitive base salary with additional equity (RSUs) and potential bonus eligibility
Job Details
- Experience: 5+ years in platform engineering, distributed systems, or large-scale infrastructure development; 2+ years building production ML infrastructure
- Required Skills: Python, Cloud Computing, Kubeflow, Airflow, Spark, Distributed Systems
Requirements
- 5+ years of experience in platform engineering, distributed systems, or large-scale infrastructure development.
- 2+ years of experience building production ML infrastructure, developer platforms, or AI tooling.
- Strong expertise in ML workflow orchestration and distributed data processing frameworks like Spark, Ray, or Flink.
- Hands-on experience with orchestration tools such as Airflow, Kubeflow, or Argo.
- Proven ability to build and maintain ML experimentation platforms, model registries, or training pipelines.
- Strong programming skills in Python and familiarity with scalable software engineering practices.
- Experience with cloud-based ML systems and production deployment environments.
- Excellent communication skills for translating technical complexity into clear insights.
Responsibilities
- Lead the design and development of scalable ML infrastructure for experimentation, training, and deployment workflows.
- Build and evolve large-scale offline ML experimentation platforms for reproducibility and model promotion.
- Develop distributed training orchestration systems to support hyperparameter tuning and evaluation pipelines.
- Design infrastructure for experiment tracking, metadata management, lineage, artifact versioning, and model registries.
- Create automated workflows for model promotion, rollback, compliance validation, and continuous monitoring.
- Collaborate with ML engineers and researchers to improve experimentation velocity and platform efficiency.
- Contribute to the design of agentic AI systems that enable multi-agent orchestration.