Senior Machine Learning Systems Engineer, Ads ML Experience Platform

You can apply to work remotely in any country in which we have a physical presence.
Employment type: Full-Time
Level: Senior
Salary: $216,700–$303,400 USD

Job Details

Experience
5+ years in infrastructure/platform engineering or large-scale distributed systems; 2+ years of hands-on experience building and operating production ML infrastructure.
Required Skills
Kubeflow, Airflow, Spark, Distributed Systems

Requirements

  • 5+ years in infrastructure/platform engineering or large-scale distributed systems.
  • 2+ years of hands-on experience building and operating production ML infrastructure, developer SDKs, platform APIs, or self-service AI tooling.
  • Experience building workflow orchestration systems, developer platforms, or large-scale automation frameworks.
  • Experience with distributed data processing systems such as Spark, Flink, Ray, or equivalent technologies.
  • Experience with modern orchestration and workflow technologies such as Kubeflow, Argo, Airflow, or similar frameworks.
  • Experience building offline ML experimentation platforms, model registries, experiment tracking systems, or training orchestration frameworks.

Responsibilities

  • Design and build large-scale offline ML experimentation platforms that enable reproducible research, model development, evaluation, and promotion workflows.
  • Develop production-grade training orchestration frameworks supporting distributed training, hyperparameter optimization, model evaluation, and automated retraining.
  • Build infrastructure for experiment tracking, metadata management, lineage, artifact versioning, model registries, and reproducibility.
  • Partner with ML engineers and researchers to improve experimentation velocity and operational efficiency.
  • Build automated workflows for model promotion, rollback, compliance validation, and continuous evaluation.
  • Design and build an agentic AI execution platform supporting autonomous and human-in-the-loop workflows, including multi-agent orchestration, memory/context systems, and scalable workflow infrastructure.