Senior Machine Learning Systems Engineer, Ads ML Experience Platform

Reddit | Machine Learning
Remote - United States | Full-Time | Senior
Salary: $216,700 to $303,400 USD

Job Details

Experience
5+ years in infrastructure/platform engineering or large-scale distributed systems; 2+ years of hands-on experience building and operating production ML infrastructure
Required Skills
Kubeflow, Machine Learning, Airflow, Spark, Distributed Systems

Requirements

  • 5+ years in infrastructure/platform engineering or large-scale distributed systems.
  • 2+ years of hands-on experience building and operating production ML infrastructure, developer SDKs, platform APIs, or self-service AI tooling.
  • Experience building workflow orchestration systems, developer platforms, or large-scale automation frameworks.
  • Experience with distributed data processing systems such as Spark, Flink, Ray, or equivalent technologies.
  • Experience with modern orchestration and workflow technologies such as Kubeflow, Argo, Airflow, or similar frameworks.
  • Experience building offline ML experimentation platforms, model registries, experiment tracking systems, or training orchestration frameworks.
  • Experience building and operating agentic AI systems, including multi-agent orchestration, autonomous workflows, and agent communication/runtime frameworks, is a strong plus.
  • Experience running end-to-end model development and iteration cycles at scale is a plus.

Responsibilities

  • Design and build large-scale offline ML experimentation platforms that enable reproducible research, model development, evaluation, and promotion workflows.
  • Develop production-grade training orchestration frameworks supporting distributed training, hyperparameter optimization, model evaluation, and automated retraining.
  • Build infrastructure for experiment tracking, metadata management, lineage, artifact versioning, model registries, and reproducibility.
  • Partner with ML engineers and researchers to improve experimentation velocity and operational efficiency.
  • Build automated workflows for model promotion, rollback, compliance validation, and continuous evaluation.
  • Design and build an agentic AI execution platform supporting autonomous and human-in-the-loop workflows, including multi-agent orchestration, memory/context systems, and scalable workflow infrastructure.