Staff Machine Learning Engineer, ML Efficiency

You can work remotely from anywhere in the UK or the Netherlands.
Full-Time | Staff
Salary not disclosed

Job Details

Experience
5+ years of software engineering experience.
Required Skills
Python, Debugging, Distributed Systems

Requirements

  • BS, MS, or PhD in Computer Science or a related field.
  • 5+ years of software engineering experience.
  • Strong proficiency in Python.
  • Experience building distributed systems at scale.
  • Experience with machine learning infrastructure, training systems, or model serving platforms.
  • Deep understanding of performance engineering and systems optimization.
  • Strong debugging and profiling skills.

Responsibilities

  • Design and build systems that improve the efficiency of ML training and inference workloads.
  • Develop tooling that helps ML engineers debug, profile, optimize, and monitor model performance.
  • Improve GPU and general resource utilization through scheduling, resource management, caching, and workload optimization.
  • Partner with ML researchers and product teams to identify bottlenecks and drive performance improvements.
  • Build benchmarking frameworks and performance dashboards for training and serving systems.
  • Optimize distributed training infrastructure, data pipelines, and model serving architectures.
  • Lead cross-functional initiatives that improve the productivity of Reddit's ML engineers.
  • Drive technical strategy for ML platform scalability, reliability, and cost efficiency.