Staff Machine Learning Engineer, ML Efficiency
Reddit
Tech-Ads Engineering
You can work remotely from anywhere in the UK or the Netherlands.
Full-Time
Staff
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Python, FPGA Architecture, Java, C++, Go, Rust, Distributed Systems
Requirements
- BS, MS, or PhD in Computer Science or a related field.
- 5+ years of software engineering experience.
- Strong proficiency in Python.
- Proficiency in at least one systems language (Go, C++, Rust, or Java) preferred.
- Experience building distributed systems at scale.
- Experience with machine learning infrastructure, training systems, or model serving platforms.
- Deep understanding of performance engineering and systems optimization.
- Strong debugging and profiling skills.
Responsibilities
- Design and build systems that improve the efficiency of ML training and inference workloads.
- Develop tooling that helps ML engineers debug, profile, optimize, and monitor model performance.
- Improve GPU and general resource utilization through scheduling, resource management, caching, and workload optimization.
- Partner with ML researchers and product teams to identify bottlenecks and drive performance improvements.
- Build benchmarking frameworks and performance dashboards for training and serving systems.
- Optimize distributed training infrastructure, data pipelines, and model serving architectures.
- Lead cross-functional initiatives that improve the productivity of Reddit ML engineers.
- Drive technical strategy for ML platform scalability, reliability, and cost efficiency.