Senior / Staff ML Training Optimization Engineer
Remote US & Canada / Dallas, TX / Phoenix, AZ / Pittsburgh, PA / San Francisco, CA / Toronto, ON
Full-Time
Staff
Salary not disclosed
Job Details
- Required Skills: Python, Kubernetes, PyTorch, C++, Rust
Requirements
- MS/PhD or Bachelor's degree with a minimum of 4 years of industry experience in Computer Science, Robotics, or a similar technical field of study.
- Solid coding proficiency in a variety of languages, including Python, C++, or Rust.
- Experience in deep learning frameworks such as PyTorch or JAX.
- Skilled in profiling CPU and GPU code using tools such as PyTorch Profiler and NVIDIA Nsight.
Responsibilities
- Build standardized distributed training frameworks for research and production, and drive our training toward new levels of stability and efficiency.
- Comprehensively profile model runtime and memory to pinpoint performance bottlenecks.
- Identify and evaluate emerging technologies that can be adopted into Waabi’s training and inference frameworks. Examples include designing new CUDA kernels, quantization-aware training and inference, and compilation/deployment techniques.
- Work with researchers and ML engineers on best practices for optimal resource usage.
- Create and improve tooling and dashboards to ensure broad adoption of your work.