Senior / Staff ML Training Optimization Engineer

Remote US & Canada / Dallas, TX / Phoenix, AZ / Pittsburgh, PA / San Francisco, CA / Toronto, ON
Full-Time | Staff
Salary not disclosed

Job Details

Required Skills
Python, Kubernetes, PyTorch, C++, Rust

Requirements

  • An MS or PhD, or a Bachelor's degree with at least 4 years of industry experience, in Computer Science, Robotics, or a similar technical field.
  • Solid proficiency in multiple programming languages, including Python, C++, or Rust.
  • Experience with deep learning frameworks such as PyTorch or JAX.
  • Skilled in profiling CPU and GPU code using tools such as PyTorch Profiler and NVIDIA Nsight.

Responsibilities

  • Build standardized distributed training frameworks for research and production, and drive our training toward new levels of stability and efficiency.
  • Comprehensively profile model runtime and memory to pinpoint performance bottlenecks.
  • Identify and evaluate emerging technologies that can be adopted into Waabi’s training and inference frameworks. Examples include designing new CUDA kernels, quantization-aware training and inference, and compilation/deployment techniques.
  • Work with researchers and ML engineers on best practices for optimal resource usage.
  • Create and improve tooling and dashboards to ensure broad adoption of your work.