Member of Technical Staff - Pretraining / Inference Optimization
Posted 4 months ago
Requirements:
- Familiarity with effective techniques for optimizing inference and training workloads.
- Knowledge of how to optimize both memory-bound and compute-bound operations.
- Understanding of the GPU memory hierarchy and compute capabilities.
- Deep understanding of efficient attention algorithms.
- Experience implementing forward and backward Triton kernels, with attention to correctness and floating-point error.
- Ability to integrate custom kernels into PyTorch using tools such as pybind11.
Responsibilities:
- Finding ideal training strategies for various model sizes and compute loads.
- Profiling, debugging, and optimizing single and multi-GPU operations using tools such as Nsight.
- Reasoning about speed and quality trade-offs of quantization for model inference.
- Developing and improving low-level kernel optimizations for state-of-the-art inference and training.
- Developing new ideas to maximize GPU performance.