- Identifying optimal training strategies across a range of model sizes and compute workloads.
- Profiling, debugging, and optimizing single- and multi-GPU workloads with tools such as Nsight.
- Reasoning about the speed and quality trade-offs of quantization for model inference.
- Developing and improving low-level kernel optimizations for state-of-the-art inference and training.
- Proposing new approaches to maximize GPU performance.