Build and debug on top of modern PyTorch
Build clean and intuitive training infrastructure
Optimize for high-throughput, efficient distributed model training
Implement and maintain 3D-specific custom operators in Triton or CUDA
Implement and maintain novel data-loading frameworks and libraries
Build efficient inference endpoints with complex multi-stage model pipelines
Optimize models through compilation, fusion, quantization, etc.