- Strong background in deep learning and transformer architectures
- Hands-on experience training or fine-tuning large models (LLMs or vision models)
- Proficiency with PyTorch, JAX, or TensorFlow
- Experience with distributed training frameworks (DeepSpeed, FSDP, Megatron, ZeRO, Ray)
- Strong software engineering skills: writing robust, production-grade systems
- Experience with GPU optimization: memory efficiency, quantization, mixed precision
- Comfortable owning ambiguous, zero-to-one technical problems end-to-end