Sr. Principal Software Scientist
Based in the United States, Full-Time, Principal
Salary: $185,000 to $280,000 USD
Job Details
- Required Skills: Deep Learning, Generative AI
Requirements
- Extensive hands-on experience in deep learning with strong theoretical grounding in transformer architectures.
- Proven track record of training large-scale foundation models from scratch.
- Deep understanding of distributed training systems and associated scaling challenges.
- Strong ability to reason about optimization behavior, debug instability, and improve convergence at scale.
- Deep expertise in transformer internals, attention mechanisms, and architectural techniques such as GQA, RoPE, ALiBi, and MoE.
- Strong understanding of scaling laws, compute/data trade-offs, and model efficiency.
- Experience with distributed training frameworks including FSDP, ZeRO, tensor parallelism, and pipeline parallelism.
- Proficiency with mixed precision techniques such as bf16 and fp8.
- Ability to operate effectively in ambiguous, research-heavy environments.
Responsibilities
- Lead the design and development of large-scale transformer and hybrid foundation models, defining architecture choices across text, multimodal, and emerging generative AI paradigms.
- Build and train large models from first principles, focusing on architecture innovation to ensure scalability and robustness at production scale.
- Diagnose and resolve training instability issues, including divergence, optimizer failures, and gradient pathologies.
- Define and evaluate scaling strategies across compute, data, and model size to optimize performance and efficiency.
- Design and experiment with loss functions and alignment strategies such as RLHF, DPO, and GRPO.
- Architect distributed training systems using FSDP, ZeRO-3, tensor/pipeline parallelism, and mixed precision techniques.