Principal AI Research Scientist, Post-Training Alignment
Remote options across Canada · Full-Time · Principal
Salary not disclosed
Job Details
Required Skills
- Machine Learning
Requirements
- Deep expertise in reinforcement learning and post-training methodologies (RLHF, RLAIF, DPO, PPO).
- PhD or equivalent industry research experience in machine learning or AI.
- Proven track record in leading or mentoring research teams.
- Strong publication history in top-tier ML/AI venues.
- Experience in alignment research, preference learning, or agentic AI.
- Strong intuition for model behavior and failure modes.
- Experience designing evaluation systems for deployment.
- Familiarity with large-scale training infrastructure.
- Ability to communicate complex technical concepts.
- Experience working with or deploying production AI systems.
Responsibilities
- Lead research and development in post-training methods for foundation models, including reinforcement learning, preference optimization, and alignment techniques such as RLHF, RLAIF, DPO, and PPO.
- Design and develop novel algorithms that improve model reliability, controllability, reasoning ability, and alignment with human and system objectives.
- Define and execute experimental frameworks to evaluate model behavior, robustness, safety, and long-horizon reasoning performance.
- Architect evaluation systems for agentic workflows, tool use, and real-world task completion.
- Make principled decisions on model improvements.
- Lead model analysis and interpretability efforts.
- Collaborate with infrastructure teams to build scalable post-training pipelines.
- Establish model readiness criteria for deployment.
- Contribute to scientific publications and patents.
- Communicate technical risks and strategic trade-offs.