Principal AI Research Scientist, Post-Training Alignment

Remote options across Canada | Full-time | Principal level

Salary not disclosed

Job Details

Qualifications

Deep expertise in reinforcement learning and post-training methodologies (RLHF, RLAIF, DPO, PPO).
PhD or equivalent industry research experience in machine learning or AI.
Proven track record in leading or mentoring research teams.
Strong publication history in top-tier ML/AI venues.
Experience in alignment research, preference learning, or agentic AI.
Strong intuition for model behavior and failure modes.
Experience designing evaluation systems for deployment.
Familiarity with large-scale training infrastructure.
Ability to communicate complex technical concepts.
Experience working with or deploying production AI systems.

Responsibilities

Lead research and development in post-training methods for foundation models, including reinforcement learning, preference optimization, and alignment techniques such as RLHF, RLAIF, DPO, and PPO.
Design and develop novel algorithms that improve model reliability, controllability, reasoning ability, and alignment with human and system objectives.
Define and execute experimental frameworks to evaluate model behavior, robustness, safety, and long-horizon reasoning performance.
Architect evaluation systems for agentic workflows, tool use, and real-world task completion.
Make principled decisions about which model improvements to pursue.
Lead model analysis and interpretability efforts.
Collaborate with infrastructure teams to build scalable post-training pipelines.
Establish model readiness criteria for deployment.
Contribute to scientific publications and patents.
Communicate technical risks and strategic trade-offs.

View Full Description & ApplyYou'll be redirected to the employer's site