Staff Research Engineer, Post-training & Evaluation
Reddit
Machine Learning
This role is completely remote friendly within the United States.
Full-Time
Staff
Salary: $230,000 to $322,000 USD
Job Details
- Experience: 6+ years of professional ML experience (or PhD + 4+)
- Required Skills: Python, Machine Learning, PyTorch, NLP, LLM
Requirements
- 6+ years of professional ML experience (or PhD + 4+) in LLM post-training and evaluation.
- PhD or MS in CS, ML, NLP, IR, or related quantitative field.
- Expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, and statistical significance.
- Experience building custom, domain-specific evaluation harnesses (e.g., lm-eval-harness, Inspect AI).
- Experience evaluating both generation and representation/classification metrics.
- Deep understanding of Continuous Pre-training (CPT) and instruction tuning via Supervised Fine-Tuning (SFT).
- Fluency in Python.
- Experience with data-pipeline and eval-harness engineering.
- Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed ZeRO-3).
Responsibilities
- Define the 'Reddit Benchmark' evaluation standard for safety, reasoning, and knowledge.
- Establish statistical rigor for evaluation, including judge calibration and multi-sample scoring.
- Design and own model-as-a-judge methodology and prompt calibration.
- Set post-training recipes including SFT data mixtures and curriculum.
- Evaluate base and CPT checkpoints to guide training compute allocation.
- Drive synthetic data generation and curation strategies.
- Partner with Safety Engineering to translate policies into classification metrics and CI/CD tests.
- Diagnose post-training instability, loss curves, and alignment tax.
- Mentor team members and set technical research direction.