Staff Research Engineer, Post-training & Evaluation
This role is completely remote-friendly within the United States.
Full-Time, Staff
Salary: $230,000 to $322,000 USD
Job Details
- Experience: 6+ years of professional ML experience (or PhD + 4+ years)
- Required Skills: Python, Machine Learning, PyTorch, NLP, LLM
Requirements
- 6+ years of professional ML experience or PhD + 4 years.
- PhD or MS in CS, ML, NLP, IR, or related field.
- Deep expertise in evaluation reliability, calibration, and statistical significance.
- Experience building custom domain-specific evaluation harnesses.
- Experience evaluating models on both generation and representation/classification metrics.
- Deep understanding of data quality for continued pre-training (CPT) and supervised fine-tuning (SFT).
- Fluency in Python.
- Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed).
- Experience with evaluation and inference tooling such as Hugging Face Transformers, vLLM, or lm-eval-harness.
Responsibilities
- Define the Reddit Benchmark evaluation standard for model quality.
- Own evaluation reliability, including statistical significance and judge calibration.
- Design model-as-a-judge methodology for automated evaluation.
- Set post-training recipes and strategy (SFT data mixtures, curriculum).
- Design checkpoint-selection methodology for base and CPT models.
- Drive synthetic data generation strategy for instruction and evaluation.
- Partner with Safety Engineering on classification metrics and probe sets.
- Diagnose post-training instability via loss curves and logs.
- Lead research direction and mentor team members.