Staff Research Engineer, Post-training & Evaluation

This role is completely remote-friendly within the United States.
Employment type: Full-Time
Level: Staff
Salary: $230,000 to $322,000 USD

Job Details

Experience
6+ years of professional ML experience (or PhD + 4+ years)
Required Skills
Python, Machine Learning, PyTorch, NLP, LLM

Requirements

  • 6+ years of professional ML experience or PhD + 4 years.
  • PhD or MS in CS, ML, NLP, IR, or related field.
  • Deep expertise in evaluation reliability, calibration, and statistical significance.
  • Experience building custom domain-specific evaluation harnesses.
  • Experience evaluating models with both generation metrics and representation/classification metrics.
  • Deep understanding of data quality for continued pre-training (CPT) and supervised fine-tuning (SFT).
  • Fluency in Python.
  • Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed).
  • Experience with evaluation and inference tooling such as Hugging Face Transformers, vLLM, or lm-eval-harness.

Responsibilities

  • Define the Reddit Benchmark evaluation standard for model quality.
  • Own evaluation reliability, including statistical significance and judge calibration.
  • Design model-as-a-judge methodology for automated evaluation.
  • Set post-training recipes and strategy (SFT data mixtures, curriculum).
  • Design checkpoint-selection methodology for base and CPT models.
  • Drive synthetic data generation strategy for instruction and evaluation.
  • Partner with Safety Engineering on classification metrics and probe sets.
  • Diagnose post-training instability via loss curves and logs.
  • Lead research direction and mentor team members.