Human Data Evals Lead

Anyone AI
Artificial Intelligence
Remote / Latam / US
Temporary
Lead
Salary not disclosed

Job Details

Languages
English
Experience
5+ years

Requirements

  • 5+ years in technical delivery, quality, or program management.
  • Recent experience in AI/ML data, model evaluation, or benchmarking.
  • Hands-on experience delivering evaluation work to AI labs or enterprise ML teams.
  • Deep expertise in LLM benchmarking, particularly code-model evaluation.
  • Proven ability to translate eval targets into sample tasks that demonstrate model capability.
  • Experience building QC processes and artifact standards for enterprise or lab requirements.
  • Proven people or vendor leadership experience, including recruiting and calibrating expert pools.
  • Working fluency with evaluation metrics including benchmarks, rubrics, pass rates, and headroom.
  • Fluent English.

Responsibilities

  • Study public benchmarks and eval targets to create proposals and sample packages that win work.
  • Design and build sample packages in collaboration with subject-matter experts, ensuring high-quality reasoning trajectories and gold-standard ground truth.
  • Develop rigorous QC structures including calibration layers, rubrics, and deterministic verifiers.
  • Recruit, brief, and calibrate a pool of subject-matter experts across coding and STEM domains.
  • Act as a direct point of contact for AI lab partners, managing expectations and feedback loops.
  • Own pilots end-to-end including scoping, SOW, staffing, production, QC, and delivery.