Human Data Evals Lead

Based in United States, flexibility across LATAM and US time zones
Full-Time
Lead
Salary not disclosed

Job Details

Languages
Excellent English communication skills; Spanish is a plus.
Experience
5+ years of experience
Required Skills
Machine Learning

Requirements

  • 5+ years of experience in technical program management, data operations, quality engineering, or ML evaluation roles.
  • Proven experience working with AI labs or enterprise ML teams, delivering datasets, benchmarks, or evaluation frameworks.
  • Strong understanding of LLM evaluation concepts such as benchmarks, rubrics, pass rates, headroom, and model discrimination.
  • Hands-on experience designing or managing QC processes and ensuring high-quality annotated or evaluated datasets.
  • Demonstrated ability to recruit, manage, and calibrate subject-matter experts or external contributor pools.
  • Strong problem-solving skills in ambiguous environments with evolving requirements and fast iteration cycles.
  • Excellent English communication skills; Spanish is a plus.

Responsibilities

  • Own the design, development, and delivery of high-quality AI evaluation data initiatives, from initial proposals through pilot execution and production readiness.
  • Develop data proposals and sample packages based on lab requests, benchmarks, and evaluation targets, translating them into structured, high-signal datasets.
  • Design frontier-grade evaluation samples across reasoning, coding, agents, tool use, and multimodal tasks, ensuring measurable model discrimination and headroom.
  • Define and enforce rigorous quality control frameworks, including expert verification, calibration layers, rubrics, and deterministic validation approaches.
  • Recruit, onboard, and manage subject-matter experts across technical domains, ensuring consistent output quality aligned with benchmark standards.
  • Own pilot engagements end-to-end, including scoping, staffing, SOW definition, QC execution, and final delivery to AI lab partners.
  • Act as a key point of contact for lab stakeholders, aligning expectations and surfacing technical requirements in collaboration with internal leadership.
  • Continuously refine evaluation methodologies and sample design standards to improve signal quality and benchmark reliability.