Human Data Evals Lead

Based in the United States, with flexibility across LATAM and US time zones | Full-Time | Lead

Salary not disclosed

Job Details

5+ years of experience in technical program management, data operations, quality engineering, or ML evaluation roles.
Proven experience working with AI labs or enterprise ML teams, delivering datasets, benchmarks, or evaluation frameworks.
Strong understanding of LLM evaluation concepts such as benchmarks, rubrics, pass rates, headroom, and model discrimination.
Hands-on experience designing or managing QC processes and ensuring high-quality annotated or evaluated datasets.
Demonstrated ability to recruit, manage, and calibrate subject-matter experts or external contributor pools.
Strong problem-solving skills in ambiguous environments with evolving requirements and fast iteration cycles.
Excellent English communication skills; Spanish is a plus.

Own the design, development, and delivery of high-quality AI evaluation data initiatives, from initial proposals through pilot execution and production readiness.
Develop data proposals and sample packages based on lab requests, benchmarks, and evaluation targets, translating them into structured, high-signal datasets.
Design frontier-grade evaluation samples across reasoning, coding, agents, tool use, and multimodal tasks, ensuring measurable model discrimination and headroom.
Define and enforce rigorous quality control frameworks, including expert verification, calibration layers, rubrics, and deterministic validation approaches.
Recruit, onboard, and manage subject-matter experts across technical domains, ensuring consistent output quality aligned with benchmark standards.
Own pilot engagements end-to-end, including scoping, staffing, SOW definition, QC execution, and final delivery to AI lab partners.
Act as a key point of contact for lab stakeholders, aligning expectations and surfacing technical requirements in collaboration with internal leadership.
Continuously refine evaluation methodologies and sample design standards to improve signal quality and benchmark reliability.
