Human Data Evals Lead
Based in the United States, with flexibility across LATAM and US time zones
Full-Time
Lead
Salary not disclosed
Job Details
- Languages: Excellent English communication skills; Spanish is a plus.
- Experience: 5+ years
- Required Skills: Machine Learning
Requirements
- 5+ years of experience in technical program management, data operations, quality engineering, or ML evaluation roles.
- Proven experience working with AI labs or enterprise ML teams, delivering datasets, benchmarks, or evaluation frameworks.
- Strong understanding of LLM evaluation concepts such as benchmarks, rubrics, pass rates, headroom, and model discrimination.
- Hands-on experience designing or managing QC processes and ensuring high-quality annotated or evaluated datasets.
- Demonstrated ability to recruit, manage, and calibrate subject-matter experts or external contributor pools.
- Strong problem-solving skills in ambiguous environments with evolving requirements and fast iteration cycles.
- Excellent English communication skills; Spanish is a plus.
Responsibilities
- Own the design, development, and delivery of high-quality AI evaluation data initiatives, from initial proposals through pilot execution and production readiness.
- Develop data proposals and sample packages based on lab requests, benchmarks, and evaluation targets, translating them into structured, high-signal datasets.
- Design frontier-grade evaluation samples across reasoning, coding, agents, tool use, and multimodal tasks, ensuring measurable model discrimination and headroom.
- Define and enforce rigorous quality control frameworks, including expert verification, calibration layers, rubrics, and deterministic validation approaches.
- Recruit, onboard, and manage subject-matter experts across technical domains, ensuring consistent output quality aligned with benchmark standards.
- Own pilot engagements end-to-end, including scoping, staffing, SOW definition, QC execution, and final delivery to AI lab partners.
- Act as a key point of contact for lab stakeholders, aligning expectations and surfacing technical requirements in collaboration with internal leadership.
- Continuously refine evaluation methodologies and sample design standards to improve signal quality and benchmark reliability.