AI Benchmark & Datasets Engineer / Researcher
Candidates based anywhere in the EU, UK, United States, and Canada will be considered.
Full-Time
Middle
Salary not disclosed
Job Details
- Languages: English
- Required Skills: Machine Learning, Data Science, Research, LLM
Requirements
- Published at least one paper at NeurIPS, ICLR, or ICML as lead author or with significant conceptual/code contributions.
- Significantly contributed to a newsworthy LLM training effort.
- 6 months of experience in a leading Machine Learning research center (e.g., Google Brain, DeepMind, Apple, Meta, Anthropic, Nvidia, MILA).
- ICPC World Finalist or IOI, IMO, or IPhO medalist.
- Experience with ML/LLM evaluation, data science, or technical product roles.
- Ability to read papers, leaderboards, and GitHub repos to create clear, repeatable benchmark specs.
- Fluent in English.
Responsibilities
- Proactively identify, prioritize, and curate relevant public and client-driven benchmarks across our target use cases and markets.
- Evaluate candidate benchmarks for clarity, data quality, evaluation methodology, and fit with our model roadmap.
- Run benchmarks with baseline models to validate setup, uncover edge cases, and de‑risk R&D runs.
- Hand off “benchmark-ready” packages to R&D (specs, data, evaluation scripts, expected metrics, constraints).
- Maintain a shared vocabulary and documentation around benchmarks, datasets, and evaluation formats that GTM and R&D can both use.
- Track and organize benchmark results, model leaderboards, and “what good looks like” for different customers and scenarios.
- Contribute to demos and public‑facing proof points based on benchmark outcomes.