AI Benchmark & Datasets Engineer / Researcher


Candidates based anywhere in the EU, UK, United States, and Canada will be considered.
Full-Time
Mid-level

Salary not disclosed


Job Details

Published at least one paper at NeurIPS, ICLR, or ICML as lead author or with significant conceptual/code contributions.
Significantly contributed to a newsworthy LLM training effort.
6 months of experience at a leading machine learning research lab (e.g., Google Brain, DeepMind, Apple, Meta, Anthropic, NVIDIA, Mila).
ICPC World Finalist or IOI, IMO, or IPhO medalist.
Experience with ML/LLM evaluation, data science, or technical product roles.
Ability to read papers, leaderboards, and GitHub repos and turn them into clear, repeatable benchmark specs.
Fluent in English.

Proactively identify, prioritize, and curate relevant public and client-driven benchmarks across our target use cases and markets.
Evaluate candidate benchmarks for clarity, data quality, evaluation methodology, and fit with our model roadmap.
Run benchmarks with baseline models to validate the setup, uncover edge cases, and de‑risk R&D runs.
Hand off “benchmark-ready” packages to R&D (specs, data, evaluation scripts, expected metrics, constraints).
Maintain a shared vocabulary and documentation for benchmarks, datasets, and evaluation formats that both the go-to-market (GTM) and R&D teams can use.
Track and organize benchmark results, model leaderboards, and “what good looks like” for different customers and scenarios.
Contribute to demos and public‑facing proof points based on benchmark outcomes.
