AI Benchmark & Datasets Engineer / Researcher

Candidates based anywhere in the EU, UK, United States, and Canada will be considered.
Full-Time
Middle
Salary not disclosed

Job Details

Languages
English
Required Skills
Machine Learning, Data Science, Research, LLM

Requirements

  • Published at least one paper at NeurIPS, ICLR, or ICML as lead author or with significant conceptual/code contributions.
  • Significantly contributed to a newsworthy LLM training effort.
  • 6 months of experience at a leading machine learning research center (e.g., Google Brain, DeepMind, Apple, Meta, Anthropic, NVIDIA, Mila).
  • ICPC World Finalist or IOI, IMO, or IPhO medalist.
  • Experience with ML/LLM evaluation, data science, or technical product roles.
  • Ability to read papers, leaderboards, and GitHub repos to create clear, repeatable benchmark specs.
  • Fluent in English.

Responsibilities

  • Proactively identify, prioritize, and curate relevant public and client-driven benchmarks across our target use cases and markets.
  • Evaluate candidate benchmarks for clarity, data quality, evaluation methodology, and fit with our model roadmap.
  • Run benchmarks with baseline models to validate setup, uncover edge cases, and de-risk R&D runs.
  • Hand off “benchmark-ready” packages to R&D (specs, data, evaluation scripts, expected metrics, constraints).
  • Maintain a shared vocabulary and documentation around benchmarks, datasets, and evaluation formats that GTM and R&D can both use.
  • Track and organize benchmark results, model leaderboards, and “what good looks like” for different customers and scenarios.
  • Contribute to demos and public-facing proof points based on benchmark outcomes.