Researcher, Evaluations

Fully remote environment, flexible work hours
Full-Time
Middle
Salary: 115,000 - 200,000 USD per year

Job Details

Required Skills
Python, Data Analysis, Research

Requirements

  • Analytical thinking with a rigorous approach to experimentation.
  • Grounded, skeptical mindset regarding AI system capabilities versus hype.
  • Practical experience using AI agents and tools.
  • Familiarity with AI benchmarks and evaluation methodologies.
  • Research and data-analysis experience.
  • Comfort with light coding for data analysis.
  • Strong written communication skills for conveying nuanced observations.
  • Python proficiency is a plus.
  • Experience testing frontier models is a plus.

Responsibilities

  • Create and curate an evaluation suite of challenging real-world tasks.
  • Devise and refine grading rubrics for assessing AI performance.
  • Evaluate new frontier AI models and products against the task suite.
  • Analyze evaluation results and conduct model comparisons.
  • Communicate research findings via public-facing reports, blog posts, and data visualizations.
  • Automate parts of the workflow and develop standalone benchmarks.