Researcher, Evaluations
Fully remote environment, flexible work hours
Full-Time
Middle
Salary: 115,000 - 200,000 USD per year
Job Details
- Required Skills: Python, Data Analysis, Research
Requirements
- Analytical thinking with a rigorous approach to experimentation.
- Grounded, skeptical mindset regarding AI system capabilities versus hype.
- Practical experience using AI agents and tools.
- Familiarity with AI benchmarks and evaluation methodologies.
- Research and data-analysis experience.
- Comfort with light coding for data analysis.
- Strong written communication skills for conveying nuanced observations.
- Python proficiency is a plus.
- Experience testing frontier models is a plus.
Responsibilities
- Create and curate an evaluation suite of challenging real-world tasks.
- Devise and refine grading rubrics for assessing AI performance.
- Evaluate new frontier AI models and products against the task suite.
- Analyze evaluation results and conduct model comparisons.
- Communicate research findings via public-facing reports, blog posts, and data visualizations.
- Automate parts of the workflow and develop standalone benchmarks.