Solid software engineering background with several years of professional experience
Experience building and maintaining complex systems
Ability to generate ideas for new benchmarks and experiments
Motivated by Epoch AI's mission
Hands-on experience running LLM evaluations (a plus)
Familiarity with evaluation frameworks like Inspect (a plus)
Solid grasp of current AI trends (a plus)
Responsibilities:
Implement AI benchmarks within evaluation infrastructure
Develop existing suite of benchmarks for new model releases
Contribute to the development of brand new benchmarks
Collaborate with researchers, analysts, and engineers