Senior Software Engineer — AI Evaluation & Benchmarks

G2i Inc.
Artificial Intelligence
Fully remote: work from anywhere on the accepted locations list
Contract
Senior
Salary
80 - 100 USD per hour

Job Details

Languages
English
Experience
4+ years
Required Skills
Python, Git, Machine Learning, Software Engineering, LLM

Requirements

  • 4+ years of professional software engineering experience
  • Expert Python programming skills
  • Experience working in large, complex codebases
  • Proven experience designing and implementing LLM coding benchmarks
  • Experience with evaluation data pipelines
  • Strong command of Git and modern development workflows
  • Track record at a high-growth tech company or top-tier software organization
  • Strong written English communication

Responsibilities

  • Design coding benchmarks that evaluate frontier models on real-world programming tasks
  • Build and maintain scalable data pipelines for evaluation workflows
  • Analyze model-generated code for correctness, reliability, and edge-case failures
  • Construct structured evaluation scenarios across large repos and multi-language environments
  • Provide detailed technical feedback on model performance and failure patterns
  • Contribute to evaluation frameworks that set the bar for how coding ability is measured