Senior Software Engineer — AI Evaluation & Benchmarks

G2i Inc.
Artificial Intelligence

Fully remote: work from anywhere on the accepted locations list
Contract
Senior

Salary: 80-100 USD per hour


Job Details

Design coding benchmarks that evaluate frontier models on real-world programming tasks
Build and maintain scalable data pipelines for evaluation workflows
Analyze model-generated code for correctness, reliability, and edge-case failures
Construct structured evaluation scenarios across large repos and multi-language environments
Provide detailed technical feedback on model performance and failure patterns
Contribute to evaluation frameworks that set the bar for how coding ability is measured
