Senior Software Engineer — AI Evaluation & Benchmarks
G2i Inc.
Artificial Intelligence
Fully remote: work from anywhere on the accepted locations list
Contract
Senior
Salary: 80 - 100 USD per hour
Job Details
- Languages: English
- Experience: 4+ years
- Required Skills: Python, Git, Machine Learning, Software Engineering, LLM
Requirements
- 4+ years of professional software engineering experience
- Expert Python programming skills
- Experience working in large, complex codebases
- Proven experience designing and implementing LLM coding benchmarks
- Experience with evaluation data pipelines
- Strong command of Git and modern development workflows
- Track record at a high-growth tech company or top-tier software organization
- Strong written English communication
Responsibilities
- Design coding benchmarks that evaluate frontier models on real-world programming tasks
- Build and maintain scalable data pipelines for evaluation workflows
- Analyze model-generated code for correctness, reliability, and edge-case failures
- Construct structured evaluation scenarios across large repos and multi-language environments
- Provide detailed technical feedback on model performance and failure patterns
- Contribute to evaluation frameworks that set the bar for how coding ability is measured