Senior Software Engineer — AI Evaluation & Benchmarks
Albania, Austria, Belgium, Bosnia and Herzegovina, Brazil, Bulgaria, Canada, Chile, Colombia, Czechia, Dominican Republic, Ecuador, Estonia, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Malta, Mexico, Montenegro, Netherlands, North Macedonia, Paraguay, Peru, Poland, Portugal, Puerto Rico, Romania, Serbia, Slovakia, Spain, Turkey, United Kingdom, United States, Uruguay, Venezuela
Contract, Senior
Salary: 166,400 - 208,000 USD per year
Job Details
- Languages: English
- Experience: 4+ years
- Required Skills: Python, Git, Machine Learning
Requirements
- 4+ years of professional software engineering experience
- Expert Python proficiency
- Hands-on experience working in large, complex codebases
- Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines
- Strong command of Git and modern development workflows
- Track record at a high-growth tech company or top-tier software organization
- Strong written English communication
Responsibilities
- Design coding benchmarks that evaluate frontier models on real-world programming tasks
- Build and maintain scalable data pipelines for evaluation workflows
- Analyze model-generated code for correctness, reliability, and edge-case failures
- Construct structured evaluation scenarios across large repos and multi-language environments
- Provide detailed technical feedback on model performance and failure patterns
- Contribute to evaluation frameworks that set the bar for how coding ability is measured