Senior Software Engineer — AI Evaluation & Benchmarks
United States | Contract | Senior
Salary: 80 - 100 USD per hour
Job Details
- Languages: English
- Experience: 4+ years
- Required Skills: Python, Git, Machine Learning, CI/CD, Software Engineering, LLM, Unit Testing
Requirements
- 4+ years of professional software engineering experience in high-quality production environments.
- Expert-level Python development skills.
- Hands-on experience working within large, complex, and production-grade codebases.
- Proven experience building or contributing to LLM evaluation systems, coding benchmarks, or AI model testing pipelines.
- Strong understanding of Git workflows, software engineering best practices, and modern development processes.
- Experience working in high-growth technology companies or top-tier engineering organizations.
- Excellent analytical and problem-solving skills.
- Strong written communication skills in English.
- Experience with CI/CD systems and unit testing frameworks.
- Familiarity with additional programming languages such as JavaScript, Go, or C++ is a plus.
- Background in ML evaluation methodologies, open-source contributions, or security engineering is an advantage.
Responsibilities
- Design and build coding benchmarks that evaluate frontier AI models on real-world software engineering tasks.
- Develop and maintain scalable evaluation pipelines and data infrastructure to support large-scale model testing workflows.
- Analyze AI-generated code for correctness, robustness, performance issues, and edge-case failures.
- Construct structured evaluation environments across large repositories and multi-language codebases.
- Provide detailed technical feedback on model behavior, failure modes, and performance patterns.
- Contribute to the design and evolution of evaluation methodologies.
- Collaborate with research and engineering stakeholders to refine benchmarks.
- Ensure evaluation systems are reliable, reproducible, and optimized for scale and accuracy.