Senior Software Engineer — AI Evaluation & Benchmarks

United States | Contract | Senior
Salary: 80 - 100 USD per hour

Job Details

Languages
English
Experience
4+ years
Required Skills
Python, Git, Machine Learning, CI/CD, Software Engineering, LLM, Unit Testing

Requirements

  • 4+ years of professional software engineering experience in high-quality production environments.
  • Expert-level Python development skills.
  • Hands-on experience working within large, complex, and production-grade codebases.
  • Proven experience building or contributing to LLM evaluation systems, coding benchmarks, or AI model testing pipelines.
  • Strong understanding of Git workflows, software engineering best practices, and modern development processes.
  • Experience working in high-growth technology companies or top-tier engineering organizations.
  • Excellent analytical and problem-solving skills.
  • Strong written communication skills in English.
  • Experience with CI/CD systems and unit testing frameworks.
  • Familiarity with additional programming languages such as JavaScript, Go, or C++ is a plus.
  • Background in ML evaluation methodologies, open-source contributions, or security engineering is an advantage.

Responsibilities

  • Design and build coding benchmarks that evaluate frontier AI models on real-world software engineering tasks.
  • Develop and maintain scalable evaluation pipelines and data infrastructure to support large-scale model testing workflows.
  • Analyze AI-generated code for correctness, robustness, performance issues, and edge-case failures.
  • Construct structured evaluation environments across large repositories and multi-language codebases.
  • Provide detailed technical feedback on model behavior, failure modes, and performance patterns.
  • Contribute to the design and evolution of evaluation methodologies.
  • Collaborate with research and engineering stakeholders to refine benchmarks.
  • Ensure evaluation systems are reliable, reproducible, and optimized for scale and accuracy.