AI Evaluation Engineer

Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam, minimum 4h PST overlap
Contract
Middle

Salary not disclosed

Job Details

Requirements

3–10 years of experience in software engineering or related technical domains
Strong debugging, analytical, and systems reasoning skills
Good understanding of system architecture, dependencies, and operational processes
Experience with terminal, CLI, automation, or developer tooling workflows
Ability to design technically rigorous and realistic engineering scenarios
Background in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering

Responsibilities

Design realistic terminal-based benchmark tasks for AI evaluation systems
Create technically deep debugging and investigation scenarios
Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
Write clear solution approaches and deterministic evaluation criteria
Identify realistic edge cases, failure modes, and system constraints
Design multi-step reasoning challenges across complex technical environments
Contribute expertise across one or more engineering or operational domains
Review and refine benchmark quality, difficulty, and validation logic
Collaborate with reviewers and researchers on AI evaluation workflows
