AI Evaluation Engineer - Agentic Coding / Software Engineering
Gramian Consulting Group, IT Professional Services
Colombia, Egypt, Kenya, Ghana, Nigeria, Brazil, Bangladesh, India, Indonesia, Turkey, Vietnam (4-hour overlap with PST)
Contract, Middle
Salary not disclosed
Job Details
- Experience
- 5+ years
- Required Skills
- Python, SQL, Git, Java, JavaScript, TypeScript, C++, Rust, Linux
Requirements
- 5+ years of experience in software engineering, QA, developer tooling, or similar code-heavy roles
- Strong proficiency in at least one programming ecosystem (e.g., Python, JavaScript/TypeScript, Java, C/C++, Rust, SQL)
- Ability to read and understand unfamiliar codebases, and to implement and debug changes within them
- Experience running and interpreting tests, scripts, and CLI tools
- Strong debugging and problem-solving skills, including handling edge cases
- Comfortable working in Linux/terminal environments
- Familiarity with Git workflows and standard development tooling
- Experience with AI coding tools or agentic coding environments (e.g., Cursor, Claude Code, or similar)
- Strong attention to detail and ability to produce consistent, high-quality evaluations
Responsibilities
- Execute coding tasks within agentic coding environments, maintaining strict evaluation protocols
- Review and evaluate model-generated code trajectories for correctness and completeness
- Validate outputs by reading code, running tests, analyzing logs, and inspecting artifacts
- Perform targeted validation using scripts, tests, and manual checks
- Write clear, evidence-based rationales for evaluations and rankings
- Design realistic, multi-step coding tasks and workflows (performed offline)
- Create and refine evaluation rubrics and scoring criteria
- Ensure consistency, quality, and compliance across evaluations
- Identify issues in environments, instructions, or workflows and report with clear evidence