AI Evaluation Engineer - Agentic Coding / Software Engineering

Gramian Consulting Group
IT Professional Services
Colombia, Egypt, Kenya, Ghana, Nigeria, Brazil, Bangladesh, India, Indonesia, Turkey, Vietnam (4-hour overlap with PST)
Contract
Middle
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Python, SQL, Git, Java, JavaScript, TypeScript, C++, Rust, Linux

Requirements

  • 5+ years of experience in software engineering, QA, developer tooling, or similar code-heavy roles
  • Strong proficiency in at least one programming ecosystem (e.g., Python, JavaScript/TypeScript, Java, C/C++, Rust, SQL)
  • Ability to read and understand unfamiliar codebases and implement/debug changes
  • Experience running and interpreting tests, scripts, and CLI tools
  • Strong debugging and problem-solving skills, including handling edge cases
  • Comfortable working in Linux/terminal environments
  • Familiarity with Git workflows and standard development tooling
  • Experience with AI coding tools or agentic coding environments (e.g., Cursor, Claude Code, or similar)
  • Strong attention to detail and ability to produce consistent, high-quality evaluations

Responsibilities

  • Execute coding tasks within agentic coding environments, maintaining strict evaluation protocols
  • Review and evaluate model-generated code trajectories for correctness and completeness
  • Validate outputs by reading code, running tests, analyzing logs, and inspecting artifacts
  • Perform targeted validation using scripts, tests, and manual checks
  • Write clear, evidence-based rationales for evaluations and rankings
  • Design realistic, multi-step coding tasks and workflows (offline work)
  • Create and refine evaluation rubrics and scoring criteria
  • Ensure consistency, quality, and compliance across evaluations
  • Identify issues in environments, instructions, or workflows and report with clear evidence