AI Evaluation Engineer - Agentic Coding / Software Engineering
Gramian Consulting Group, IT Professional Services
Colombia, Egypt, Kenya, Ghana, Nigeria, Brazil, Bangladesh, India, Indonesia, Turkey, Vietnam (4-hour overlap with PST)
Contract, Middle
Salary not disclosed
Job Details
- Experience
- 5+ years
- Required Skills
- Python, SQL, Git, Java, JavaScript, TypeScript, C++, Rust, Linux
Requirements
- 5+ years of experience in software engineering, QA, developer tooling, or similar code-heavy roles
- Strong proficiency in at least one programming ecosystem (e.g., Python, JavaScript/TypeScript, Java, C/C++, Rust, SQL)
- Ability to read and understand unfamiliar codebases, and to implement and debug changes within them
- Experience running and interpreting tests, scripts, and CLI tools
- Strong debugging and problem-solving skills, including handling edge cases
- Comfortable working in Linux/terminal environments
- Familiarity with Git workflows and standard development tooling
- Experience with AI coding tools or agentic coding environments (e.g., Cursor, Claude Code, or similar)
- Strong attention to detail and ability to produce consistent, high-quality evaluations
Responsibilities
- Execute coding tasks within agentic coding environments, maintaining strict evaluation protocols
- Review and evaluate model-generated code trajectories for correctness and completeness
- Validate outputs by reading code, running tests, analyzing logs, and inspecting artifacts
- Perform targeted validation using scripts, tests, and manual checks
- Write clear, evidence-based rationales for evaluations and rankings
- Design realistic, multi-step coding tasks and workflows (performed offline)
- Create and refine evaluation rubrics and scoring criteria
- Ensure consistency, quality, and compliance across evaluations
- Identify issues in environments, instructions, or workflows and report with clear evidence