AI Evaluation Engineer - Software Engineering / Code
Gramian Consulting Group (IT Professional Services)
Locations: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam (4-hour overlap with PST)
Contract | Mid-level
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Docker, Node.js, Python, Django, Flask, Git, JavaScript, FastAPI
Requirements
- 5+ years of experience in software development
- Strong experience working with large codebases built on frameworks such as Django, Flask, FastAPI, Node.js, or similar
- Familiarity with Git workflows (pull requests, diffs, commits, cherry-picking)
- Experience writing tests or validation scripts (pytest, unittest, or similar)
- Ability to write clear, precise technical specifications
- Familiarity with AI coding benchmarks or evaluation frameworks (e.g., SWE-bench or similar)
- Hands-on experience with Docker (Dockerfiles, image builds, debugging)
- Experience contributing to or maintaining open-source projects (Nice to Have)
- Experience with code migrations or large-scale refactoring (Nice to Have)
- Familiarity with CI/CD pipelines and automated testing workflows (Nice to Have)
- Exposure to LLM-based coding tools or evaluation frameworks (Nice to Have)
Responsibilities
- Design and build multi-agent benchmark tasks based on real-world code changes (bug fixes, migrations, refactors)
- Work with the Harbor evaluation framework to run and validate tasks in containerized environments
- Write clear, precise task instructions (file paths, function signatures, expected behavior, constraints)
- Develop Python-based verification scripts to validate the correctness of code changes
- Define task decomposition strategies across multiple specialized agents
- Analyze and navigate large open-source codebases to extract realistic task scenarios
- Run, debug, and refine tasks in Docker environments to ensure reproducibility
- Improve task quality, clarity, and difficulty based on evaluation results