AI Evaluation Engineer - Software Engineering / Code

Gramian Consulting Group
IT Professional Services
Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam (overlap of 4 hours with PST)
Contract
Middle
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Docker, Node.js, Python, Django, Flask, Git, JavaScript, FastAPI

Requirements

  • 5+ years of experience in software development
  • Strong experience working with large codebases (e.g., Django, Flask, FastAPI, Node.js or similar)
  • Familiarity with Git workflows (pull requests, diffs, commits, cherry-picking)
  • Experience writing tests or validation scripts (pytest, unittest, or similar)
  • Ability to write clear, precise technical specifications
  • Familiarity with AI coding benchmarks or evaluation frameworks (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)
  • Experience contributing to or maintaining open-source projects (Nice to Have)
  • Experience with code migrations or large-scale refactoring (Nice to Have)
  • Familiarity with CI/CD pipelines and automated testing workflows (Nice to Have)
  • Exposure to LLM-based coding tools or evaluation frameworks (Nice to Have)

Responsibilities

  • Design and build multi-agent benchmark tasks based on real-world code changes (bug fixes, migrations, refactors)
  • Work with the Harbor evaluation framework to run and validate tasks in containerized environments
  • Write clear, precise task instructions (file paths, function signatures, expected behavior, constraints)
  • Develop Python-based verification scripts to validate correctness of code changes
  • Define task decomposition strategies across multiple specialized agents
  • Analyze and navigate large open-source codebases to extract realistic task scenarios
  • Run, debug, and refine tasks in Docker environments to ensure reproducibility
  • Improve task quality, clarity, and difficulty based on evaluation results