AI Evaluation Engineer - Planning & Operations

Gramian Consulting Group
IT professional services
Pakistan, Egypt, Kenya, Ghana, Nigeria, Brazil, Bangladesh, Colombia, India, Indonesia, Turkey, Vietnam
4 hours of overlap with PST
Contract
Middle
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Docker, Python

Requirements

  • 5+ years of experience in operations, project management, logistics, or supply chain
  • Strong ability to formalize constraints, dependencies, and scheduling logic
  • Proficiency in Python for building validation and verification scripts
  • Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
  • Strong structured problem-solving and decomposition skills
  • Experience with AI benchmarks or evaluation frameworks (e.g., SWE-bench or similar)
  • Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Responsibilities

  • Design and build multi-agent benchmark tasks involving:
      • Planning, scheduling, and resource allocation
      • Operational decision-making (logistics, project planning, incident response, capacity planning)
  • Create constraint-rich problem statements with multiple interacting variables
  • Develop verification scripts to evaluate:
      • Feasibility (all constraints satisfied)
      • Completeness (all requirements met)
      • Optimality (efficiency of solutions)
  • Define task decomposition strategies across specialized sub-agents (e.g., resource allocation, constraint resolution, optimization)
  • Model realistic operational systems with dependencies, timelines, and constraints
  • Implement validation logic and evaluation pipelines using Python
  • Work with Docker environments for reproducibility and execution
  • Collaborate with internal teams to improve task quality, coverage, and evaluation rigor