AI Evaluation Engineer - Planning & Operations
Gramian Consulting Group
IT professional services
Pakistan, Egypt, Kenya, Ghana, Nigeria, Brazil, Bangladesh, Colombia, India, Indonesia, Turkey, Vietnam
4 hours with PST
Contract
Middle
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Docker, Python
Requirements
- 5+ years of experience in operations, project management, logistics, or supply chain
- Strong ability to formalize constraints, dependencies, and scheduling logic
- Proficiency in Python for building validation and verification scripts
- Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
- Strong structured problem-solving and decomposition skills
- Experience with AI benchmarks or evaluation frameworks (e.g., SWE-bench)
- Hands-on experience with Docker (Dockerfiles, image builds, debugging)
Responsibilities
- Design and build multi-agent benchmark tasks involving:
  - Planning, scheduling, and resource allocation
  - Operational decision-making (logistics, project planning, incident response, capacity planning)
- Create constraint-rich problem statements with multiple interacting variables
- Develop verification scripts to evaluate:
  - Feasibility (all constraints satisfied)
  - Completeness (all requirements met)
  - Optimality (efficiency of solutions)
- Define task decomposition strategies across specialized sub-agents (e.g., resource allocation, constraint resolution, optimization)
- Model realistic operational systems with dependencies, timelines, and constraints
- Implement validation logic and evaluation pipelines using Python
- Work with Docker environments for reproducibility and execution
- Collaborate with internal teams to improve task quality, coverage, and evaluation rigor