AI Evaluation Engineer - Mathematics & Algorithms

Gramian Consulting Group
IT Professional Services
Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Pakistan, Turkey, Vietnam
4 hours of overlap with PST
Contract
Middle
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Docker, Python, NumPy

Requirements

  • 5+ years in mathematics, quantitative research, or computational science
  • Competition math, university-level mathematics, or quantitative research background
  • Python programming
  • NumPy, SciPy, or symbolic computation (SymPy)
  • Experience writing mathematical proofs or formal derivations
  • Ability to create problems with precise, verifiable answers
  • Experience with AI coding benchmarks (SWE-bench, Terminal-bench)
  • Comfortable with Docker: writing Dockerfiles, building images, and debugging container issues
  • Understanding of numerical methods: floating-point tolerance, convergence criteria, error bounds

Responsibilities

  • Design and build multi-agent benchmark tasks requiring multi-step mathematical reasoning and algorithmic problem-solving
  • Create complex, decomposable problems across domains such as competition mathematics, numerical analysis, combinatorial optimization, and statistical inference
  • Develop verification scripts to validate numerical outputs (with tolerance thresholds), proof correctness and logical steps, and algorithmic outputs and constraints
  • Write clear, structured problem statements with precise notation and defined outputs
  • Design task decomposition strategies for parallel or multi-agent execution
  • Implement computational solutions and validation pipelines using Python
  • Work with containerized environments (Docker) for reproducibility and evaluation
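
As a rough illustration of the kind of verification script mentioned above, the sketch below checks numerical outputs against reference values within a tolerance. The function name check_numeric and the default tolerances are hypothetical choices made for this example and are not taken from the employer's tooling.

```python
import math

def check_numeric(submitted, expected, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if every submitted value matches its expected value within tolerance.

    check_numeric and its default tolerances are illustrative assumptions.
    """
    # A wrong number of outputs is a failure, not a partial match.
    if len(submitted) != len(expected):
        return False
    # Reject NaN and infinity explicitly, then compare with relative and absolute tolerance.
    return all(
        math.isfinite(s) and math.isclose(s, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for s, e in zip(submitted, expected)
    )

# 0.1 + 0.2 is not exactly 0.3 in floating point, but it is within tolerance.
print(check_numeric([0.1 + 0.2], [0.3]))  # True
print(check_numeric([0.31], [0.3]))       # False
```

Using both a relative and an absolute tolerance keeps the comparison meaningful for expected values near zero, where a purely relative check would reject almost everything.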