AI Benchmark Engineer - Native Language Specialist

India · Contract · Middle
Salary not disclosed

Job Details

Languages: Native or near-native fluency in a language other than English; high English proficiency
Experience: 5+ years
Required Skills: Python

Requirements

  • 5+ years of professional experience in software engineering or a related technical field.
  • Background working at leading technology companies, or a strong academic foundation from a top-tier engineering institution.
  • Native or near-native fluency in a language other than English, with deep understanding of grammar, structure, and contextual usage.
  • Strong proficiency in Python, shell scripting, and data processing workflows.
  • Hands-on experience with CLI/terminal-based development environments and familiarity with coding agents or AI-assisted tools.
  • Strong understanding of multilingual computing challenges, including Unicode handling, encoding/decoding, and locale-specific behaviors.
  • Knowledge of text processing edge cases such as bidirectional scripts, collation rules, non-Gregorian calendar and date formats, and rendering constraints.
  • High English proficiency for collaboration, documentation, and technical communication.

Responsibilities

  • Design and engineer high-quality Terminal-Bench tasks to evaluate multilingual performance of AI coding agents in realistic environments.
  • Create and maintain multilingual datasets and file-based assets in your native language, ensuring linguistic integrity without translation simplification.
  • Identify AI failure points in non-English prompts and workflows, and design challenges that rigorously test robustness.
  • Develop reference implementations and deterministic verifier scripts to ensure reliable and reproducible evaluation outputs.
  • Calibrate task difficulty levels (Easy to Very Hard) based on model performance analysis and execution logs across different AI model tiers.
  • Participate in structured multi-layer quality assurance processes, including creation review, calibration validation, and audit checks.
  • Ensure benchmark fairness, grammatical accuracy, and technical integrity through both manual review and automated validation systems.