AI Benchmark Engineer - Native Language Specialist - Marathi

LILT (Production) | AI, Language Technology
India (Remote) | Contract | Middle
Salary not disclosed

Job Details

Languages: Marathi, English
Experience: 5+ years
Required Skills: Python

Requirements

  • 5+ years of industry experience in software engineering.
  • Proven track record at leading technology companies and/or a degree from a top-tier engineering university.
  • Native or near-native fluency in Marathi, with deep understanding of its grammar, register, and phrasing rules.
  • High English proficiency.
  • Strong proficiency in Python.
  • Strong proficiency in standard shell scripting.
  • Strong proficiency in data processing.
  • Extensive experience with Terminal/CLI-based development workflows.
  • Working familiarity with coding agents.
  • Deep technical understanding of multilingual text processing pitfalls.
  • Experience with encoding/decoding robustness and Unicode normalization.
  • Knowledge of locale-dependent conventions (collation, casing, non-Gregorian dates).
  • Experience with text I/O, toolchain interoperability, and safe string operations.
  • Experience with bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts (for specific languages).
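
As an illustration of the Unicode normalization pitfall named above, here is a minimal Python sketch. It assumes only the standard library; the helper name nfc_equal and the sample word are chosen for illustration and do not come from the employer. The Marathi word दऱ्या can be stored with the precomposed letter ऱ (U+0931) or as र (U+0930) followed by a nukta (U+093C). Both render the same, but they compare unequal until normalized.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Same visible word, two different code point sequences.
composed = "\u0926\u0931\u094d\u092f\u093e"          # द + ऱ + ् + य + ा
decomposed = "\u0926\u0930\u093c\u094d\u092f\u093e"  # द + र + nukta + ् + य + ा

raw_match = composed == decomposed                # plain comparison
normalized_match = nfc_equal(composed, decomposed)  # comparison after NFC

print(raw_match, normalized_match)
```

A task built around this effect could, for example, ship a dataset that mixes both forms and check whether an agent deduplicates or joins records correctly.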

Responsibilities

  • Design, build, and validate evaluation suites of Terminal-Bench tasks.
  • Measure multilingual robustness across prompt language effects, non-English data processing, and complex locale/encoding edge cases.
  • Create high-signal, high-quality tasks that test a model's ability to handle multilingual environments.
  • Evaluate coding agents through task engineering.
  • Build realistic task environments using datasets and files in the native language (Marathi).
  • Identify AI failure points through prompting and translation in the native language.
  • Support the development of robust solutions (reference implementations) and write reliable, deterministic verifier scripts.
  • Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations.
  • Participate in a rigorous, 4-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks.