AI Benchmark Engineer | Native Language Specialist - Spanish

LILT (Production) | AI, Language Technology

Spain (Remote) | Contract | Middle

Salary not disclosed

Job Details

5+ years of industry experience in software engineering
Proven track record at leading technology companies and/or graduation from top-tier engineering universities
Native or near-native fluency in Spanish, with a deep understanding of its grammar, register, and phrasing rules
High English proficiency
Strong proficiency in Python
Strong proficiency in standard shell scripting
Strong proficiency in data processing
Extensive experience with Terminal/CLI-based development workflows
Working familiarity with coding agents
Deep technical understanding of multilingual text processing pitfalls, including encoding/decoding robustness and Unicode normalization
Deep technical understanding of locale-dependent conventions (collation, casing, non-Gregorian dates)
Deep technical understanding of text I/O, toolchain interoperability, and safe string operations
For Spanish, deep understanding of font fallbacks and rendering/typography in UI or artifacts, as well as bidirectional/RTL handling in mixed-script text

Design, build, and validate Terminal-Bench tasks to test large language models on multilingual software challenges.
Create high-signal, high-quality tasks that genuinely test a model's ability to handle multilingual environments without relying on English translation crutches.
Evaluate coding agents.
Build realistic task environments using datasets and files in your native language (Spanish).
Identify failure points where AI models break down in your native language (Spanish).
Support the development of robust solutions (reference implementations) and write highly reliable, deterministic verifier scripts.
Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers.
Participate in a rigorous, 4-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity.
