AI/ML Evaluation Engineer - Global Solutions Provider (Mexico)

Posted about 2 months ago
Mexico | Full-Time | AI/ML
Company: Truelogic
Location: Mexico
Languages: English
Skills:
Python, SQL, Artificial Intelligence, Data Analysis, Machine Learning, NumPy, Pandas, CI/CD
Requirements:
  • Advanced Python skills, including writing, debugging, and automating scripts.
  • Strong SQL proficiency and experience manipulating large datasets.
  • Hands-on experience with Python libraries such as Pandas and NumPy.
  • Ability to clean, standardize, and analyze structured and unstructured data.
  • Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
  • Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
  • Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
  • Awareness of vector similarity concepts such as cosine similarity and dot product.
  • Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
  • Ability to design or implement custom LLM-as-Judge evaluation systems.
  • Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
  • Ability to translate ambiguous quality expectations into measurable metrics.
  • Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
  • Ability to learn new infrastructure tools quickly.
  • Strong analytical and problem-solving abilities for open-ended technical challenges.
  • Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
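
For candidates gauging the vector similarity requirement above, the difference between dot product and cosine similarity can be sketched in a few lines of NumPy. This is an illustrative sketch only, not part of the employer's tooling; the function names and the choice to return 0.0 for a zero-length vector are assumptions made for the example.

```python
import numpy as np


def dot_product(a, b):
    """Raw dot product: sensitive to both direction and magnitude."""
    return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))


def cosine_similarity(a, b):
    """Dot product of the two vectors after scaling each to unit length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption for this sketch: treat a zero vector as having no similarity.
        return 0.0
    return float(np.dot(a, b) / denom)


if __name__ == "__main__":
    short, long_ = [1.0, 0.0], [2.0, 0.0]
    # Same direction, different magnitude: the two measures disagree.
    print(dot_product(short, long_), cosine_similarity(short, long_))
```

The two measures agree on ranking only when the embeddings are already normalized to unit length, which is why the distinction matters in embedding-based search.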
Responsibilities:
  • Write Python and SQL scripts to evaluate LLM outputs.
  • Design and implement LLM-as-Judge evaluations with scoring rubrics.
  • Define and calculate metrics like exact match, token-level F1, ROUGE, cosine similarity, and rubric scores.
  • Build and maintain ground-truth datasets for benchmarking and regression testing.
  • Automate evaluation workflows and integrate them into CI/CD pipelines.
  • Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
  • Diagnose failure modes such as hallucinations and irrelevant answers.
  • Produce clear reports summarizing evaluation findings and quality trends.
  • Collaborate with AI engineers, QA, data scientists, and product managers.
  • Document all processes, evaluation setups, specifications, and architecture diagrams.
  • Maintain reproducibility and traceability for all evaluation runs and datasets.
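
As a rough idea of what two of the metrics named above involve, exact match and token-level F1 might be computed as in the sketch below. It is a minimal illustration under stated assumptions, not the team's actual implementation: the helper names are hypothetical, and normalization is reduced to lowercasing and whitespace splitting, whereas production evaluators often also strip punctuation and articles.

```python
from collections import Counter


def normalize(text):
    """Simplified normalization (an assumption): lowercase, split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Paris ", "paris"))
    print(token_f1("the cat sat", "the cat"))
```

Token-level F1 gives partial credit where exact match gives none, which is why the two are usually reported together.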
About the Company
Truelogic
101-250 employees | Consulting