AI/ML Evaluation Engineer - Global Solutions Provider (Colombia)

Posted about 2 months ago
Company: Truelogic
Location: Colombia
Type: Full-Time
Category: AI/ML
Languages: English
Skills: Python, SQL, Artificial Intelligence, Machine Learning, NumPy, QA Automation, Pandas, CI/CD
Requirements:
  • Advanced Python skills, including writing, debugging, and automating scripts.
  • Strong SQL proficiency and experience manipulating large datasets.
  • Hands-on experience with Python libraries such as Pandas and NumPy.
  • Ability to clean, standardize, and analyze structured and unstructured data.
  • Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
  • Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
  • Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
  • Awareness of vector similarity concepts such as cosine similarity and dot product.
  • Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
  • Ability to design or implement custom LLM-as-Judge evaluation systems.
  • Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
  • Ability to translate ambiguous quality expectations into measurable metrics.
  • Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
  • Ability to learn new infrastructure tools quickly.
  • Strong analytical and problem-solving abilities for open-ended technical challenges.
  • Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
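As an illustration of the vector-similarity concepts listed above (cosine similarity and dot product), a minimal sketch in Python with NumPy; the vectors here are toy 3-dimensional examples, not real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors.

    Ranges from -1 (opposite) to 1 (same direction); a common measure
    of semantic closeness between embeddings in retrieval systems.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (real embeddings typically have hundreds of dimensions).
query = np.array([1.0, 2.0, 3.0])
doc_a = np.array([2.0, 4.0, 6.0])   # same direction as query
doc_b = np.array([3.0, -1.0, 0.0])  # mostly unrelated direction

print(cosine_similarity(query, doc_a))  # 1.0 (parallel vectors)
print(cosine_similarity(query, doc_b))  # ≈ 0.08
```

Note that cosine similarity ignores vector magnitude, which is why `doc_a` scores 1.0 despite being twice the length of `query`; the raw dot product would distinguish the two.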
Responsibilities:
  • Write Python and SQL scripts to evaluate outputs from large language models (LLMs).
  • Design and implement LLM-as-Judge evaluations with clear scoring rubrics (faithfulness, relevance, completeness, correctness).
  • Define and calculate metrics such as exact match, token-level F1, ROUGE, cosine similarity, and subjective rubric scores.
  • Build and maintain ground-truth datasets for benchmarking and regression testing.
  • Automate evaluation workflows and integrate them into CI/CD pipelines.
  • Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
  • Diagnose failure modes such as hallucinations, irrelevant answers, and formatting issues.
  • Produce clear reports summarizing evaluation findings and quality trends.
  • Collaborate with AI engineers, QA, data scientists, and product managers to define quality standards and release criteria.
  • Document all processes, evaluation setups, specifications, and architecture diagrams.
  • Maintain reproducibility and traceability for all evaluation runs and datasets.
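Of the metrics named above, token-level F1 can be sketched in a few lines; this is a simplified whitespace-tokenized version (production evaluators usually add normalization for punctuation and articles):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a model answer and a ground-truth answer.

    Counts overlapping tokens (with multiplicity), then combines
    precision and recall into their harmonic mean.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the capital of France is Paris", "Paris"))  # ≈ 0.29
print(token_f1("Paris", "Paris"))                           # 1.0
```

Unlike exact match, token-level F1 gives partial credit when the model's answer contains the reference plus extra words, which makes it a gentler regression-testing signal for free-form LLM output.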
About the Company
Truelogic
101-250 employees | Consulting