remoote.app
Work from anywhere. Start here.
AI/ML Evaluation Engineer - Global Solutions Provider (Colombia)
Posted about 2 months ago
Colombia
Full-Time
AI/ML
Company: Truelogic
Location: Colombia
Languages: English
Skills: Python, SQL, Artificial Intelligence, Machine Learning, NumPy, QA Automation, Pandas, CI/CD
Requirements:
Advanced Python skills, including writing, debugging, and automating scripts.
Strong SQL proficiency and experience manipulating large datasets.
Hands-on experience with Python libraries such as Pandas and NumPy.
Ability to clean, standardize, and analyze structured and unstructured data.
Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
Awareness of vector similarity concepts such as cosine similarity and dot product.
Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
Ability to design or implement custom LLM-as-Judge evaluation systems.
Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
Ability to translate ambiguous quality expectations into measurable metrics.
Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
Ability to learn new infrastructure tools quickly.
Strong analytical and problem-solving abilities for open-ended technical challenges.
Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
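For candidates unsure what the vector similarity requirement covers, the following is a minimal sketch of the two measures named above, written with the Python standard library only. The function names and the sample vectors are illustrative assumptions and do not come from the employer. It shows that the dot product grows with vector length while cosine similarity depends only on direction.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


# Hypothetical embeddings: doc points the same way as query but is twice as long.
query = [1.0, 2.0, 2.0]
doc = [2.0, 4.0, 4.0]

print(dot(query, doc))                # scales with vector magnitude
print(cosine_similarity(query, doc))  # depends on direction only
```

In embedding-based search the two measures rank results identically when all vectors are normalized to unit length, which is one reason the posting mentions them together.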
Responsibilities:
Write Python and SQL scripts to evaluate outputs from large language models (LLMs).
Design and implement LLM-as-Judge evaluations with clear scoring rubrics (faithfulness, relevance, completeness, correctness).
Define and calculate metrics such as exact match, token-level F1, ROUGE, cosine similarity, and subjective rubric scores.
Build and maintain ground-truth datasets for benchmarking and regression testing.
Automate evaluation workflows and integrate them into CI/CD pipelines.
Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
Diagnose failure modes such as hallucinations, irrelevant answers, and formatting issues.
Produce clear reports summarizing evaluation findings and quality trends.
Collaborate with AI engineers, QA, data scientists, and product managers to define quality standards and release criteria.
Document all processes, evaluation setups, specifications, and architecture diagrams.
Maintain reproducibility and traceability for all evaluation runs and datasets.
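As an illustration of two of the metrics named in the responsibilities, the sketch below computes exact match and token-level F1 for one prediction against one reference answer. The normalization used here (lowercasing and whitespace splitting) is an assumption chosen for brevity; the posting does not specify one, and a real evaluation suite would likely also strip punctuation. All function names are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Assumed normalization: lowercase, then split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """True when both strings yield the same token sequence."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection gives the number of shared tokens.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", " paris "))
print(token_f1("the cat sat", "the cat sat down"))
```

Exact match is strict and easy to audit, while token-level F1 gives partial credit for overlapping answers, so the two are commonly reported side by side in regression runs.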
About the Company
Truelogic
101-250 employees
Consulting