Senior AI Data Engineer

Canada or the USFull-TimeSenior
Salary110000 - 270000 USD per year
Apply NowOpens the employer's application page

Job Details

Required Skills
PythonCI/CDGenerative AI

Requirements

  • Strong, specialized understanding of data quality principles, including methods for validating datasets against bias, integrity concerns, and quality standards.
  • Ability to craft diverse and adversarial test data to uncover AI edge cases.
  • Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions.
  • Deep knowledge of LLM common failure modes (hallucination, incoherence, jailbreaking).
  • 5+ years of experience designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors.
  • Familiarity with quality metrics such as task success rate, semantic similarity, and sentiment analysis for output measurement.
  • Comfortable with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequence to pinpoint failure points.
  • 5+ years of experience using Python to develop custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems.
  • Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools).
  • Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs.
  • High work ethic.
  • High integrity and honesty.

Responsibilities

  • Define and establish comprehensive evaluation strategies for new AI Agents.
  • Prioritize the integrity and coverage of test data sets to reflect real-world usage and potential failure modes.
  • Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics (e.g., factual accuracy, contextual relevance, coherence, and safety standards).
  • Design, curate, and generate diverse, high-quality test data sets, including challenging prompts and scenarios.
  • Evaluate LLM outputs to proactively identify system biases, unsafe content, hallucinations, and critical edge cases.
  • Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior and prevent regressions with new features and model updates.
  • Understand model behaviors and assist in the trace and root-cause analysis of identified defects or performance degradations.
  • Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams.
View Full Description & ApplyYou'll be redirected to the employer's site
110000 - 270000 USD per year
Apply Now