Senior AI Data Engineer
Canada or the US | Full-Time | Senior
Salary: 110,000 - 270,000 USD per year
Job Details
- Required Skills
- Python, CI/CD, Generative AI
Requirements
- Strong, specialized understanding of data quality principles, including methods for validating datasets for bias, integrity issues, and adherence to quality standards.
- Ability to craft diverse and adversarial test data to uncover AI edge cases.
- Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions.
- Deep knowledge of LLM common failure modes (hallucination, incoherence, jailbreaking).
- 5+ years of experience designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors.
- Familiarity with quality metrics such as task success rate, semantic similarity, and sentiment analysis for output measurement.
- Comfortable with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequence to pinpoint failure points.
- 5+ years of experience using Python to develop custom evaluation frameworks, writing scripts, and integrating pipelines with CI/CD systems.
- Familiarity with standard test automation tools (e.g., Pytest, modern web automation tools).
- Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs.
- Strong work ethic.
- High integrity and honesty.
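To make the evaluation-framework and Pytest requirements above concrete, the day-to-day work might look like the following sketch. The agent is a stub and the metric names, keyword lists, and helper functions are all illustrative, not part of any specific stack used here:

```python
# Illustrative Pytest-style evaluation of LLM output against simple
# predefined metrics (task success, basic safety). All names are hypothetical.

def fake_agent(prompt: str) -> str:
    # Stand-in for a real LLM/agent call.
    return "Paris is the capital of France."

def task_success(output: str, expected_keywords: list[str]) -> bool:
    # Minimal task-success metric: every expected keyword appears in the output.
    return all(k.lower() in output.lower() for k in expected_keywords)

def contains_unsafe_content(output: str, blocklist: list[str]) -> bool:
    # Trivial safety check: flag output containing any blocklisted term.
    return any(term in output.lower() for term in blocklist)

def test_capital_question():
    out = fake_agent("What is the capital of France?")
    assert task_success(out, ["Paris"])
    assert not contains_unsafe_content(out, ["ignore previous instructions"])
```

In a real pipeline, `fake_agent` would be replaced by a call to the deployed agent, and tests like this would run under Pytest in CI on every model or feature change.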
Responsibilities
- Define and establish comprehensive evaluation strategies for new AI Agents.
- Prioritize the integrity and coverage of test data sets to reflect real-world usage and potential failure modes.
- Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics (e.g., factual accuracy, contextual relevance, coherence, and safety standards).
- Design, curate, and generate diverse, high-quality test data sets, including challenging prompts and scenarios.
- Evaluate LLM outputs to proactively identify system biases, unsafe content, hallucinations, and critical edge cases.
- Develop, implement, and maintain scalable automated evaluations to ensure efficient, continuous validation of agent behavior and prevent regressions with new features and model updates.
- Understand model behaviors and assist in tracing and root-cause analysis of identified defects or performance degradations.
- Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams.
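The regression-prevention responsibility above often takes the shape of a CI gate: run a fixed evaluation set, compute a success rate, and fail the build when it drops below a threshold. A minimal sketch, with a hypothetical eval set, stubbed agent, and illustrative threshold:

```python
# Hedged sketch of a regression gate for agent behavior: score a small
# eval set and fail the CI job (nonzero exit) if success rate regresses.

EVAL_SET = [
    {"prompt": "2 + 2 = ?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "paris"},
]

def agent(prompt: str) -> str:
    # Stand-in for a real agent call.
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "")

def success_rate(eval_set) -> float:
    # Fraction of cases whose expected answer appears in the agent output.
    hits = sum(1 for case in eval_set
               if case["expected"].lower() in agent(case["prompt"]).lower())
    return hits / len(eval_set)

def main(threshold: float = 0.9) -> int:
    rate = success_rate(EVAL_SET)
    print(f"task success rate: {rate:.2%}")
    return 0 if rate >= threshold else 1  # nonzero exit fails the CI job

if __name__ == "__main__":
    raise SystemExit(main())
```

Wired into a CI/CD system, this script would block merges or model updates that degrade agent behavior on the curated test set.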