- Partner with AI product, engineering, and domain experts to define metrics for flagship AI journeys
- Align on test design, interpretation, and causal framing for new AI products
- Map how agents interact with data so that requirements are structured and testable
- Design and maintain evaluation sets, golden datasets, and regression checks
- Categorize failure modes and feed insights back into prompts, RAG, and data pipelines
- Monitor live usage of agentic AI workflows and manage engineering improvement backlogs
- Own internal playbooks for AI agent usage and human checkpoints
- Deliver recurring measurement artifacts to inform leadership roadmaps
Python, Product Analytics, Data science