Proven experience designing evaluation systems for agentic or LLM-based AI
Deep expertise in statistical experimentation, benchmark creation, and human-AI interaction assessment
Fluency in building data pipelines and tooling using Python, SQL, and distributed data processing frameworks
Demonstrated ability to influence product and model roadmaps
Adaptive-level proficiency in integrating AI tools into technical workflows