- Conduct multi-turn conversations with AI models and compare response quality across variants
- Evaluate outputs for accuracy, grounding, and personalization quality
- Assess responses for safety and responsible data usage
- Test predefined health and wellness scenarios, including fitness, activity tracking, nutrition, and habits
- Identify issues such as incorrect personalization, hallucinations, reasoning gaps, or data privacy risks
- Provide clear, structured feedback and evaluation notes
- Maintain consistency and attention to detail across repeated evaluations