Create structured, graduate-level tasks, prompts, and problem sets
Assess AI-generated outputs for accuracy, depth, and rigor
Pinpoint where models fail to reason correctly and provide expert corrections
Maintain clear, organized records of tasks, solutions, and evaluations