- Advanced Python skills, including writing, debugging, and automating scripts.
- Strong SQL proficiency and experience manipulating large datasets.
- Hands-on experience with Python libraries such as Pandas and NumPy.
- Ability to clean, standardize, and analyze structured and unstructured data.
- Experience inspecting datasets, visualizing distributions, and preparing data for analysis (see the inspection sketch after this list).
- Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
- Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
- Awareness of vector similarity concepts such as cosine similarity and dot product (see the similarity sketch after this list).
- Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.), or the ability to learn one quickly.
- Ability to design or implement custom LLM-as-Judge evaluation systems (see the judge sketch after this list).
- Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation (see the metrics sketch after this list).
- Ability to translate ambiguous quality expectations into measurable metrics.
- Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
- Ability to learn new infrastructure tools quickly.
- Strong analytical and problem-solving abilities for open-ended technical challenges.
- Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
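
To give a sense of the day-to-day data work, here is a minimal inspection and standardization sketch using Pandas. The dataset and column names are purely illustrative; in practice the data would come from a file or warehouse query.

```python
import pandas as pd

# Tiny illustrative dataset; in practice this would be loaded from a file or a SQL query.
df = pd.DataFrame({
    "question": ["What is RAG? ", "Define grounding", None],
    "answer": [" Retrieval-augmented generation ", "ANSWER NOT FOUND", "Grounding ties claims to sources."],
    "score": [4, 1, 5],
})

print(df.shape)                # rows x columns
print(df.isna().sum())         # missing values per column
print(df["score"].describe())  # distribution summary for a numeric column

# Basic standardization: trim whitespace and normalize case in a free-text column.
df["answer"] = df["answer"].str.strip().str.lower()
```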
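
The vector similarity concepts above can be summarized in a few lines of NumPy. This is a toy sketch: the vectors are made up, and real embeddings would come from an embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embedding vectors for a query and a retrieved document.
query = np.array([0.2, 0.1, 0.7])
doc = np.array([0.3, 0.0, 0.6])

print("dot product:      ", float(np.dot(query, doc)))
print("cosine similarity:", cosine_similarity(query, doc))
```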
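
The LLM-as-Judge item refers to the general pattern sketched below: prompt a judge model for a score and parse its reply. `call_model` is a hypothetical placeholder for whatever LLM client is actually in use, and the rubric is illustrative, not a prescribed one.

```python
# Minimal LLM-as-Judge sketch. Only the prompt-and-parse pattern is the point here.
JUDGE_PROMPT = """You are grading an answer for factual grounding.
Question: {question}
Retrieved context: {context}
Answer: {answer}
Reply with a single integer from 1 (ungrounded) to 5 (fully grounded)."""

def call_model(prompt: str) -> str:
    # Placeholder: swap in a real model call. Returns a canned reply so the sketch runs.
    return "4"

def judge_groundedness(question: str, context: str, answer: str) -> int:
    """Ask a judge model how well the answer is supported by the retrieved context."""
    reply = call_model(JUDGE_PROMPT.format(question=question, context=context, answer=answer))
    return int(reply.strip())

print(judge_groundedness(
    question="What year was the policy updated?",
    context="The policy was last revised in 2021.",
    answer="It was updated in 2021.",
))
```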
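
Finally, the statistical concepts listed above show up in evaluation reporting along these lines. The confusion counts below are made-up numbers; the interval uses the standard normal approximation for a proportion.

```python
import math

# Toy confusion counts from a hypothetical evaluation run (illustrative numbers only).
tp, fp, fn = 42, 8, 10

precision = tp / (tp + fp)  # fraction of flagged items that were actually correct
recall = tp / (tp + fn)     # fraction of relevant items that were found

# 95% normal-approximation confidence interval for precision,
# treating each flagged item as a Bernoulli trial.
n = tp + fp
half_width = 1.96 * math.sqrt(precision * (1 - precision) / n)

print(f"precision = {precision:.3f} ± {half_width:.3f}")
print(f"recall    = {recall:.3f}")
```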