Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
5 years of software engineering experience, including 3+ years working with large ML datasets, especially those in open-source repositories such as Hugging Face.
Strong programming skills in Go, Python, and SQL, plus experience with at least one data pipeline framework (e.g., Airflow, Dagster, Prefect).
Experience with data versioning tools (e.g., DVC, lakeFS) and cloud storage systems.
Familiarity with machine learning workflows, from training data preparation to evaluation.
Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations.
Attention to detail and an obsession with data quality and reproducibility.
Motivated by the Khan Academy mission.
Proven cross-cultural competency.