- Proven experience with flow matching, diffusion models, and autoregressive networks in the audio domain
- Experience training deep learning models (medium-sized to large)
- Experience building streaming text-to-speech or speech-to-speech models
- Strong foundations in audio modeling and the ability to innovate rapidly through prototyping
- Knowledge of state-of-the-art architectures in representation learning (audio or image domain, face animation)
- Excellent programming skills and fluency in PyTorch
- Evidence of original research with publications in top-tier or solid second-tier venues
- Excited about building lifelike, expressive avatars for real-time applications