Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.)
Develop and evaluate streaming and speech-to-speech systems for low-latency voice synthesis
Implement post-training optimization techniques (quantization, pruning, distillation)
Integrate and test novel architectures (neural codecs, diffusion, flow-matching models)
Contribute to defining new evaluation metrics for conversational speech
Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs
Apply DPO and distillation to fine-tune large-scale speech models