Senior Research Engineer - Audio Post-Training

Posted about 1 month ago

Europe, Full-Time, AI Video

Company: Synthesia

Location: Europe

Languages: English

Seniority level: Senior

Skills:

Python, Artificial Intelligence, Machine Learning, PyTorch, Software Engineering

Requirements:

- Strong understanding of generative modelling, ideally applied to sequential or multimodal data.
- Hands-on experience with large language models (LLMs) or similar transformer-based architectures.
- High proficiency in PyTorch, including experience with distributed training and model optimization.
- Solid grasp of time-series modelling and tokenization, preferably in the context of audio or speech.
- Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.
- Proven experience in training deep learning models end-to-end, from data preparation to evaluation.
- Strong general software engineering skills, enabling contributions to a large, shared research infrastructure.

Responsibilities:

- Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control).
- Fine-tune and optimize speech models using DPO, LoRA, and other parameter-efficient methods.
- Implement post-training optimization techniques (quantization, pruning, distillation).
- Integrate and test novel architectures (neural codecs, diffusion, flow-matching).
- Design and implement new evaluation metrics for TTS systems, including automated MOS prediction.
- Stay up to date with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.