Research Scientist - Voice AI Foundations

Posted 8 days ago

🔍 Industry: Software Development

🏢 Company: Deepgram

👥 51-100

💰 $47,000,000 Series B about 2 years ago

🫂 Last layoff over 1 year ago

Artificial Intelligence (AI), Developer APIs, Data Collection and Labeling, Natural Language Processing, Speech Recognition

🗣️ Languages: English

Requirements:
  • Strong mathematical foundation in statistical learning theory, particularly in areas relevant to self-supervised and multimodal learning
  • Deep expertise in foundation model architectures, with an understanding of how to scale training across multiple modalities
  • Proven ability to bridge theory and practice: someone who can both derive novel mathematical formulations and implement them efficiently
  • Demonstrated ability to build data pipelines that can process and curate massive datasets while maintaining quality and diversity
  • Track record of designing controlled experiments that isolate the impact of architectural innovations and validate theoretical insights
  • Experience optimizing models for real-world deployment, including knowledge of hardware constraints and efficiency techniques
  • History of open-source contributions or research publications that have advanced the state of the art in speech/language AI
Responsibilities:
  • Build next-generation neural audio codecs that achieve extremely low bit-rate compression and high-fidelity reconstruction across a world-scale corpus of general audio.
  • Pioneer steerable generative models that can synthesize the full diversity of human speech from the codec latent representation, from casual conversation to highly emotional expression to complex multi-speaker scenarios with environmental noise and overlapping speech.
  • Develop embedding systems that cleanly factorize the codec latent space into interpretable dimensions of speaker, content, style, environment, and channel effects -- enabling precise control over each aspect and the ability to massively amplify an existing seed dataset through “latent recombination”.
  • Leverage latent recombination to generate synthetic audio data at previously impossible scales, unlocking joint model and data scaling paradigms for audio.
  • Endeavor to train multimodal speech-to-speech systems that can 1) understand any human irrespective of their demographics, state, or environment and 2) produce empathic, human-like responses that achieve conversational or task-oriented objectives.
  • Design model architectures, training schemes, and inference algorithms that are adapted to hardware at the bare-metal level, enabling cost-efficient training on billion-hour datasets and powering real-time inference for hundreds of millions of concurrent conversations.