Senior Research Engineer, Evaluations (Speech-to-Text)

AssemblyAI (Speech AI)
Eastern US Time Zone, Full-Time, Senior
Salary: 210,000 - 260,000 USD per year

Job Details

Required Skills
Python, SQL, Machine Learning, LLM

Requirements

  • ML fundamentals: Understand how ML models are trained and evaluated well enough to interpret results and debug issues
  • Strong Python skills: Write clean evaluation scripts and work with data pipelines
  • Comfortable with SQL
  • Comfortable with cloud infrastructure
  • Metric intuition: Understand what makes a good evaluation metric, when to use relative vs. absolute improvements, and how to ensure statistical rigor
  • Voice agent stack familiarity: Understand how the components of a voice agent system interact (VAD, ASR, turn detection, LLM, TTS)
  • Tinkerer mentality: Ship something rough and iterate rather than spending weeks perfecting it
  • Communication skills: Explain technical results to researchers, summarize findings for leadership, and translate customer feedback into requirements
  • Ownership mindset: See gaps and fill them without waiting to be told what to evaluate
  • Work at least 3-4 hours overlapping with Eastern US Time Zone

Responsibilities

  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics (e.g., turn detection latency, endpointing accuracy)
  • Build and maintain competitive benchmarking pipelines against other providers in the market
  • Design and run systematic experiments to measure the impact of model changes
  • Onboard, curate, and maintain evaluation datasets—both public benchmarks and internal test sets
  • Create evaluation subsets that stress-test specific capabilities and edge cases
  • Define evaluation metrics that capture real-world performance
  • Translate qualitative customer feedback into quantifiable evaluation criteria
  • Work with customer-facing teams to understand pain points and convert them into research priorities
  • Reduce friction for researchers by maintaining clean evaluation pipelines and clear documentation
  • Identify evaluation gaps proactively and propose solutions
  • Move fast—iterate on benchmarking approaches weekly, not monthly