Staff Applied Researcher, AI Quality

Work from anywhere within the United States, Full-Time, Staff

Salary: 140,400 - 372,300 USD per year

Job Details

Experience: 4–8 years
Required Skills: Python, Machine Learning, TypeScript, Data Science, Software Engineering

Bachelor’s, Master’s, or PhD degree in Computer Science, Data Science, Mathematics, Statistics, Physics, Economics, Operations Research, or a related technical field, or equivalent practical experience.
4–8 years of experience in data science, machine learning, applied research, or a related technical field, depending on educational background.
Strong software engineering expertise in Python and/or TypeScript, with experience building scalable ML, data, or evaluation pipelines in production environments.
Proven experience delivering research systems or AI evaluation frameworks in real-world production settings.
Deep understanding of large language model evaluation, alignment, reward modeling, safety assessments, or AI quality methodologies.
Experience with large-scale experimentation, benchmarking strategies, and online/offline model evaluation techniques.
Strong communication and cross-functional collaboration skills, with the ability to influence technical and product decisions.
Experience with developer tools, AI-assisted programming, or code generation systems is highly preferred.
Open-source contributions or experience engaging with developer communities are a strong advantage.

Design and implement advanced evaluation frameworks for large language models, including code generation, reasoning, multimodal capabilities, safety, and agentic workflows.
Develop scalable evaluation methodologies such as automated metrics, reward models, LLM-judge systems, and human-in-the-loop evaluation pipelines.
Build and optimize benchmarking systems, datasets, experimentation pipelines, and production-grade ML evaluation tooling.
Collaborate closely with engineering, product, and design teams to integrate research findings into practical AI-powered applications and product experiences.
Lead initiatives focused on improving model quality, alignment, and performance across AI systems and developer tools.
Drive the creation and onboarding of challenging benchmarks for coding agents and advanced AI workflows.
Mentor researchers and engineers, promoting high technical standards, innovation, and effective execution practices.
Provide strategic guidance in ambiguous problem spaces and contribute to long-term AI quality and evaluation strategies.
