Staff Applied Researcher, AI Quality
Work from anywhere within the United States | Full-Time | Staff
Salary: 140,400 - 372,300 USD per year
Job Details
- Experience: 4–8 years
- Required Skills: Python, Machine Learning, TypeScript, Data Science, Software Engineering
Requirements
- Bachelor’s, Master’s, or PhD degree in Computer Science, Data Science, Mathematics, Statistics, Physics, Economics, Operations Research, or a related technical field, or equivalent practical experience.
- 4–8 years of experience in data science, machine learning, applied research, or related technical fields, with the minimum depending on educational background.
- Strong software engineering expertise in Python and/or TypeScript, with experience building scalable ML, data, or evaluation pipelines in production environments.
- Proven experience delivering research systems or AI evaluation frameworks in real-world production settings.
- Deep understanding of large language model evaluation, alignment, reward modeling, safety assessments, or AI quality methodologies.
- Experience with large-scale experimentation, benchmarking strategies, and online/offline model evaluation techniques.
- Strong communication and cross-functional collaboration skills, with the ability to influence technical and product decisions.
- Experience with developer tools, AI-assisted programming, or code generation systems is highly preferred.
- Open-source contributions or experience engaging with developer communities are considered a strong advantage.
Responsibilities
- Design and implement advanced evaluation frameworks for large language models, including code generation, reasoning, multimodal capabilities, safety, and agentic workflows.
- Develop scalable evaluation methodologies such as automated metrics, reward models, LLM-judge systems, and human-in-the-loop evaluation pipelines.
- Build and optimize benchmarking systems, datasets, experimentation pipelines, and production-grade ML evaluation tooling.
- Collaborate closely with engineering, product, and design teams to integrate research findings into practical AI-powered applications and product experiences.
- Lead initiatives focused on improving model quality, alignment, and performance across AI systems and developer tools.
- Drive the onboarding and creation of challenging benchmarks for coding agents and advanced AI workflows.
- Mentor researchers and engineers, promoting high technical standards, innovation, and effective execution practices.
- Provide strategic guidance in ambiguous problem spaces and contribute to long-term AI quality and evaluation strategies.