Staff Research Engineer, Model Efficiency

Cohere, Artificial Intelligence

We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul and London. We embrace a remote-friendly environment... You'll find the Model Efficiency team concentrated in the EST and PST time zones; these are our preferred locations.

EST and PST, Full-Time, Staff

Salary not disclosed

Job Details

Required Skills: Machine Learning, Software Engineering, LLM

Requirements

PhD in Machine Learning or a related field.
Deep understanding of LLM architecture.
Experience optimizing LLM inference given resource constraints.
Significant experience with one or more techniques that enhance model efficiency.
Strong software engineering skills.
Ability to work in a fast-paced, high-ambiguity start-up environment.
Publications at top-tier conferences and venues (ICLR, ACL, NeurIPS).
Passion for mentoring others.

Responsibilities

Develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production.
Optimize model architecture and MoE routing.
Implement decoding and inference-time algorithm improvements.
Perform software/hardware co-design for GPU acceleration.
Execute performance optimization without compromising model quality.
Mentor other engineers.
