Member of Technical Staff, Model Efficiency

Posted 2 months ago

United States, Canada | Full-time | AI, Machine Learning

Company: Cohere

Location: United States, Canada, EST, PST

Languages: English

Seniority level: Staff, 5+ years

Experience: 5+ years

Skills:

Python, Software Development, Artificial Intelligence, Machine Learning, C++, Go, Rust

Requirements:

5+ years of experience writing high-performance, production-quality code
Strong programming skills in C++ or Python (Rust/Go also welcome)
Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
Ability to diagnose and resolve performance bottlenecks across the model execution stack
Experience with GPU programming, CUDA, or low-level systems optimization (preferred)
Experience with language modeling with transformers (MoE, speculative decoding, KV-cache optimizations) (preferred)
Experience scaling performance-critical distributed systems (preferred)

Responsibilities:

Work across the inference stack to improve core performance metrics.
Dive deep into model execution to identify bottlenecks.
Develop innovative optimizations for LLM inference.
Collaborate with modeling and systems teams to ship improvements.
Build expertise in advanced performance techniques (GPU/CUDA, MoE, large-scale architectures).