Member of Technical Staff, Model Efficiency

Posted 2 months ago
United States, Canada
Full-time
AI, Machine Learning
Company: Cohere
Location: United States, Canada, EST, PST
Languages: English
Seniority level: Staff, 5+ years
Experience: 5+ years
Skills:
Python, Software Development, Artificial Intelligence, Machine Learning, C++, Go, Rust
Requirements:
5+ years of experience writing high-performance, production-quality code
Strong programming skills in C++ or Python (Rust/Go also welcome)
Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
Ability to diagnose and resolve performance bottlenecks across the model execution stack
Experience with GPU programming, CUDA, or low-level systems optimization (preferred)
Experience with language modeling with transformers (MoE, speculative decoding, KV-cache optimizations) (preferred)
Experience scaling performance-critical distributed systems (preferred)
Responsibilities:
Work across the inference stack to improve core performance metrics.
Dive deep into model execution to identify bottlenecks.
Develop innovative optimizations for LLM inference.
Collaborate with modeling and systems teams to ship improvements.
Build expertise in advanced performance techniques (GPU/CUDA, MoE, large-scale architectures).
About the Company
Cohere
251-500 employees
Artificial Intelligence (AI)