Lead Member of Technical Staff, Inference Infrastructure

Cohere (Enterprise AI)

Location: San Francisco
Secondary Locations: United States, New York, Toronto, Montreal
Full-Time, Lead

Salary not disclosed

Job Details

8+ years of engineering experience running production infrastructure at scale.
Demonstrated track record of technical leadership.
Deep expertise in Kubernetes development, production support, and establishing team standards.
Experience with GPU workloads and distributed systems architecture.
Extensive experience across GCP, Azure, AWS, OCI, and multi-cloud, on-prem, and hybrid environments.
Proficiency in Golang, C++, or other languages for high-performance scalable servers.
Strong expertise in the computational characteristics of accelerators (GPUs, TPUs, custom accelerators).
Proven ability to lead design, deployment, and troubleshooting of Linux-based computing environments.
Experience managing compute, storage, and network resources and costs at an organizational level.

Lead the design and strategy for deploying optimized NLP models to production.
Develop, deploy, and operate the AI platform delivering large language models via API endpoints.
Drive architecture for low-latency, high-throughput, and high-availability systems.
Provide technical leadership across multiple teams and mentor engineers to raise technical standards.
Serve as a key point of contact for customers to design customized deployment solutions.
Manage compute, storage, and network resources and costs at an organizational level.
