Lead Member of Technical Staff, Inference Infrastructure
Cohere (Enterprise AI)
Location: San Francisco
Secondary Locations: United States, New York, Toronto, Montreal
Full-Time, Lead
Salary not disclosed
Job Details
- Experience: 8+ years
- Required Skills: AWS, GCP, Kubernetes, C++, Azure, Go, Linux, Distributed Systems
Requirements
- 8+ years of engineering experience running production infrastructure at scale.
- Demonstrated track record of technical leadership.
- Deep expertise in Kubernetes development, production support, and establishing team standards.
- Experience with GPU workloads and distributed systems architecture.
- Extensive experience across GCP, Azure, AWS, OCI, and multi-cloud on-prem/hybrid environments.
- Proficiency in Golang, C++, or other languages for high-performance scalable servers.
- Strong expertise in the computational characteristics of accelerators (GPUs, TPUs, custom accelerators).
- Proven ability to lead design, deployment, and troubleshooting of Linux-based computing environments.
- Experience managing compute, storage, and network resources and costs at an organizational level.
Responsibilities
- Lead the design and strategy for deploying optimized NLP models to production.
- Develop, deploy, and operate the AI platform delivering large language models via API endpoints.
- Drive architecture for low-latency, high-throughput, and high-availability systems.
- Provide technical leadership across multiple teams and mentor engineers to raise technical standards.
- Serve as a key point of contact for customers to design customized deployment solutions.
- Manage compute, storage, and network resources and costs at an organizational level.