Lead Member of Technical Staff, Inference Infrastructure

Cohere, Enterprise AI
Location: San Francisco
Secondary Locations: United States, New York, Toronto, Montreal
Full-Time, Lead
Salary not disclosed

Job Details

Experience
8+ years
Required Skills
AWS, GCP, Kubernetes, C++, Azure, Go, Linux, Distributed Systems

Requirements

  • 8+ years of engineering experience running production infrastructure at scale.
  • Demonstrated track record of technical leadership.
  • Deep expertise in Kubernetes development, production support, and establishing team standards.
  • Experience with GPU workloads and distributed systems architecture.
  • Extensive experience across GCP, Azure, AWS, OCI, and multi-cloud, on-prem, and hybrid environments.
  • Proficiency in Golang, C++, or other languages used to build high-performance, scalable servers.
  • Strong expertise in the computational characteristics of accelerators (GPUs, TPUs, custom accelerators).
  • Proven ability to lead design, deployment, and troubleshooting of Linux-based computing environments.
  • Experience managing compute, storage, and network resources and costs at an organizational level.

Responsibilities

  • Lead the design and strategy for deploying optimized NLP models to production.
  • Develop, deploy, and operate the AI platform delivering large language models via API endpoints.
  • Drive architecture for low-latency, high-throughput, and high-availability systems.
  • Provide technical leadership across multiple teams and mentor engineers to raise technical standards.
  • Serve as a key point of contact for customers to design customized deployment solutions.
  • Manage compute, storage, and network resources and costs at an organizational level.