Platform Engineer – AI/ML Infrastructure

Posted 6 months ago
United States, Full-Time, Voice AI Platform
Company: Deepgram
Location: United States
Languages: English
Seniority level: Senior, 5+ years
Experience: 5+ years
Skills:
AWS, Python, Bash, Kubernetes, Machine Learning, CI/CD, Linux, DevOps, Terraform
Requirements:
5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
Proven, hands-on experience building and managing production infrastructure with Terraform.
Expert-level knowledge of Kubernetes architecture and operations at scale.
Experience with high-performance computing (HPC) job schedulers, specifically Slurm.
Experience managing bare metal infrastructure, including server provisioning and lifecycle management.
Strong scripting and automation skills (e.g., Python, Go, Bash).
Responsibilities:
Architect and maintain the core computing platform using Kubernetes on AWS and on-premises.
Develop and manage infrastructure using Infrastructure-as-Code (IaC) with Terraform.
Design, build, and optimize AI/ML job scheduling and orchestration systems with Slurm.
Provision, manage, and maintain on-premises bare metal server infrastructure for GPU computing.
Implement and manage platform networking and storage solutions.
Develop an observability stack and automation for operational tasks.
Collaborate with AI researchers and ML engineers on infrastructure needs.
Automate the lifecycle of single-tenant, managed deployments.
About the Company
Deepgram
51-100 employees, Artificial Intelligence (AI)