AI & HPC Infrastructure Engineer
FirstPrinciples
AI Infrastructure
Working across Canada, the US, the UK, and expanding globally
Full-Time
Middle
Salary not disclosed
Job Details
Required Skills
- Cloud Computing
- Kubernetes
- Linux
- Terraform
- Ansible
Requirements
- Strong infrastructure background in production, research, cloud, or HPC systems
- Deep Linux administration expertise
- Experience with Kubernetes cluster operations
- Proficiency with cloud infrastructure (AWS, GCP, or Azure)
- Experience with infrastructure automation tools and practices (Terraform, Ansible, Helm, Argo CD, GitOps)
- Experience with GPU-heavy or HPC-style workloads
- Ability to work across bare metal and cloud environments
- Strong collaboration skills for working with research and engineering teams
- Capability to operate independently as a senior or strong intermediate contributor
Responsibilities
- Design, deploy, and operate Kubernetes infrastructure for AI inference, research, and engineering workloads
- Set up and manage GPU and HPC-style compute environments
- Build and manage Linux-based compute environments
- Help architect bare metal, cloud, and hybrid infrastructure
- Own the reliability and operational health of infrastructure systems
- Improve deployment workflows, automation, and infrastructure-as-code practices
- Partner with ML engineers and researchers to translate workload requirements into infrastructure designs
- Build tooling, documentation, and runbooks