AI & HPC Infrastructure Engineer

FirstPrinciplesAI Infrastructure
Working across Canada, the US, the UK, and expanding globally
Full-Time
Middle
Salary not disclosed

Job Details

Required Skills
Cloud Computing, Kubernetes, Linux, Terraform, Ansible

Requirements

  • Strong infrastructure background in production, research, cloud, or HPC systems
  • Deep Linux administration expertise
  • Experience with Kubernetes cluster operations
  • Proficiency with cloud infrastructure (AWS, GCP, or Azure)
  • Experience with infrastructure automation tools (Terraform, Ansible, Helm, ArgoCD, GitOps)
  • Experience with GPU-heavy or HPC-style workloads
  • Ability to work across bare metal and cloud environments
  • Strong collaboration skills for working with research and engineering teams
  • Ability to operate independently as a senior or strong intermediate contributor

Responsibilities

  • Design, deploy, and operate Kubernetes infrastructure for AI inference, research, and engineering workloads
  • Set up and manage GPU and HPC-style compute environments
  • Build and manage Linux-based compute environments
  • Help architect bare metal, cloud, and hybrid infrastructure
  • Own the reliability and operational health of infrastructure systems
  • Improve deployment workflows, automation, and infrastructure-as-code practices
  • Partner with ML engineers and researchers to translate workload requirements into infrastructure designs
  • Build tooling, documentation, and runbooks