Infrastructure Engineer
Germany, Full-Time, Middle
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Kubernetes, Linux, Terraform, Networking, Ansible
Requirements
- 5+ years of experience in bare-metal, HPC/GPU, data-center, or systems infrastructure engineering.
- Hands-on ownership of physical and compute infrastructure.
- Strong proficiency in bare-metal Linux (RHEL, Rocky, Ubuntu) including firmware, BMC, PXE, and kernel/storage tuning.
- Solid understanding of networking and storage fundamentals.
- Demonstrable experience with the NVIDIA GPU stack (drivers, CUDA, GPU Operator, MIG, DCGM) in production.
- Experience serving ML models on GPUs in production environments.
- Comfortable operating in air-gapped or on-prem environments.
- Ability and willingness to travel to customer sites for builds and deployments.
- Methodical approach to hardware incidents and documentation.
Responsibilities
- Design, size, provision, and operate bare-metal GPU server fleets across on-prem and air-gapped environments.
- Manage the NVIDIA GPU stack end to end, including drivers, CUDA, GPU Operator, MIG, and DCGM.
- Build the bare-metal substrate for Kubernetes, including node lifecycle, container runtimes, and kernel/NUMA tuning.
- Engineer data-center networking and resilient storage systems such as Ceph, ZFS, and NVMe.
- Collaborate with ML and MLOps teams on on-prem inference serving using tools like Triton, KServe, and vLLM.
- Plan and execute on-site hardware build-outs, including rack integration, cooling/power sizing, and operator handovers.