7+ years of experience in HPC or cloud support engineering, with customer-facing responsibilities. Proven experience managing large-scale Linux clusters and distributed HPC/AI workloads. Deep expertise in orchestration tools such as Kubernetes and/or Slurm. Strong knowledge of GPU technologies (CUDA, NCCL, MIG, NVLink, GPUDirect RDMA). Skilled in high-throughput networking (InfiniBand, RoCE) and cluster storage solutions. Familiarity with monitoring/logging platforms (Prometheus, Grafana, Datadog). Experience leading incident management and communicating directly with enterprise or hyperscale customers. Ability to balance deep technical troubleshooting with clear, concise communication.