HPC Support Engineer - Named Accounts

Posted 3 months ago
USA | Full-Time | AI Cloud
Company: Lambda
Location: USA
Languages: English
Seniority level: Lead, 7+ years
Experience: 7+ years
Skills:
Python, Kubernetes, Grafana, Prometheus, Linux, DevOps, Terraform, Documentation, Problem Solving, Customer Service, Mentoring, Ansible, Troubleshooting
Requirements:
- 7+ years of experience in HPC or cloud support engineering, including customer-facing responsibilities.
- Proven experience managing large-scale Linux clusters and distributed HPC/AI workloads.
- Deep expertise in orchestration and scheduling tools such as Kubernetes and/or Slurm.
- Strong knowledge of GPU technologies (CUDA, NCCL, MIG, NVLink, GPUDirect RDMA).
- Skilled in high-throughput networking (InfiniBand, RoCE) and cluster storage solutions.
- Familiarity with monitoring/logging platforms (Prometheus, Grafana, Datadog).
- Experience leading incident management and communicating directly with enterprise or hyperscale customers.
- Ability to balance deep technical troubleshooting with clear, concise communication.
Responsibilities:
- Act as the primary technical escalation point for Super Intelligence customers running hyperscale GPU clusters.
- Lead incident response for complex issues, ensuring rapid triage, clear communication, and timely resolution.
- Proactively identify risks in large environments (firmware, performance bottlenecks, orchestration issues) and drive preventive improvements.
- Partner closely with Lambda Engineering and Product teams to influence roadmap decisions.
- Contribute to runbooks, best practices, and operational guides tailored to hyperscale environments.
- Train and mentor other support engineers.
- Participate in a rotating on-call schedule.
About the Company
Lambda