Sr Engineer - Compute
Gurugram, Haryana | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Kubernetes, Grafana, Prometheus, Linux
Requirements
- Bachelor’s degree in Information Systems or related field (or equivalent specialized experience/training)
- 5+ years of advanced Linux administration and troubleshooting
- 5+ years managing Red Hat OpenShift Kubernetes and Virtualization clusters
- 5+ years of expert-level experience managing infrastructure in high-performance computing (HPC) environments
- Experience with HPC schedulers (e.g., Slurm, Kubernetes, PBS, Run:ai)
- Proficiency in physical server environments
- Experience configuring, maintaining, and troubleshooting containers
- Experience with storage technologies (e.g., Ceph or VAST Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
- Experience with machine learning or data science workflows in HPC/AI environments
- 1+ years working with monitoring platforms (e.g., Prometheus, Grafana)
- 1+ years working with an enterprise ITSM system
- Managed Services or consulting experience
Responsibilities
- Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
- Plan and perform software and firmware maintenance activities
- Assess customer environments for performance and design issues and propose resolutions
- Work across technical teams to troubleshoot complex infrastructure issues
- Create and maintain detailed documentation
- Serve as a subject matter expert and escalation point for compute technologies
- Work with vendors to resolve compute issues
- Participate in on-call rotation
- Complete assigned training and certification