Sr Engineer - Compute

Gurugram, Haryana | Full-Time | Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Kubernetes, Grafana, Prometheus, Linux

Requirements

  • Bachelor’s degree in Information Systems or related field (or equivalent specialized experience/training)
  • 5+ years of advanced Linux administration and troubleshooting
  • 5+ years managing Red Hat OpenShift Kubernetes and Virtualization clusters
  • 5+ years of expert-level experience managing infrastructure in high-performance computing environments
  • Experience with HPC schedulers (e.g., SLURM, Kubernetes, PBS, Run:ai)
  • Proficiency in physical server environments
  • Experience configuring, maintaining, and troubleshooting containers
  • Experience with storage technologies (e.g., Ceph or VAST Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)
  • Experience with machine learning or data science workflows in HPC/AI environments
  • 1+ years working with monitoring platforms (e.g., Prometheus, Grafana)
  • 1+ years working with an enterprise ITSM system
  • Managed Services or consulting experience

Responsibilities

  • Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
  • Plan and perform software and firmware maintenance activities
  • Assess customer environments for performance and design issues and propose resolutions
  • Work across technical teams to troubleshoot complex infrastructure issues
  • Create and maintain detailed documentation
  • Serve as a subject matter expert and escalation point for compute technologies
  • Work with vendors to resolve compute issues
  • Participate in on-call rotation
  • Complete assigned training and certification