Site Reliability Engineer II

Location: India (Remote) | Full-Time | Middle
Salary not disclosed

Job Details

Experience
5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer.
Required Skills
Python, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm

Requirements

  • 5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer.
  • Hands-on experience managing Kubernetes clusters (GKE, EKS) in GCP and AWS.
  • Strong knowledge of Terraform, Helm, and GitLab CI/CD pipelines.
  • Proficiency in Python, Go, or Shell scripting for automation and tooling.
  • Experience implementing and managing observability stacks including Prometheus, Grafana, and Datadog.
  • Deep understanding of Linux systems and cloud networking concepts.
  • Experience with container orchestration.
  • Experience working in Agile/Scrum environments.
  • Excellent analytical and proactive problem-solving skills.

Responsibilities

  • Collaborate with Developers, QA, and Product teams on release planning and infrastructure requirements.
  • Participate in the application release cycle to ensure consistent and automated deployments.
  • Manage and operate Kubernetes clusters in Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS).
  • Develop and maintain Terraform modules for cloud infrastructure provisioning on GCP and AWS.
  • Standardize service deployments using Helm for versioned releases.
  • Enhance system observability using Prometheus, Grafana, and Datadog.
  • Design and maintain GitLab CI/CD pipelines for automated testing and deployment.
  • Develop automation scripts and tooling using Python, Go, or Shell.
  • Participate in a 24/7 on-call rotation for incident management and resolution.
  • Perform root cause analysis and post-incident reviews to identify and address systemic risks.