Site Reliability Engineer II

Location: India (Remote) | Full-Time | Middle

Salary not disclosed

Job Details

Experience: 5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer.
Required Skills: Python, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm

Hands-on experience managing Kubernetes clusters (GKE, EKS) in GCP and AWS.
Strong knowledge of Terraform, Helm, and GitLab CI/CD pipelines.
Proficiency in Python, Go, or Shell scripting for automation and tooling.
Experience implementing and managing observability stacks including Prometheus, Grafana, and Datadog.
Deep understanding of Linux systems and cloud networking concepts.
Experience with container orchestration.
Experience working in Agile/Scrum environments.
Excellent analytical and proactive problem-solving skills.

Collaborate with Developers, QA, and Product teams on release planning and infrastructure requirements.
Participate in the application release cycle to ensure consistent and automated deployments.
Manage and operate Kubernetes clusters in Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS).
Develop and maintain Terraform modules for cloud infrastructure provisioning on GCP and AWS.
Standardize service deployments using Helm for versioned releases.
Enhance system observability using Prometheus, Grafana, and Datadog.
Design and maintain GitLab CI/CD pipelines for automated testing and deployment.
Develop automation scripts and tooling using Python, Go, or Shell.
Participate in a 24/7 on-call rotation for incident management and resolution.
Perform root cause analysis and post-incident reviews to identify and address systemic risks.
