Site Reliability Engineer II - Platform Engineering

India (Remote) | Full-Time | Middle

Salary not disclosed

Job Details

Experience: 5-8 years
Required Skills: Python, Agile, Kubernetes, Scrum, Go, Grafana, Prometheus, Linux, Terraform, Datadog, Helm

Requirements

5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer
Hands-on experience managing Kubernetes clusters (GKE, EKS) in GCP and AWS
Strong knowledge of Terraform, Helm, and GitLab CI/CD pipelines
Proficiency in Python, Go, or Shell scripting for automation and tooling
Experience implementing and managing observability stacks (Prometheus, Grafana, Datadog)
Deep understanding of Linux systems, cloud networking, and container orchestration concepts
Experience working in Agile/Scrum environments
Excellent analytical skills with a proactive attitude

Responsibilities

Collaborate closely with Developers, QA, and Product teams during sprint planning to understand release plans, dependencies, and infrastructure requirements
Participate in the application release cycle, ensuring deployments are automated, consistent, and reliable
Manage and operate Kubernetes clusters in Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS)
Develop and manage Terraform modules for provisioning and configuring cloud infrastructure across GCP and AWS
Standardize service deployments using Helm for templating and versioned releases
Build and enhance observability with Prometheus, Grafana, and Datadog to monitor application and platform performance
Design, implement, and maintain GitLab CI/CD pipelines for build, test, and deployment automation
Drive an automation-first culture by developing scripts and tooling in Python, Go, or Shell to minimize manual effort and improve efficiency
Participate in a 24/7 on-call rotation, ensuring quick detection, mitigation, and resolution of incidents
Perform root cause analysis (RCA) and contribute to post-incident reviews to prevent recurrence
Proactively identify reliability or scalability gaps, raise early warnings, and partner with teams to address systemic risks
