Site Reliability Engineer II
Location: India (Remote) | Full-Time | Mid-level
Salary not disclosed
Job Details
- Required Skills
- Python, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm
Requirements
- 5-8 years of experience as a Site Reliability Engineer, Platform Engineer, or DevOps Engineer.
- Hands-on experience managing Kubernetes clusters (GKE, EKS) in GCP and AWS.
- Strong knowledge of Terraform, Helm, and GitLab CI/CD pipelines.
- Proficiency in Python, Go, or Shell scripting for automation and tooling.
- Experience implementing and managing observability stacks including Prometheus, Grafana, and Datadog.
- Deep understanding of Linux systems and cloud networking concepts.
- Experience with container orchestration.
- Experience working in Agile/Scrum environments.
- Strong analytical skills and a proactive approach to problem-solving.
Responsibilities
- Collaborate with Developers, QA, and Product teams on release planning and infrastructure requirements.
- Participate in the application release cycle to ensure consistent and automated deployments.
- Manage and operate Kubernetes clusters in Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS).
- Develop and maintain Terraform modules for cloud infrastructure provisioning on GCP and AWS.
- Standardize service deployments using Helm for versioned releases.
- Enhance system observability using Prometheus, Grafana, and Datadog.
- Design and maintain GitLab CI/CD pipelines for automated testing and deployment.
- Develop automation scripts and tooling using Python, Go, or Shell.
- Participate in a 24/7 on-call rotation for incident management and resolution.
- Perform root cause analysis and post-incident reviews to identify and address systemic risks.