Lead Site Reliability Engineer


India, Full-Time, Lead

Salary not disclosed


Job Details

Experience: 7+ years
Required Skills: AWS, Docker, Python, GCP, Kubernetes, Grafana, Prometheus, CI/CD, Terraform

7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
Hands-on experience with GCP and AWS.
Proficiency with Infrastructure as Code tools like Terraform or Helm.
Deep experience with Docker and Kubernetes (GKE).
Experience with observability tools like Prometheus, Grafana, ELK, or OpenTelemetry.
Proficiency in Python or shell scripting (such as Bash).
Basic understanding of API parsing and JSON manipulation.
Hands-on experience with CI/CD tools like Jenkins, GitHub Actions, or ArgoCD.
Experience with on-call rotations, SLOs, SLIs, SLAs, and incident management.
Experience monitoring MongoDB, Redis, Elasticsearch, and queue-based systems.

Develop and improve observability using monitoring, logging, tracing, and alerting tools.
Optimize system performance, troubleshoot incidents, and conduct post-mortems and root cause analysis (RCA) to prevent recurrence.
Collaborate with developers to enhance application reliability, scalability, and performance.
Drive cost optimization efforts in cloud environments.
Monitor data stores such as MongoDB and Redis, as well as queue-based systems.
