Lead Site Reliability Engineer
India | Full-Time | Lead
Salary not disclosed
Job Details
- Experience: 7+ years
- Required Skills: AWS, Docker, Python, GCP, Kubernetes, Grafana, Prometheus, CI/CD, Terraform
Requirements
- 7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
- Hands-on experience with GCP and AWS.
- Proficiency with Infrastructure as Code and deployment tooling such as Terraform or Helm.
- Deep experience with Docker and Kubernetes (GKE).
- Experience with observability tools like Prometheus, Grafana, ELK, or OpenTelemetry.
- Proficiency in Python or shell scripting (e.g., Bash).
- Basic understanding of parsing API responses and manipulating JSON.
- Hands-on experience with CI/CD tools like Jenkins, GitHub Actions, or ArgoCD.
- Experience with on-call rotations, SLOs, SLIs, SLAs, and incident management.
- Experience monitoring MongoDB, Redis, Elasticsearch (ES), and queue-based systems.
Responsibilities
- Develop and improve observability using monitoring, logging, tracing, and alerting tools.
- Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues.
- Collaborate with developers to enhance application reliability, scalability, and performance.
- Drive cost optimization efforts in cloud environments.
- Monitor data stores, including MongoDB and Redis, as well as queue-based systems.