Lead Site Reliability Engineer

India | Full-Time | Lead
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
AWS, Docker, Python, GCP, Kubernetes, Grafana, Prometheus, CI/CD, Terraform

Requirements

  • 7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
  • Hands-on experience with GCP and AWS.
  • Proficiency with Infrastructure as Code tools like Terraform or Helm.
  • Deep experience with Docker and Kubernetes (GKE).
  • Experience with observability tools like Prometheus, Grafana, ELK, or OpenTelemetry.
  • Proficiency in Python or shell scripting (e.g., Bash).
  • Basic understanding of API parsing and JSON manipulation.
  • Hands-on experience with CI/CD tools like Jenkins, GitHub Actions, or ArgoCD.
  • Experience with on-call rotations, SLOs, SLIs, SLAs, and incident management.
  • Experience monitoring MongoDB, Redis, Elasticsearch (ES), and queue-based systems.

Responsibilities

  • Develop and improve observability using monitoring, logging, tracing, and alerting tools.
  • Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues.
  • Collaborate with developers to enhance application reliability, scalability, and performance.
  • Drive cost optimization efforts in cloud environments.
  • Monitor databases and queue-based systems, including MongoDB and Redis.