Site Reliability Engineer

India, Full-Time

Salary not disclosed

Job Details

Requirements

Proven experience as a Site Reliability Engineer, Platform Engineer, DevOps Engineer, or in a similar cloud infrastructure role.
Strong scripting and programming skills using Python, Go, Bash, or comparable languages.
Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).
Experience with Infrastructure as Code solutions including Terraform, Pulumi, or Crossplane.
Solid knowledge of CI/CD platforms such as GitHub Actions, Jenkins, or TeamCity.
Experience with monitoring and observability technologies including Grafana, Prometheus, ELK, Tempo, or Loki.
Understanding of Internal Developer Platforms (IDP), developer experience (DevEx), and platform engineering principles.
Familiarity with cloud governance, security best practices, incident response, and ISO 27001 or similar compliance frameworks.
Experience leveraging AI development tools such as GitHub Copilot or ChatGPT is highly desirable.
Strong analytical, troubleshooting, communication, and collaboration skills with experience working in Agile environments.

Responsibilities

Design, build, and maintain internal developer platforms, self-service infrastructure, and platform services using modern cloud-native technologies.
Develop and enhance automation solutions using Python, Bash, Go, and Infrastructure as Code tools such as Terraform, Pulumi, and Crossplane.
Collaborate with engineering teams to design reliable, scalable, and secure cloud infrastructure while supporting CI/CD pipelines and deployment strategies.
Monitor production environments, define and improve SLIs/SLOs, implement observability solutions, and strengthen monitoring and alerting capabilities.
Participate in incident response, troubleshoot production issues, conduct root cause analysis, and drive post-incident improvements.
Establish and maintain cloud governance, security standards, compliance initiatives, and cost optimization strategies.
Continuously reduce operational toil through automation and AI-assisted development practices while promoting Site Reliability Engineering principles across teams.