Site Reliability Engineer
India, Full-Time
Salary not disclosed
Job Details
- Required Skills: AWS, Docker, Python, Bash, GCP, Kubernetes, Azure, Go, CI/CD, Terraform
Requirements
- Proven experience as a Site Reliability Engineer, Platform Engineer, DevOps Engineer, or in a similar cloud infrastructure role.
- Strong scripting and programming skills using Python, Go, Bash, or comparable languages.
- Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).
- Experience with Infrastructure as Code solutions including Terraform, Pulumi, or Crossplane.
- Solid knowledge of CI/CD platforms such as GitHub Actions, Jenkins, or TeamCity.
- Experience with monitoring and observability technologies including Grafana, Prometheus, ELK, Tempo, or Loki.
- Understanding of Internal Developer Platforms (IDP), developer experience (DevEx), and platform engineering principles.
- Familiarity with cloud governance, security best practices, incident response, and ISO 27001 or similar compliance frameworks.
- Experience leveraging AI development tools such as GitHub Copilot or ChatGPT is highly desirable.
- Strong analytical, troubleshooting, communication, and collaboration skills with experience working in Agile environments.
Responsibilities
- Design, build, and maintain internal developer platforms, self-service infrastructure, and platform services using modern cloud-native technologies.
- Develop and enhance automation solutions using Python, Bash, Go, and Infrastructure as Code tools such as Terraform, Pulumi, and Crossplane.
- Collaborate with engineering teams to design reliable, scalable, and secure cloud infrastructure while supporting CI/CD pipelines and deployment strategies.
- Monitor production environments, define and improve SLIs/SLOs, implement observability solutions, and strengthen monitoring and alerting capabilities.
- Participate in incident response, troubleshoot production issues, conduct root cause analysis, and drive post-incident improvements.
- Establish and maintain cloud governance, security standards, compliance initiatives, and cost optimization strategies.
- Continuously reduce operational toil through automation and AI-assisted development practices while promoting Site Reliability Engineering principles across teams.