Site Reliability Engineer

Posted about 2 hours ago

29,000 - 36,000 USD per year

India, Full-Time, E-commerce

Company: SupplyHouse.com

Location: India, EST

Languages: English

Seniority level: Middle, 3+ years

Experience: 3+ years

Skills:

Docker, Python, Bash, Cloud Computing, GCP, Jenkins, Kubernetes, Go, Grafana, Prometheus, CI/CD, Linux, DevOps, Terraform, Ansible, Software Engineering

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field
3+ years of hands-on experience as a Site Reliability Engineer, DevOps Engineer, Systems Engineer, or Cloud Infrastructure Engineer
Proven track record managing production-grade systems on Google Cloud Platform (GCP) or other cloud providers
Strong understanding of Linux/Unix system administration, networking, and troubleshooting
Experience implementing Infrastructure as Code (IaC) using tools like Terraform, Ansible, or Deployment Manager
Familiarity with containerization and orchestration technologies such as Docker and Kubernetes (GKE)
Experience with monitoring and observability tools (Google Cloud Operations Suite, Prometheus, Grafana, Datadog, ELK)
Experience defining and monitoring SLAs, SLOs, and SLIs
Proven ability to handle incident response, conduct postmortems, and drive root cause analysis
Proficiency in at least one scripting language (Python, Bash, or Go) for automation and tooling
Hands-on experience building or managing CI/CD pipelines (Jenkins, GitLab CI, Cloud Build)
Strong background in configuration management and release automation
Knowledge of IAM (Identity and Access Management), network security, and cloud compliance controls
Familiarity with disaster recovery (DR), backups, and high-availability design

Responsibilities:

Design, build, and maintain scalable, reliable systems on GCP
Develop automation for infrastructure provisioning
Build and maintain observability platforms
Manage incident response and conduct postmortems
Partner with DevOps and engineering teams to enhance CI/CD pipelines
Define and monitor SLAs, SLOs, and SLIs
Implement disaster recovery (DR) and backup strategies
Continuously optimize performance, capacity, and cost-efficiency of GCP resources