Senior Software Engineer, Site Reliability

Posted about 1 month ago

186,818 - 232,000 USD per year

United States, Canada | Full-Time | E-commerce, Registry

Company: Babylist

Location: United States, Canada

Languages: English

Seniority level: Senior, 8+ years

Experience: 8+ years

Skills:

AWS, Docker, AWS EKS, Jenkins, Kubernetes, MySQL, Redis, CI/CD, DevOps, Terraform, Software Engineering, Troubleshooting

Requirements:

8+ years of experience as a Site Reliability Engineer or in a similar role.
Experience supporting high-traffic, consumer-facing websites.
Proficiency with Terraform is a must.
Strong experience working with AWS cloud-based infrastructure and services.
Proficiency with Docker and Kubernetes is essential.
Solid understanding of cloud-native systems design, including CDNs, load balancers, cloud networking, DNS, caching, and distributed systems.
Strong troubleshooting and debugging skills.
Experience designing and supporting CI systems (e.g., CircleCI, Jenkins, GitHub Actions).
Familiarity with monitoring and alerting best practices and tools (e.g., Datadog, Cronitor, Sentry, PagerDuty).
Proven experience with on-call management best practices.
Excellent verbal and written communication skills.
Comfortable working in an AI-forward environment.

Responsibilities:

Manage and build AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform.
Ensure EKS clusters and databases are running up-to-date versions, optimizing performance and reliability.
Improve the speed and reliability of Continuous Integration (CI) systems.
Provide support to developers in troubleshooting issues across environments.
Establish, communicate, and support best practices for monitoring and alerting.