Senior Site Reliability Engineer

Posted over 1 year ago

North America, Full-Time, Incident Management Platform

Company: Rootly

Location: North America

Languages: English

Seniority level: Senior, 5+ years

Experience: 5+ years

Skills:

AWS, Backend Development, Software Development, Cloud Computing, Git, Kubernetes, Amazon Web Services, CI/CD

Requirements:

You have 5+ years of experience in an SRE or Infrastructure Engineering role. You have 5+ years of experience writing software in a SWE or software-heavy SRE role. You have strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices. You’ve supported web or RPC services at significant scale. You have experience solving infrastructure problems by writing software. You have a big-picture perspective on systems and tools. You can collaborate with other Engineering teams to understand their systems and help improve them.

Responsibilities:

Participate in an on-call rotation to support critical Rootly services, and in some cases be on call with software teams. Participate in defining and managing SLOs and error budgets for the Engineering teams that own services in production. Build tools to support our processes. Embed with feature delivery software teams to build and enhance the observability, reliability, and availability of their services. Work with other teams across Engineering to understand their systems and challenges at the code level, and identify improvements in Rootly Infrastructure that benefit the services they own (contributing code where possible).