Staff Site Reliability Engineer

Posted 4 months ago
201,000 - 287,100 USD per year
United States, Full-Time, Data Resilience / SaaS
Company: Veeam Software
Location: United States
Languages: English
Seniority level: Staff, 8+ years
Experience: 8+ years
Skills:
Leadership, Java, JavaScript, Kubernetes, Microsoft Azure, TypeScript, C#, Go, Grafana, Prometheus, CI/CD, DevOps, Terraform, Software Engineering
Requirements:
8+ years of experience in a Software Engineering or SRE role, including technical leadership.
Demonstrated experience mentoring and guiding senior engineers.
Deep expertise in building distributed systems on public cloud (Azure preferred).
Strong programming skills (e.g., JS, Go, TypeScript, Java, or C#).
Hands-on experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry).
Mastery of infrastructure automation tools (Terraform, Pulumi) and container orchestration (Kubernetes).
Ability to communicate clearly across geographies and disciplines.
Experience leading SRE initiatives across multiple product teams (preferred).
Background in chaos engineering, incident learning, or performance and load testing (preferred).
Familiarity with global compliance standards (ISO, SOC 2, GDPR, FedRAMP, CMMC) (preferred).
Responsibilities:
Serve as a hands-on technical leader within the SRE team.
Guide senior engineers and influence product development teams.
Ensure systems are reliable, scalable, and observable.
Drive strategic initiatives and mentor others in SRE practices.
Help define architectural best practices.
Align teams, enforce high standards, and scale SRE principles globally.
Act as a technical authority, mentoring senior engineers and guiding design choices.
Lead the definition and enforcement of SLIs, SLOs, and error budgets.
Collaborate with Staff peers to align strategy and champion reliability standards.
Partner with development and product teams to design for failure and build resilient architecture.
Drive adoption of observability best practices and tooling.
Ensure metrics, logs, and traces provide actionable insights.
Lead complex incident responses and systemic reliability improvements.
Promote a blameless culture of learning.
Lead initiatives in infrastructure as code, deployment automation, and resilience testing.
Influence the development and adoption of chaos engineering practices.
Partner with platform and security teams to ensure production readiness.
Provide architectural guidance and advocate for engineering rigor.
Represent the SRE team in technical leadership forums.
About the Company
Veeam Software
5001-10000 employees, Virtualization