Staff Site Reliability Engineer

Remote - U.S. · Full-Time · Staff

Salary: 165,000 - 230,000 USD per year

Job Details

Requirements

8+ years of experience in Site Reliability, Platform, or DevOps engineering.
Experience operating at a Staff, Principal, or Lead level.
Deep software engineering proficiency in at least one modern language (e.g., Go, Python).
Expert architectural understanding of Kubernetes in multi-tenant/multi-cluster environments.
Expert-level knowledge of Jsonnet and Grafana Tanka.
Extensive experience with CI/CD pipelines and GitOps (GitHub Actions, ArgoCD).
Experience with infrastructure-as-code principles at enterprise scale.
Systems-level thinking for diverse deployment models (on-premises, VMware, air-gapped).
Deep expertise with observability platforms, specifically the Grafana stack.
Background in infrastructure security (container hardening, network security, vulnerability management).

Responsibilities

Design and architect the overarching infrastructure strategy for hosted and on-premises environments.
Lead the evolution of CI/CD and Kubernetes platforms using Jsonnet and Grafana Tanka.
Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
Architect enterprise observability strategy using the Grafana stack.
Drive infrastructure security and compliance architecture at an organizational level.
Establish self-service tooling and paved roads for developers to reduce operational toil.
Act as an Incident Commander for high-severity outages and conduct blameless post-mortems.
Mentor senior and mid-level engineers and drive engineering excellence.
