Staff Site Reliability Engineer

Remote - U.S. | Full-Time | Staff
Salary: 165,000 - 230,000 USD per year

Job Details

Experience
8+ years
Required Skills
Python, Kubernetes, Go, GitHub Actions

Requirements

  • 8+ years of experience in Site Reliability, Platform, or DevOps engineering.
  • Experience operating at a Staff, Principal, or Lead level.
  • Deep software engineering proficiency in at least one modern language (e.g., Go, Python).
  • Expert architectural understanding of Kubernetes in multi-tenant/multi-cluster environments.
  • Expert-level knowledge of Jsonnet and Grafana Tanka.
  • Extensive experience with CI/CD pipelines and GitOps (GitHub Actions, ArgoCD).
  • Experience with infrastructure-as-code principles at enterprise scale.
  • Systems-level thinking for diverse deployment models (on-premises, VMware, air-gapped).
  • Deep expertise with observability platforms, specifically the Grafana stack.
  • Background in infrastructure security (container hardening, network security, vulnerability management).

Responsibilities

  • Design and architect the overarching infrastructure strategy for hosted and on-premises environments.
  • Lead the evolution of CI/CD and Kubernetes platforms using Jsonnet and Grafana Tanka.
  • Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
  • Architect enterprise observability strategy using the Grafana stack.
  • Drive infrastructure security and compliance architecture at an organizational level.
  • Establish self-service tooling and paved roads for developers to reduce operational toil.
  • Act as an Incident Commander for high-severity outages and conduct blameless post-mortems.
  • Mentor senior and mid-level engineers and drive engineering excellence.