Staff Site Reliability Engineer
SimSpace Corporation, Cybersecurity
Remote - U.S., Full-Time, Staff
Salary: 165,000 - 230,000 USD per year
Job Details
- Experience: 8+ years
- Required Skills: Python, Kubernetes, Go, GitHub Actions
Requirements
- 8+ years of experience in Site Reliability, Platform, or DevOps engineering.
- Experience operating at a Staff, Principal, or Lead level.
- Deep software engineering proficiency in at least one modern language (e.g., Go, Python).
- Expert architectural understanding of Kubernetes in multi-tenant/multi-cluster environments.
- Expert-level knowledge of Jsonnet and Grafana Tanka.
- Extensive experience with CI/CD pipelines and GitOps (GitHub Actions, ArgoCD).
- Experience with infrastructure-as-code principles at enterprise scale.
- Systems-level thinking for diverse deployment models (on-premises, VMware, air-gapped).
- Deep expertise with observability platforms, specifically the Grafana stack.
- Background in infrastructure security (container hardening, network security, vulnerability management).
Responsibilities
- Design and architect the overarching infrastructure strategy for hosted and on-premises environments.
- Lead the evolution of CI/CD and Kubernetes platforms using Jsonnet and Grafana Tanka.
- Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
- Architect enterprise observability strategy using the Grafana stack.
- Drive infrastructure security and compliance architecture at an organizational level.
- Establish self-service tooling and paved roads for developers to reduce operational toil.
- Act as an Incident Commander for high-severity outages and conduct blameless post-mortems.
- Mentor senior and mid-level engineers and drive engineering excellence.