Senior Manager, Site Reliability Engineering

Remote-first culture, US, HK, NZ | Full-Time | Manager

Salary: $187,000 to $243,000 USD

Job Details

Experience: 6+ years managing an SRE team and 10+ years of hands-on SRE or infrastructure engineering experience.
Required Skills: PostgreSQL, Python, GCP, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm

Requirements

6+ years managing an SRE team.
10+ years of hands-on SRE or infrastructure engineering experience.
Deeply comfortable with our core stack: Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, ArgoCD, PostgreSQL, and Prometheus/Grafana.
Strong programming skills in Python and/or Go.
Experience writing and reviewing infrastructure tooling code.
Experience with AI coding tools.
Experience with CI/CD pipelines (GitHub Actions).
Track record of building or improving developer tooling and automation.
Experience leading teams across multiple time zones.
Track record of developing engineers into strong technical contributors.

Responsibilities

Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones.
Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
Champion developer self-service and platform engineering by building self-service capabilities for product teams.
Establish SLOs/SLIs for critical services and improve alert quality.
Ensure the SRE team uses AI tooling in its workflows, including IaC generation, log analysis, and root cause investigation.
