Senior Manager, Site Reliability Engineering
Remote-first culture, US, HK, NZ
Full-Time
Manager
Salary: $187,000 to $243,000 USD
Job Details
- Experience: 6+ years managing an SRE team and 10+ years of hands-on SRE or infrastructure engineering experience.
- Required Skills: PostgreSQL, Python, GCP, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm
Requirements
- 6+ years managing an SRE team.
- 10+ years of hands-on SRE or infrastructure engineering experience.
- Deeply comfortable with the core stack: Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, ArgoCD, PostgreSQL, and Prometheus/Grafana.
- Strong programming skills in Python and/or Go.
- Experience writing and reviewing infrastructure tooling code.
- Experience with AI coding tools.
- Experience with CI/CD pipelines (GitHub Actions).
- Track record of building or improving developer tooling and automation.
- Experience leading teams across multiple time zones.
- Track record of developing engineers into strong technical contributors.
Responsibilities
- Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones.
- Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
- Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
- Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
- Champion developer self-service and platform engineering by building self-service capabilities for product teams.
- Establish SLOs/SLIs for critical services and improve alert quality.
- Ensure the SRE team uses AI tooling in its workflows, including IaC generation, log analysis, and root cause investigation.