Senior Manager, Site Reliability Engineering
Counterpart Health
Health Technology
Remote - USA, US, HK, NZ
Full-Time
Manager
Salary: $187,000 to $243,000 USD
Job Details
- Experience: 6+ years managing an SRE team and 10+ years of hands-on SRE or infrastructure engineering experience.
- Required Skills: PostgreSQL, Python, GCP, Kubernetes, Go, Grafana, Prometheus, Terraform, Helm
Requirements
- 10+ years of hands-on SRE or infrastructure engineering experience.
- 6+ years of experience managing an SRE team.
- Deep expertise in Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, and ArgoCD.
- Proficiency with PostgreSQL and monitoring tools like Prometheus and Grafana.
- Strong programming skills in Python and/or Go.
- Experience with CI/CD pipelines, specifically GitHub Actions.
- Track record of building developer tooling and automation.
- Experience leading teams across multiple time zones.
- Proven ability to develop engineers into strong technical contributors.
- Sound build-vs-buy judgment for infrastructure solutions.
Responsibilities
- Lead and grow an SRE team of ~10 engineers, including hiring, retention, and performance management.
- Build strategic partnerships with product engineering teams to transition from reactive support to proactive reliability ownership.
- Scale multi-tenant infrastructure to support new customer onboarding and growing patient populations.
- Own cloud cost management and FinOps practices to balance cost, reliability, and performance.
- Champion developer self-service and platform engineering to reduce ticket volume.
- Establish SLOs/SLIs for critical services and improve alert quality.
- Integrate AI tooling like Claude Code into SRE workflows for automation and root cause investigation.