Senior Site Reliability Engineer
CertifyOS, Healthcare Data
Remote US, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Python, GCP, Kubernetes, Go, CI/CD, Terraform
Requirements
- 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
- Deep hands-on experience with GCP, specifically GKE and Cloud Run.
- Experience building and maintaining Infrastructure as Code with Terraform and/or Pulumi.
- Fluency in deployment patterns such as rolling, blue/green, and canary.
- Strong knowledge of Linux systems administration.
- Experience with observability platforms like Google Cloud Monitoring, Datadog, Grafana, or Prometheus.
- Experience designing SLIs, SLOs, error budgets, and alerting strategies.
- Proficiency in Python, Bash, or Go.
- Experience building and maintaining CI/CD pipelines using GitHub Actions or similar.
- Experience running systems that handle sensitive data or PII in regulated environments.
Responsibilities
- Own the operational lifecycle end-to-end and influence platform architecture and reliability standards.
- Manage incident response processes, root cause analysis, escalation workflows, and runbooks.
- Maintain uptime, reduce alert fatigue, and build actionable observability across GKE and Cloud Run.
- Improve autoscaling behavior, resource utilization, and workload efficiency.
- Build and maintain Infrastructure as Code (IaC) and CI/CD pipelines.
- Instrument monitoring for data freshness and infrastructure health.
- Mentor teams on reliability practices and influence operational standards.