Senior Site Reliability Engineer

CertifyOS | Healthcare Data
Remote US | Full-Time | Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Python, GCP, Kubernetes, Go, CI/CD, Terraform

Requirements

  • 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
  • Deep hands-on experience with GCP, specifically GKE and Cloud Run.
  • Experience building and maintaining Infrastructure as Code with Terraform and/or Pulumi.
  • Fluency in deployment patterns such as rolling, blue/green, and canary.
  • Strong knowledge of Linux systems administration.
  • Experience with observability platforms like Google Cloud Monitoring, Datadog, Grafana, or Prometheus.
  • Experience designing SLIs, SLOs, error budgets, and alerting strategies.
  • Proficiency in Python, Bash, or Go.
  • Experience building and maintaining CI/CD pipelines using GitHub Actions or similar.
  • Experience operating systems that handle sensitive data or PII in regulated environments.

Responsibilities

  • Own the operational lifecycle end-to-end and influence platform architecture and reliability standards.
  • Manage incident response processes, root cause analysis, escalation workflows, and runbooks.
  • Maintain uptime, reduce alert fatigue, and build actionable observability across GKE and Cloud Run.
  • Improve autoscaling behavior, resource utilization, and workload efficiency.
  • Build and maintain Infrastructure as Code (IaC) and CI/CD pipelines.
  • Instrument data freshness and infrastructure health monitoring.
  • Mentor teams on reliability practices and influence operational standards.