Senior Site Reliability Engineer

Canada, Full-Time, Senior
Salary not disclosed

Job Details

Experience
5+ years of industry experience
Required Skills
AWS, Kubernetes, Grafana, CI/CD, Terraform, Helm

Requirements

  • 5+ years of industry experience with growing depth in cloud infrastructure and SRE practices
  • Experience managing production Kubernetes environments at scale
  • Proven experience responding to production incidents in high-stakes environments
  • Proficiency writing and maintaining Terraform at the module level
  • Experience with GitOps workflows including Helm and ArgoCD
  • Ability to balance reactive operational work with proactive roadmap delivery
  • Experience building observability dashboards and managing alert systems
  • Experience with security hardening in regulated environments such as FedRAMP or SOC 2

Responsibilities

  • Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management
  • Design, implement, and maintain infrastructure-as-code using Terraform
  • Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments
  • Operate and maintain observability infrastructure including Grafana, alerts, dashboards, and log pipelines
  • Contribute to CI/CD pipeline reliability to improve developer experience
  • Remediate security vulnerabilities (CVEs) in container images and infrastructure components
  • Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST
  • Participate in on-call incident response rotation and conduct post-incident reviews