Senior Site Reliability Engineer
Canada | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: 5+ years of industry experience
- Required Skills: AWS, Kubernetes, Grafana, CI/CD, Terraform, Helm
Requirements
- 5+ years of industry experience with growing depth in cloud infrastructure and SRE practices
- Experience managing production Kubernetes environments at scale
- Proven experience responding to production incidents in high-stakes environments
- Proficiency in writing and maintaining Terraform at the module level
- Experience with GitOps workflows including Helm and ArgoCD
- Ability to balance reactive operational work with proactive roadmap delivery
- Experience building observability dashboards and managing alert systems
- Experience with security hardening in regulated environments such as FedRAMP or SOC 2
Responsibilities
- Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management
- Design, implement, and maintain infrastructure-as-code using Terraform
- Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments
- Operate and maintain observability infrastructure including Grafana, alerts, dashboards, and log pipelines
- Contribute to CI/CD pipeline reliability to improve developer experience
- Remediate security vulnerabilities (CVEs) in container images and infrastructure components
- Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST
- Participate in the on-call incident response rotation and conduct post-incident reviews