Senior Site Reliability Engineer

Canada | Full-Time | Senior

Salary not disclosed

Job Details

Requirements

5+ years of industry experience with growing depth in cloud infrastructure and SRE practices
Experience managing production Kubernetes environments at scale
Proven experience responding to production incidents in high-stakes environments
Proficiency writing and maintaining Terraform at the module level
Experience with GitOps workflows including Helm and ArgoCD
Ability to balance reactive operational work with proactive roadmap delivery
Experience building observability dashboards and managing alert systems
Experience with security hardening in regulated environments such as FedRAMP or SOC 2

Responsibilities

Own and operate production Kubernetes clusters (Amazon EKS) including upgrades, scaling, security hardening, and cluster lifecycle management
Design, implement, and maintain infrastructure-as-code using Terraform
Manage and evolve Helm chart definitions and ArgoCD GitOps workflows for multi-region SaaS deployments
Operate and maintain observability infrastructure including Grafana, alerts, dashboards, and log pipelines
Contribute to pipeline reliability to improve developer experience
Remediate security vulnerabilities (CVEs) in container images and infrastructure components
Ensure alignment with internal policies and frameworks such as ISO 27001, SOC 2, and NIST
Participate in on-call incident response rotation and conduct post-incident reviews
