Senior Site Reliability Engineer

Location: Canada and the US Pacific Northwest
Employment type: Full-Time
Level: Senior

Salary: 145,000 to 185,000 CAD per year

Job Details

Experience: 5+ years
Required Skills: AWS, Docker, Python, Bash, Kubernetes, Grafana, Prometheus, Linux, Terraform, GitHub Actions, Helm

Qualifications

5+ years in SRE, DevOps, or infrastructure engineering roles
Track record of operating production systems across multiple regions
Terraform: Modules, state management, and multi-environment patterns
AWS depth: VPC, IAM, EKS, S3, and CloudWatch
Kubernetes expertise: Cluster operations, autoscaling, RBAC, and Helm
CI/CD and GitOps: GitHub Actions, ArgoCD, or similar workflows
Networking fundamentals: CIDR, DNS, load balancing, VPN, and cross-region connectivity
Observability: Prometheus and Grafana
Scripting: Python and Bash for tooling and automation
Cross-platform familiarity: Working knowledge of both Linux and Windows environments
Operational experience supporting Windows-based workloads
Comfortable in a fast-moving startup with evolving priorities
Takes ownership of systems while collaborating closely with other teams
Pragmatic about tradeoffs between speed, reliability, and complexity

Responsibilities

Design, build, and maintain multi-region AWS infrastructure using Terraform
Operate and scale EKS clusters across production regions: autoscaling, node lifecycle, and workload health
Manage networking across environments: VPC design, DNS, load balancing, and cross-region connectivity
Support infrastructure changes, migrations, and expansions into new regions
Contribute to and improve GitOps-based deployment workflows using GitHub Actions, Helm, and Kustomize
Help build and run incident management processes: severity definitions, escalation paths, and on-call practices
Lead incident response, debugging, and root-cause analysis
Write postmortems and drive systemic reliability improvements
Improve observability across metrics, logging, tracing, and dashboards
Support GPU and batch workloads running on Kubernetes
Provide security-conscious feedback on platform architecture decisions
Own cloud IAM governance: roles, policies, and access boundaries across accounts and services
Lead compliance-adjacent work, including audit readiness, partner certification requirements, and responses to customer security questionnaires
Improve CI/CD pipelines and infrastructure validation
Support engineers with infrastructure debugging, environment setup, and performance issues
Contribute to tooling and automation in Python and Bash