Senior Site Reliability Engineer

Canada and the US Pacific Northwest | Full-Time | Senior
Salary: 145,000–185,000 CAD per year

Job Details

Experience
5+ years
Required Skills
AWS, Docker, Python, Bash, Kubernetes, Grafana, Prometheus, Linux, Terraform, GitHub Actions, Helm

Requirements

  • 5+ years in SRE, DevOps, or infrastructure engineering roles
  • Track record of operating production systems across multiple regions
  • Terraform: Modules, state management, and multi-environment patterns
  • AWS depth: VPC, IAM, EKS, S3, and CloudWatch
  • Kubernetes expertise: Cluster operations, autoscaling, RBAC, and Helm
  • CI/CD and GitOps: GitHub Actions, ArgoCD, or similar workflows
  • Networking fundamentals: CIDR, DNS, load balancing, VPN, and cross-region connectivity
  • Observability: Prometheus and Grafana
  • Scripting: Python and Bash for tooling and automation
  • Cross-platform familiarity: Working knowledge of both Linux and Windows environments
  • Operational experience supporting Windows-based workloads
  • Comfortable in a fast-moving startup with evolving priorities
  • Able to take ownership of systems while collaborating closely with other teams
  • Pragmatic about tradeoffs between speed, reliability, and complexity

Responsibilities

  • Design, build, and maintain multi-region AWS infrastructure using Terraform
  • Operate and scale EKS clusters across production regions, including autoscaling, node lifecycle, and workload health
  • Manage networking across environments: VPC design, DNS, load balancing, and cross-region connectivity
  • Support infrastructure changes, migrations, and expansions into new regions
  • Contribute to and improve GitOps-based deployment workflows using GitHub Actions, Helm, and Kustomize
  • Help build and run incident management processes, including severity definitions, escalation paths, and on-call practices
  • Lead incident response, debugging, and root-cause analysis
  • Write postmortems and drive systemic reliability improvements
  • Improve observability across metrics, logging, tracing, and dashboards
  • Support GPU and batch workloads running on Kubernetes
  • Provide security-conscious feedback on platform architecture decisions
  • Own cloud IAM governance: roles, policies, and access boundaries across accounts and services
  • Lead compliance-adjacent work, including audit readiness, partner certification requirements, and responses to customer security questionnaires
  • Improve CI/CD pipelines and infrastructure validation
  • Support engineers with infrastructure debugging, environment setup, and performance issues
  • Contribute to tooling and automation in Python and Bash