Senior Site Reliability Engineer

Remote - Canada / Remote - Pacific Northwest Area
Full-Time
Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, Python, Bash, Kubernetes, Grafana, Prometheus, CI/CD, Terraform

Requirements

  • 5+ years in SRE, DevOps, or infrastructure engineering roles.
  • Demonstrated track record of operating production systems across multiple regions.
  • Deep expertise in AWS services: VPC, IAM, EKS, S3, and CloudWatch.
  • Advanced Kubernetes skills: cluster operations, autoscaling, RBAC, and Helm.
  • Proficiency with Terraform modules, state management, and multi-environment patterns.
  • Experience with CI/CD and GitOps workflows (e.g., GitHub Actions, ArgoCD).
  • Strong understanding of networking fundamentals: CIDR, DNS, load balancing, and VPN.
  • Familiarity with observability tools such as Prometheus and Grafana.
  • Comfort with Python and Bash for tooling and automation.
  • Working knowledge of both Linux and Windows environments.

Responsibilities

  • Design, build, and maintain multi-region AWS infrastructure using Terraform.
  • Operate and scale EKS clusters, including autoscaling, node lifecycle, and workload health management.
  • Manage network infrastructure including VPC design, DNS, load balancing, and cross-region connectivity.
  • Improve GitOps-based deployment workflows using GitHub Actions, Helm, and Kustomize.
  • Lead incident management, including debugging, root-cause analysis, and postmortems.
  • Enhance observability using metrics, logging, tracing, and dashboards.
  • Govern cloud IAM policies and roles across services.
  • Develop internal platform tooling and automation using Python and Bash.