Senior Site Reliability Engineer (Remote First)

Posted about 2 months ago

Canada, Full-Time, InsurTech

Company:

Location: Canada

Languages: English

Seniority level: Senior, 5+ years

Experience: 5+ years

Skills:

AWS, Software Development, Kubernetes, Grafana, Prometheus, Release Management, CI/CD, DevOps, Terraform

Requirements:

University degree or college diploma in a technical program or equivalent work experience.
5+ years of experience as a Site Reliability Engineer.
Proven experience with Terraform.
Experience with Kubernetes.
Experience with AWS.
Demonstrated experience maintaining and improving an Incident Management process.
Experience with a major observability platform (e.g., Prometheus, Grafana, Datadog, ELK Stack, Splunk, or New Relic).
Experience with distributed systems.
Experience with GitHub Actions for CI/CD.
Experience in Backup and Recovery Scenarios.
Ability to communicate efficiently and work collaboratively.

Responsibilities:

Write code and tools to automate operational tasks.
Participate in on-call rotations.
Implement observability and configure alerts.
Define and track SLIs and SLOs.
Partner with development teams on service design for reliability.
Develop and test Disaster Recovery procedures.
Coach and mentor junior professionals.
Assist the Engineering Leadership Team.