Site Reliability Engineer

Posted 3 months ago
Canada | Full-Time | Supply Chain Solutions
Company: Tecsys Inc.
Location: Canada
Languages: English
Seniority level: Staff, 5+ years
Experience: 5+ years
Skills:
AWS, Python, AWS EKS, Bash, Cloud Computing, Java, Jenkins, Kubernetes, CI/CD, DevOps, Terraform, Ansible, SaaS
Requirements:
5+ years in Site Reliability, Cloud, or DevOps Engineering
Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure
Proven experience managing cloud infrastructure in AWS (multi-account, VPC, EC2, EKS) and Kubernetes at scale
Strong hands-on experience with IaC and automation (Terraform, Ansible, or similar)
Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable)
Deep understanding of monitoring and observability using Datadog (or equivalent)
Experience with incident management, on-call participation, escalation, and structured postmortems
Scripting skills in Python, Bash, Java, or equivalent
Basic knowledge of Java- or .NET-based development
Strong English communication skills
Responsibilities:
Collaborate with Engineering teams on system design, platforms, capacity planning, and launch reviews.
Innovate to simplify, scale, and strengthen the platform.
Maintain services by monitoring availability, latency, and system health.
Own observability by enhancing monitoring and alerting using Datadog.
Drive automation for tooling, IaC, and pipelines.
Scale systems sustainably and evolve them for reliability and velocity.
Participate in on-call rotation.
Practice sustainable incident response and blameless postmortems.
Lead post-incident reviews (RCAs) and identify long-term fixes.
Implement monitoring, logging, alerting, and SLA reporting.
Create and maintain technical documentation.
Implement and mature SRE best practices.
Act as Incident Commander for incidents.
Provide support for planning and deployment teams.
Collaborate with Platform Engineering team on strategic efforts.
Work cross-functionally with internal teams and vendors.