Lead Site Reliability Engineer
Canada | Full-Time | Lead
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: AWS, Python, GCP, Java, TypeScript, Azure, Go, CI/CD
Requirements
- 5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering in large-scale distributed systems.
- Strong programming ability in at least one language (e.g., Python, Go, Java, or TypeScript).
- Proven experience improving system reliability through cross-team adoption of observability, resilience patterns, or deployment safety mechanisms.
- Strong systems thinking with the ability to identify root causes and apply high-impact solutions.
- Hands-on experience with cloud environments (AWS, GCP, or Azure).
- Experience with CI/CD pipelines and infrastructure-as-code.
- Proficiency with modern observability platforms.
- Experience defining and operating with SLOs/SLIs.
- Ability to influence and drive change across teams without direct managerial authority.
- Clear and effective communication skills.
Responsibilities
- Identify and analyze recurring failure patterns across production systems by reviewing incidents, postmortems, and operational data.
- Prioritize reliability improvements based on their impact on MTTR, MTTD, and overall customer-facing stability.
- Drive the design and adoption of reliability patterns such as resilience mechanisms, safe deployment strategies, observability standards, and dependency protection.
- Collaborate directly with product and platform teams through code contributions, reviews, and hands-on engineering support.
- Lead technical discussions in incident reviews and operational forums to identify gaps in monitoring, recovery, and system design.
- Influence engineering teams across the organization to adopt reliability best practices and shared standards.
- Evangelize and document reliability improvements so that knowledge scales across teams.
- Participate in on-call rotations and incident leadership.
- Contribute to the evolution of internal SRE practices by mentoring engineers.