Lead Site Reliability Engineer

Canada, Full-Time, Lead
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, Python, GCP, Java, TypeScript, Azure, Go, CI/CD

Requirements

  • 5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering in large-scale distributed systems.
  • Strong programming ability in at least one language (e.g., Python, Go, Java, or TypeScript).
  • Proven experience improving system reliability through cross-team adoption of observability, resilience patterns, or deployment safety mechanisms.
  • Strong systems thinking with the ability to identify root causes and apply high-impact solutions.
  • Hands-on experience with cloud environments (AWS, GCP, or Azure).
  • Experience with CI/CD pipelines and infrastructure-as-code.
  • Proficiency with modern observability platforms.
  • Experience defining and operating with SLOs/SLIs.
  • Ability to influence and drive change across teams without direct managerial authority.
  • Clear and effective communication skills.

Responsibilities

  • Identify and analyze recurring failure patterns across production systems by reviewing incidents, postmortems, and operational data.
  • Prioritize reliability improvements based on their impact on MTTR, MTTD, and overall customer-facing stability.
  • Drive the design and adoption of reliability patterns such as resilience mechanisms, safe deployment strategies, observability standards, and dependency protection.
  • Collaborate directly with product and platform teams through code contributions, reviews, and hands-on engineering support.
  • Lead technical discussions in incident reviews and operational forums to identify gaps in monitoring, recovery, and system design.
  • Influence engineering teams across the organization to adopt reliability best practices and shared standards.
  • Evangelize and document reliability improvements so that the knowledge scales across teams.
  • Participate in on-call rotations and incident leadership.
  • Contribute to the evolution of internal SRE practices by mentoring engineers.