Senior Site Reliability Engineer

Canada, Global collaboration across multiple time zones | Full-Time | Senior
Salary not disclosed

Job Details

Languages
English
Experience
6+ years
Required Skills
Python, Bash, Kubernetes, Ruby, Go, Linux, DevOps, Ansible

Requirements

  • 6+ years of experience in Site Reliability Engineering, DevOps, or infrastructure operations roles within complex distributed systems.
  • Strong proficiency in Linux systems administration, troubleshooting, and performance tuning.
  • Experience with scripting languages such as Python, Bash, Go, or Ruby for automation and operational tooling.
  • Hands-on experience with configuration management tools such as Puppet or Ansible.
  • Solid understanding of distributed systems, caching technologies, and system optimization techniques.
  • Experience with Linux package management (e.g., Debian-based systems).
  • Proven track record of automating operational processes and identifying opportunities for system improvement.
  • Experience participating in incident response, postmortems, and reliability engineering practices.
  • Strong communication skills in English, with the ability to work effectively in a fully remote, globally distributed team.
  • Ability to work independently while collaborating across multiple time zones and teams.

Responsibilities

  • Perform day-to-day operations and DevOps responsibilities across large-scale public-facing infrastructure, including deployment, configuration, maintenance, and troubleshooting.
  • Manage and optimize configuration and deployment systems using tools such as Puppet and Kubernetes.
  • Automate infrastructure provisioning, service deployment, and operational workflows to improve reliability and efficiency.
  • Collaborate with product and engineering teams to design scalable architectures and ensure systems operate reliably under global traffic loads.
  • Participate in a 24/7 on-call rotation, handling incident response, system alerts, troubleshooting, and post-incident reviews.
  • Conduct root cause analysis of production incidents and implement preventive measures to improve system stability.
  • Contribute to system monitoring, observability, and performance optimization initiatives.
  • Mentor engineers and share operational expertise within a distributed, cross-functional team environment.
  • Work asynchronously with global teams while ensuring clear and effective technical communication.