Senior Site Reliability Engineer

Wikimedia Foundation
Technology/Non-profit
Please note that we are currently able to hire in the following locations:
US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming (*US Territory or Federal District)
Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland and the United Kingdom.
Full-Time, Senior
Salary: 116,633 - 181,243 USD per year

Job Details

Required Skills
AWS, Python, Kubernetes, Go, Prometheus, Terraform, Ansible, GitLab

Requirements

  • Experience with Infrastructure as Code (Terraform, Ansible)
  • Proficiency in at least one programming language (e.g., Python, Go)
  • Experience operating and optimizing cloud-based systems (AWS, Azure, or GCP)
  • Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab, ArgoCD)
  • Experience with incident response, on-call practices, and postmortems
  • Strong understanding of SRE best practices (SLOs, SLIs, error budgets)
  • Experience with observability tools (e.g., Prometheus, OpenTelemetry)
  • Ability to work effectively in a distributed, cross-functional environment
  • Strong documentation and communication skills

Responsibilities

  • Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets
  • Build and enhance observability systems (metrics, logs, and distributed tracing)
  • Drive reliability engineering practices, including capacity planning, load testing, and resilience validation
  • Improve developer experience (DevEx) by enabling self-service infrastructure
  • Design, implement, and optimize CI/CD and GitOps workflows
  • Implement secure-by-default infrastructure and enforce best practices
  • Continuously optimize infrastructure cost and efficiency using FinOps principles
  • Establish and track operational metrics such as MTTR, MTTD, and incident frequency
  • Reduce operational toil through automation-first solutions
  • Collaborate with a global team and mentor peers