Senior Site Reliability Engineer - Wikimedia Enterprise

US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming.
Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland and the United Kingdom.
Full-Time
Senior
Salary: 116,633 - 181,243 USD per year

Job Details

Required Skills
AWS, Python, Kubernetes, Go, Prometheus, Terraform, Ansible, GitLab

Requirements

  • Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible).
  • Proficiency in at least one programming language (e.g., Python, Go).
  • Experience operating and optimizing cloud-based systems (AWS, Azure, or GCP).
  • Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab, ArgoCD).
  • Experience with incident response, on-call practices, and leading postmortems.
  • Strong understanding of SRE best practices including SLOs, SLIs, and error budgets.
  • Experience in observability (metrics, logging, and distributed tracing; e.g., Prometheus, OpenTelemetry).
  • Proven experience operating highly available, large-scale distributed systems.
  • Ability to work effectively in a distributed, cross-functional environment.

Responsibilities

  • Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets.
  • Build and enhance observability systems (metrics, logs, and distributed tracing).
  • Drive reliability engineering practices including capacity planning, load testing, and chaos testing.
  • Improve developer experience by enabling self-service infrastructure and streamlining workflows.
  • Design, implement, and optimize CI/CD and GitOps workflows using tools like GitLab and ArgoCD.
  • Implement secure-by-default infrastructure and enforce best practices.
  • Optimize infrastructure cost and efficiency using FinOps principles.
  • Participate in incident response and on-call rotations.