Senior Site Reliability Engineer - Wikimedia Enterprise

US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming.
Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland and the United Kingdom.

Full-Time, Senior

Salary: 116,633 - 181,243 USD per year

Job Details

Required Skills: AWS, Python, Kubernetes, Go, Prometheus, Terraform, Ansible, GitLab

Requirements

Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible).
Proficiency in at least one programming language (e.g., Python, Go).
Experience operating and optimizing cloud-based systems (AWS, Azure, or GCP).
Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab, ArgoCD).
Experience with incident response, on-call practices, and leading postmortems.
Strong understanding of SRE best practices, including SLOs, SLIs, and error budgets.
Experience in observability (metrics, logging, and distributed tracing; e.g., Prometheus, OpenTelemetry).
Proven experience operating highly available, large-scale distributed systems.
Ability to work effectively in a distributed, cross-functional environment.

Responsibilities

Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets.
Build and enhance observability systems (metrics, logs, and distributed tracing).
Drive reliability engineering practices, including capacity planning, load testing, and chaos testing.
Improve developer experience by enabling self-service infrastructure and streamlining workflows.
Design, implement, and optimize CI/CD and GitOps workflows using tools like GitLab and ArgoCD.
Implement secure-by-default infrastructure and enforce best practices.
Optimize infrastructure cost and efficiency using FinOps principles.
Participate in incident response and on-call rotations.
