Senior Site Reliability Engineer - Wikimedia Enterprise
US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming.
Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland, and the United Kingdom.
Full-Time, Senior
Salary: 116,633 - 181,243 USD per year
Job Details
- Required Skills: AWS, Python, Kubernetes, Go, Prometheus, Terraform, Ansible, GitLab
Requirements
- Experience with Infrastructure as Code and automation tools (e.g., Terraform, Ansible).
- Proficiency in at least one programming language (e.g., Python, Go).
- Experience operating and optimizing cloud-based systems (AWS, Azure, or GCP).
- Experience building and maintaining CI/CD pipelines and GitOps workflows (e.g., GitLab, ArgoCD).
- Experience with incident response, on-call practices, and leading postmortems.
- Strong understanding of SRE best practices including SLOs, SLIs, and error budgets.
- Experience with observability (metrics, logging, and distributed tracing; e.g., Prometheus, OpenTelemetry).
- Proven experience operating highly available, large-scale distributed systems.
- Ability to work effectively in a distributed, cross-functional environment.
Responsibilities
- Define, track, and improve Service Level Objectives (SLOs), SLIs, and error budgets.
- Build and enhance observability systems (metrics, logs, and distributed tracing).
- Drive reliability engineering practices including capacity planning, load testing, and chaos testing.
- Improve developer experience by enabling self-service infrastructure and streamlining workflows.
- Design, implement, and optimize CI/CD and GitOps workflows using tools like GitLab and ArgoCD.
- Implement secure-by-default infrastructure and enforce best practices.
- Optimize infrastructure cost and efficiency using FinOps principles.
- Participate in incident response and on-call rotations.