Senior Site Reliability Engineer
Replit
Software Development
Remote - Europe; Secondary Locations: Remote - France, Remote - Ireland, Remote - Italy, Remote - Netherlands, Remote - United Kingdom
Full-Time
Senior
Salary not disclosed
Job Details
- Experience: 4-8 years
- Required Skills: Python, Kubernetes, Go, CI/CD, Terraform, Ansible, Distributed Systems
Requirements
- 4-8 years of experience in Site Reliability Engineering, DevOps, or Systems/Infrastructure Engineering.
- Strong programming skills in Python, Go, or similar languages.
- Deep understanding of distributed systems.
- Experience with Kubernetes and cloud-native technologies.
- Proven track record of implementing observability solutions.
- Strong incident management and response leadership experience.
- Experience with infrastructure as code and configuration management tools.
Responsibilities
- Design and implement observability solutions using modern tools.
- Develop dashboards, metrics, and logging strategies for system health.
- Architect and implement infrastructure automation using Terraform, Ansible, or Pulumi.
- Design and maintain CI/CD pipelines.
- Create self-healing systems for failure scenarios.
- Define and implement SLOs and SLIs in collaboration with engineering teams.
- Lead incident response and conduct post-mortems.
- Build tools and processes to reduce MTTR.
- Optimize infrastructure performance, capacity, and resource utilization.