Senior Site Reliability Engineer

Replit
Software Development
Remote - Europe; Secondary Locations: Remote - France, Remote - Ireland, Remote - Italy, Remote - Netherlands, Remote - United Kingdom
Full-Time
Senior
Salary not disclosed

Job Details

Experience
4-8 years
Required Skills
Python, Kubernetes, Go, CI/CD, Terraform, Ansible, Distributed Systems

Requirements

  • 4-8 years of experience in Site Reliability Engineering, DevOps, or Systems/Infrastructure Engineering.
  • Strong programming skills in Python, Go, or similar languages.
  • Deep understanding of distributed systems.
  • Experience with Kubernetes and cloud-native technologies.
  • Proven track record of implementing observability solutions.
  • Strong experience leading incident management and response.
  • Experience with infrastructure as code and configuration management tools.

Responsibilities

  • Design and implement observability solutions using modern tools.
  • Develop dashboards, metrics, and logging strategies for system health.
  • Architect and implement infrastructure automation using Terraform, Ansible, or Pulumi.
  • Design and maintain CI/CD pipelines.
  • Create self-healing systems for failure scenarios.
  • Define and implement SLOs and SLIs in collaboration with engineering teams.
  • Lead incident response and conduct post-mortems.
  • Build tools and processes to reduce MTTR.
  • Optimize infrastructure performance, capacity, and resource utilization.