Senior Site Reliability Engineer

In the United Kingdom... Possibility to work remotely from locations within the European Union depending on team arrangements.
Full-Time
Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, Python, GCP, Kubernetes, Azure, Go, CI/CD, Terraform

Requirements

  • 5+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Infrastructure, or similar infrastructure-focused engineering roles.
  • Proven expertise operating and scaling high-throughput, highly available production systems.
  • Deep practical experience with Kubernetes in cloud environments such as Azure, AWS, or GCP.
  • Strong understanding of observability concepts, including monitoring, SLIs, SLOs, error budgets, logging, and distributed tracing.
  • Proficiency in Go or Python, with strong software engineering and automation skills.
  • Experience with Infrastructure as Code tools such as Pulumi, Terraform, or OpenTofu, along with GitOps workflows and CI/CD automation.
  • Strong knowledge of cloud-native technologies, distributed systems, and reliability engineering best practices.
  • Demonstrated experience leading infrastructure initiatives, writing technical proposals, and driving architecture decisions.
  • Strong communication skills with the ability to collaborate effectively across technical teams and stakeholders.
  • Comfortable participating in on-call rotations and managing critical production incidents.

Responsibilities

  • Drive the architecture and evolution of scalable cloud infrastructure and Kubernetes environments designed for high availability and global growth.
  • Define and implement platform reliability strategies, including zero-downtime deployments, disaster recovery, rollback mechanisms, and resilience improvements.
  • Improve and maintain observability systems, monitoring frameworks, and telemetry infrastructure to support operational excellence and system transparency.
  • Build and optimize Infrastructure as Code and self-service platform capabilities to reduce operational overhead and improve developer experience.
  • Lead platform-related incident response activities, conduct blameless post-mortems, and implement long-term systemic improvements.
  • Collaborate closely with engineering teams to define technical roadmaps, architecture standards, and scalable operational practices.
  • Mentor and support teammates through technical guidance, design reviews, and knowledge sharing initiatives.
  • Drive continuous improvement in CI/CD pipelines, GitOps workflows, automation strategies, and cloud-native infrastructure operations.