Senior Site Reliability Engineer


In the United Kingdom... Remote work from locations within the European Union may be possible, depending on team arrangements.
Full-Time
Senior

Salary not disclosed


Job Details

Requirements

5+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, DevOps, Cloud Infrastructure, or similar infrastructure-focused engineering roles.
Proven expertise in operating and scaling high-throughput, highly available production systems.
Deep practical experience with Kubernetes in cloud environments such as Azure, AWS, or GCP.
Strong understanding of observability concepts, including monitoring, SLIs, SLOs, error budgets, logging, and distributed tracing.
Proficiency in Go or Python, with strong software engineering and automation skills.
Experience with Infrastructure as Code tools such as Pulumi, Terraform, or OpenTofu, along with GitOps workflows and CI/CD automation.
Strong knowledge of cloud-native technologies, distributed systems, and reliability engineering best practices.
Demonstrated experience leading infrastructure initiatives, writing technical proposals, and driving architecture decisions.
Strong communication skills with the ability to collaborate effectively across technical teams and stakeholders.
Comfortable participating in on-call rotations and managing critical production incidents.

Responsibilities

Drive the architecture and evolution of scalable cloud infrastructure and Kubernetes environments designed for high availability and global growth.
Define and implement platform reliability strategies, including zero-downtime deployments, disaster recovery, rollback mechanisms, and resilience improvements.
Improve and maintain observability systems, monitoring frameworks, and telemetry infrastructure to support operational excellence and system transparency.
Build and optimize Infrastructure as Code and self-service platform capabilities to reduce operational overhead and improve developer experience.
Lead platform-related incident response activities, conduct blameless post-mortems, and implement long-term systemic improvements.
Collaborate closely with engineering teams to define technical roadmaps, architecture standards, and scalable operational practices.
Mentor and support teammates through technical guidance, design reviews, and knowledge-sharing initiatives.
Drive continuous improvement in CI/CD pipelines, GitOps workflows, automation strategies, and cloud-native infrastructure operations.
