Senior Site Reliability Engineer
Remote (UK) | Full-Time | Senior
Salary not disclosed
Job Details
Required Skills
- Kubernetes, Go, Grafana, Prometheus, CI/CD, Linux, Terraform, Ansible
Requirements
- Strong experience running production Kubernetes environments.
- Strong Linux fundamentals, including systemd, networking, storage and performance troubleshooting.
- Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS or GKE.
- Solid infrastructure-as-code experience, including Ansible plus Terraform or an equivalent tool.
- GitOps and CI/CD experience managing full application and component lifecycles.
- Experience with observability tooling such as Prometheus, Grafana, Elastic Stack or OpenTelemetry.
- Experience working with identity and access technologies such as OIDC, SAML, SCIM or Keycloak.
- Experience with virtualisation or infrastructure platforms such as KVM, libvirt or VMware.
- Scripting or tooling experience using Go, Python, shell scripting or similar.
- Experience working in secure, regulated or enterprise-scale environments.
Responsibilities
- Operate, harden and extend production OpenShift / OKD / Kubernetes clusters across on-premises and hybrid environments.
- Support the migration from VMware to KVM, helping modernise the underlying compute and storage layer.
- Own and improve CI/CD processes across the full lifecycle of platform and application components.
- Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
- Maintain and improve core platform services including identity, ingress, observability, certificate management, service mesh and container registry capabilities.
- Build and operate observability across logs, metrics, traces, alerting, SLOs and error budgets.
- Improve platform hardening in line with secure and regulated environment requirements.
- Automate repeatable operational tasks using tools and languages such as Ansible, Terraform, Go or Python.
- Lead incident response, support blameless post-mortems and drive systemic fixes.
- Participate in an on-call rota.