Senior Site Reliability Engineer

Remote - UK, Full-Time, Senior
Salary not disclosed

Job Details

Required Skills
Kubernetes, Go, Grafana, Prometheus, CI/CD, Linux, Terraform, Ansible

Requirements

  • Strong experience running production Kubernetes environments.
  • Strong Linux fundamentals, including systemd, networking, storage and performance troubleshooting.
  • Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS or GKE.
  • Solid infrastructure-as-code experience, including Ansible together with Terraform or an equivalent tool.
  • Experience using GitOps and CI/CD to manage the full lifecycle of applications and components.
  • Experience with Prometheus, Grafana, Elastic Stack, or OpenTelemetry.
  • Experience working with identity and access technologies such as OIDC, SAML, SCIM or Keycloak.
  • Experience with virtualisation or infrastructure platforms such as KVM, libvirt or VMware.
  • Scripting or tooling experience using Go, Python, shell scripting or similar.
  • Experience working in secure, regulated or enterprise-scale environments.

Responsibilities

  • Operate, harden and extend production OpenShift / OKD / Kubernetes clusters across on-premises and hybrid environments.
  • Support the migration from VMware to KVM, helping modernise the underlying compute and storage layer.
  • Own and improve CI/CD processes across the full lifecycle of platform and application components.
  • Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
  • Maintain and improve core platform services including identity, ingress, observability, certificate management, service mesh and container registry capabilities.
  • Build and operate observability across logs, metrics, traces, alerting, SLOs and error budgets.
  • Improve platform hardening in line with secure and regulated environment requirements.
  • Automate repeatable operational tasks using tools such as Ansible, Terraform, Go, or Python.
  • Lead incident response activity, support blameless post-mortems and drive systemic fixes.
  • Participate in an on-call rota.