Staff Site Reliability Engineer

Europe Timezone, Full-Time, Staff

Salary not disclosed

Job Details

Qualifications

Extensive hands-on experience in SRE or Production Engineering roles.
Demonstrated experience building or scaling SRE practices in high-growth or complex environments.
Deep expertise in AWS or Azure-based cloud infrastructure.
Strong experience with Kubernetes (including migration, scaling, and production hardening).
Advanced Infrastructure-as-Code experience (Terraform or equivalent).
End-to-end CI/CD pipeline design and optimisation experience.
Strong experience with observability tooling across distributed systems.
Experience troubleshooting complex multi-tenant or customer-hosted environments.
Experience supporting production data platforms and ML systems.
MLOps experience, including model deployment and monitoring.
Strong understanding of distributed systems, scalability, and fault tolerance.

Responsibilities

Architect, deploy, and operate scalable, secure production environments (AWS preferred).
Lead reliability improvements across multiple engineering streams.
Design and evolve Kubernetes-based infrastructure, including migration and optimisation initiatives.
Build and enforce strong Infrastructure-as-Code standards.
Define and operationalise SLIs, SLOs, and error budgets.
Strengthen observability across applications, infrastructure, data pipelines, and ML systems.
Own and optimise the CI/CD pipeline end to end.
Lead incident response for complex cross-system failures and drive postmortems.
Support and productionise ML workloads (MLOps).
Mentor engineers and raise the overall reliability bar.
