Site Reliability Engineer

Location: United Kingdom (Remote), Full-Time

Salary: 60,000 - 70,000 GBP per year

Job Details

Experience in performance monitoring and analysis
Capacity planning experience
Scripting and automation skills, with experience in relevant technologies.
Experience with Infrastructure as Code, in particular Terraform
Understanding of relational database technologies and their managed cloud equivalents (e.g. Amazon Aurora)
Experience with messaging and distributed asynchronous workloads
Experience with nginx or similar technologies
Familiarity with SRE processes.
Awareness of DevOps principles such as the Three Ways and the Five Ideals.
Experience with other database technologies and cloud platforms.
Experience with enterprise solutions running at scale
Familiarity with Kanban and Agile development processes
Experience with containerisation, for example Docker
Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development.

Proactively monitor and analyse platform performance.
Collaborate with engineering teams to address performance bottlenecks and ensure scalability.
Assist engineering teams with implementing and reviewing SLOs
Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.
Ensure the service is highly available and resilient
Champion best practices in design for high availability
Devise runbooks and run game day sessions to test our disaster recovery (DR) plan, high availability (HA) and backups
Conduct assessments of capacity and plan for scaling to meet current and future business needs.
Work closely with the Head of Platform Engineering and Head of SRE to strategise and implement scalable solutions.
Play a key role in incident response and troubleshooting, ensuring rapid resolution and minimising downtime.
