Site Reliability Engineer

Location: United Kingdom (Remote)
Employment type: Full-Time
Salary: 60,000 - 70,000 GBP per year

Job Details

Required Skills
Docker, Agile, Nginx, Prometheus, Terraform, Scripting, Datadog

Requirements

  • Experience in performance monitoring and analysis
  • Capacity planning experience
  • Scripting and automation skills, with experience in relevant technologies
  • Experience with Infrastructure as Code, in particular, Terraform
  • Understanding of relational database technologies and their cloud versions (e.g. AWS Aurora)
  • Experience with messaging and distributed asynchronous workloads
  • Experience with nginx or similar technologies
  • Familiarity with SRE processes
  • Awareness of DevOps principles such as the Three Ways and the Five Ideals
  • Experience with other database technologies and cloud platforms
  • Past experience with Enterprise solutions running at scale
  • Familiarity with Kanban and Agile development processes
  • Experience with containerisation, for example Docker
  • Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development

Responsibilities

  • Proactively monitor and analyse platform performance.
  • Collaborate with engineering teams to address performance bottlenecks and ensure scalability.
  • Assist engineering teams with implementing and reviewing SLOs.
  • Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.
  • Ensure the service is highly available and resilient.
  • Champion best practices in design for high availability.
  • Devise runbooks and run game sessions to test our DR plan, HA and backups.
  • Conduct assessments of capacity and plan for scaling to meet current and future business needs.
  • Work closely with the Head of Platform Engineering and Head of SRE to strategize and implement scalable solutions.
  • Act as a key player in incident response and troubleshooting, ensuring rapid resolution and minimising downtime.