Senior Site Reliability Engineer

itD Tech
Information Technology
Remote (UK), 11am to 7pm (mostly)
Contract, Senior
Salary not disclosed

Job Details

Experience
5+ years of experience designing, deploying and operating mid- to large-size distributed systems; 2+ years of experience developing with languages such as Ruby, Python, Go, Scala, or Bash.
Required Skills
Python, Bash, Elasticsearch, Kafka, Ruby, Go, Prometheus, Linux, Terraform, Scala

Requirements

  • 5+ years of experience designing, deploying and operating mid- to large-size distributed systems on Linux (Debian and Ubuntu).
  • 2+ years of experience developing with languages such as Ruby, Python, Go, Scala, or Bash.
  • Direct experience with ELK stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, and Consul.
  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
  • Strong experience in building solutions based on software engineering best practices.
  • Ability to work on a highly autonomous team.
  • Willingness to participate in a production on-call rotation.
  • Strong problem-solving skills for complex distributed systems.
  • Ability to learn unfamiliar code and systems quickly.

Responsibilities

  • Lead the design, development and operation of large-scale, secure observability systems.
  • Design, deploy and scale a Prometheus architecture that handles 100+ million active series.
  • Deploy and operate large, high-performance Elasticsearch clusters.
  • Deploy and grow high-throughput data pipelines using Kafka.
  • Design and build a self-service alerting system for engineering teams.
  • Write libraries and APIs for monitoring, logging, and observability.
  • Use Terraform to deploy public and private cloud infrastructure.