Senior Site Reliability Engineer

itD Tech, Information Technology

Remote in the UK, 11am to 7pm (mostly). Contract. Senior.

Salary not disclosed

Job Details

Experience: 5+ years of experience designing, deploying, and operating mid- to large-size distributed systems; 2+ years of experience developing with languages such as Ruby, Python, Go, Scala, or Bash.
Required Skills: Python, Bash, Elasticsearch, Kafka, Ruby, Go, Prometheus, Linux, Terraform, Scala

5+ years of experience designing, deploying, and operating mid- to large-size distributed systems on Linux (Debian and Ubuntu).
2+ years of experience developing with languages such as Ruby, Python, Go, Scala, or Bash.
Direct experience with ELK stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, and Consul.
Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
Strong experience in building solutions based on software engineering best practices.
Ability to work on a highly autonomous team.
Willingness to participate in a production on-call rotation.
Strong problem-solving skills for complex distributed systems.
Excellent capacity for rapid learning of unfamiliar code and systems.

Lead the design, development and operation of large-scale, secure observability systems.
Design, deploy and scale Prometheus architecture to handle 100+ million active series.
Deploy and operate large, high-performance Elasticsearch clusters.
Deploy and grow high-throughput data pipelines using Kafka.
Design and build a self-service alerting system for engineering teams.
Write libraries and APIs for monitoring, logging, and observability.
Utilize Terraform for public and private cloud infrastructure deployment.
