Senior Site Reliability Engineer
itD Tech, Information Technology
The opportunity is remote in the UK, 11am to 7pm (mostly). Contract, Senior.
Salary not disclosed
Job Details
- Experience: 5+ years of experience designing, deploying and operating mid- to large-size distributed systems; 2+ years of experience developing with languages like Ruby, Python, Go, Scala, or Bash.
- Required Skills: Python, Bash, ElasticSearch, Kafka, Ruby, Go, Prometheus, Linux, Terraform, Scala
Requirements
- 5+ years of experience designing, deploying and operating mid- to large-size distributed systems on Linux (Debian and Ubuntu).
- 2+ years of experience developing with languages such as Ruby, Python, Go, Scala, or Bash.
- Direct experience with ELK stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, and Consul.
- Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
- Strong experience in building solutions based on software engineering best practices.
- Ability to work on a highly autonomous team.
- Willingness to participate in a production on-call rotation.
- Strong problem-solving skills for complex distributed systems.
- Excellent capacity for rapid learning of unfamiliar code and systems.
Responsibilities
- Lead the design, development and operation of large-scale, secure observability systems.
- Design, deploy and scale Prometheus architecture to handle 100+ million active series.
- Deploy and operate large, high-performance ElasticSearch clusters.
- Deploy and grow high-throughput data pipelines using Kafka.
- Design and build a self-service alerting system for engineering teams.
- Write libraries and APIs for monitoring, logging, and observability.
- Utilize Terraform for public and private cloud infrastructure deployment.