Senior Site Reliability Engineer

Remote eligibility restrictions: Germany, EU time zones
Full-Time
Senior
Salary not disclosed

Job Details

Languages
English
Required Skills
AWS, PostgreSQL, Python, Django, Java, Kubernetes, Spring Boot, Grafana

Requirements

  • Solid programming experience in Python (Django and AsyncIO) and/or Java (Spring Boot)
  • Experience maintaining an observability tool suite (specifically LGTM: Loki, Grafana, Tempo, Mimir)
  • Experience developing and maintaining Python services in production
  • Strong experience with AWS and Kubernetes
  • Solid proficiency in working with relational databases (PostgreSQL) and messaging systems (e.g. RabbitMQ, NATS, Kafka)
  • Experience as an on-call SRE
  • Enthusiasm for hands-on troubleshooting of distributed systems in production environments
  • Proficiency in English, both written and spoken

Responsibilities

  • Own and influence the incident management process end-to-end
  • Maintain and evolve the on-prem observability stack
  • Keep production applications running smoothly by participating in the on-call rotation
  • Develop automations and tools to support platform reliability
  • Contribute to production services with performance and resiliency in mind
  • Collaborate with product engineers to foster SRE principles within the R&D organization
  • Mentor the SRE team or product engineers