Senior Site Reliability Engineer
Remote eligibility restrictions: Germany, EU time zones
Full-Time
Senior
Salary not disclosed
Job Details
- Languages: English
- Required Skills: AWS, PostgreSQL, Python, Django, Java, Kubernetes, Spring Boot, Grafana
Requirements
- Solid programming experience in Python (Django and AsyncIO) and/or Java (Spring Boot)
- Experience maintaining an observability tool suite (specifically the LGTM stack: Loki, Grafana, Tempo, Mimir)
- Experience developing and maintaining Python services in production
- Strong experience with AWS and Kubernetes
- Solid proficiency with relational databases (PostgreSQL) and messaging systems (e.g., RabbitMQ, NATS, Kafka)
- Experience as an on-call SRE
- Enjoyment of hands-on troubleshooting of distributed systems in production environments
- Proficiency in English, both written and spoken
Responsibilities
- Own and influence the incident management process end-to-end
- Maintain and evolve the on-prem observability stack
- Keep production applications running smoothly by participating in the on-call rotation
- Develop automations and tools to support platform reliability
- Contribute to production services with performance and resiliency in mind
- Collaborate with product engineers to foster SRE principles within the R&D organization
- Mentor the SRE team or product engineers