Senior Site Reliability Engineer
Flip GmbH, Research & Development
Remote (Europe), Full-Time, Senior
Salary not disclosed
Job Details
- Languages: English
- Experience: 5+ years
- Required Skills: Python, Kubernetes, Azure, Go, CI/CD, Terraform
Requirements
- 5+ years of experience as an SRE, Platform, DevOps, Infrastructure, Cloud, or Backend Engineer
- Proven track record building and operating high-throughput, highly available production systems
- Deep production-level experience with Kubernetes on any hyperscaler
- Strong experience with modern observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK)
- Clear point of view on SLIs, SLOs, and error budgets
- Solid software development skills in Go or Python
- Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform)
- Experience with GitOps (e.g., ArgoCD) and CI/CD pipeline design
- Demonstrated ability to lead infrastructure initiatives from design to production
- Experience mentoring engineers
- Business-fluent English
- Willingness to participate in on-call rotations
Responsibilities
- Co-own architecture and evolution of cloud infrastructure on Azure and Kubernetes
- Define resilience strategies including scaling, zero-downtime deployments, and disaster recovery
- Evolve observability stack using Loki, Grafana, Tempo, and Mimir
- Improve Infrastructure as Code platform to enable self-service
- Lead platform-related major incidents and facilitate blameless post-mortems
- Mentor teammates, run RFCs, and conduct design reviews
- Partner with the squad to define the platform roadmap