Senior Site Reliability Engineer
Flip GmbH · Cloud Infrastructure
Remote (Europe) · Full-Time · Senior
Salary not disclosed
Job Details
- Languages: English
- Experience: 5+ years
- Required Skills: Python, Kubernetes, Azure, Go, Terraform
Requirements
- 5+ years of hands-on experience as an SRE, Platform, DevOps, Infrastructure, or Cloud Engineer.
- Proven track record building and operating high-availability, high-throughput production systems.
- Deep production-level experience with Kubernetes on hyperscalers.
- Strong experience with observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, Loki, ELK).
- Knowledge of SLIs, SLOs, and Error Budgets.
- Solid software development skills in Go (preferred) or Python.
- Experience with IaC (Pulumi, OpenTofu, Terraform) and GitOps (ArgoCD).
- Ability to lead infrastructure initiatives from design to production.
- Experience mentoring engineers.
- Proficiency in written and spoken English.
- Willingness to participate in on-call rotations.
Responsibilities
- Take end-to-end responsibility for critical reliability areas and drive technical direction.
- Drive the architecture and evolution of the Azure cloud infrastructure and Kubernetes clusters.
- Define resilience strategies including global scaling, zero-downtime deployments, and disaster recovery.
- Optimize the LGTM observability stack (Loki, Grafana, Tempo, Mimir).
- Improve the IaC platform and enable self-service for engineering teams.
- Lead incident response and conduct blameless post-mortems.
- Mentor team members and lead RFCs and design reviews.