Senior Site Reliability Engineer

Flip GmbH | Cloud Infrastructure

Remote (Europe) | Full-Time | Senior

Salary not disclosed

Job Details

Requirements

5+ years of hands-on experience as an SRE, Platform, DevOps, Infrastructure, or Cloud Engineer.
Proven track record building and operating high-availability, high-throughput production systems.
Deep production-level experience with Kubernetes on hyperscalers.
Strong experience with observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, Loki, ELK).
Knowledge of SLIs, SLOs, and error budgets.
Solid software development skills in Go (preferred) or Python.
Experience with IaC (Pulumi, OpenTofu, Terraform) and GitOps (ArgoCD).
Ability to lead infrastructure initiatives from design to production.
Experience mentoring engineers.
Proficiency in written and spoken English.
Willingness to participate in on-call rotations.

Responsibilities

Take end-to-end responsibility for critical reliability areas and drive technical direction.
Drive the architecture and evolution of the Azure cloud infrastructure and Kubernetes clusters.
Define resilience strategies including global scaling, zero-downtime deployments, and disaster recovery.
Optimize the LGTM observability stack (Loki, Grafana, Tempo, Mimir).
Improve the IaC platform and enable self-service for engineering teams.
Lead incident response and conduct blameless post-mortems.
Mentor team members and lead RFCs and design reviews.
