Senior Site Reliability Engineer

Flip GmbH | AI Employee Experience

Remote (Europe) | Full-Time | Senior

Salary not disclosed

Job Details

Requirements

5+ years of hands-on experience in SRE, Platform, DevOps, or Infrastructure engineering.
Proven track record of building and operating high-throughput, high-availability systems.
Production-level experience running Kubernetes on hyperscale cloud providers.
Experience with modern observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, Loki).
Solid software development skills in Go (preferred) or Python.
Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform).
Experience with GitOps (e.g., ArgoCD) and CI/CD pipeline design.
Ability to lead complex infrastructure initiatives from design to production.
Experience mentoring engineers.
Fluent English communication skills.
Willingness to participate in on-call rotations.

Responsibilities

Co-own the architecture and evolution of the cloud infrastructure on Azure and Kubernetes.
Define the resilience strategy, covering global scaling, zero-downtime operation, and disaster recovery.
Strengthen the foundations of the observability stack (Loki, Grafana, Tempo, Mimir).
Develop self-service Infrastructure as Code platforms.
Lead the response to major platform incidents and drive post-mortems.
Mentor team members and run RFC and design reviews.
Partner with the squad to define platform roadmaps.

View Full Description & ApplyYou'll be redirected to the employer's site