Senior Site Reliability Engineer

Flip GmbH
Cloud Infrastructure
Remote (Europe), Full-Time, Senior
Salary not disclosed

Job Details

Languages
English
Experience
5+ years
Required Skills
Python, Kubernetes, Azure, Go, Terraform

Requirements

  • 5+ years of hands-on experience as an SRE or as a Platform, DevOps, Infrastructure, or Cloud Engineer.
  • Proven track record building and operating high-availability, high-throughput production systems.
  • Deep production-level experience with Kubernetes on hyperscalers.
  • Strong experience with observability stacks (e.g., Prometheus, Mimir, VictoriaMetrics, Loki, ELK).
  • Knowledge of SLIs, SLOs, and Error Budgets.
  • Solid software development skills in Go (preferred) or Python.
  • Experience with IaC (Pulumi, OpenTofu, Terraform) and GitOps (ArgoCD).
  • Ability to lead infrastructure initiatives from design to production.
  • Experience mentoring engineers.
  • Proficiency in written and spoken English.
  • Willingness to participate in on-call rotations.

Responsibilities

  • Take end-to-end responsibility for critical reliability areas and drive technical direction.
  • Drive architecture and evolution of Azure cloud infrastructure and Kubernetes clusters.
  • Define resilience strategies including global scaling, zero-downtime deployments, and disaster recovery.
  • Optimize the LGTM observability stack (Loki, Grafana, Tempo, Mimir).
  • Improve the IaC platform and enable self-service for engineering teams.
  • Lead incident response and conduct blameless post-mortems.
  • Mentor team members and lead RFCs and design reviews.