Site Reliability Engineer

Flip GmbH | SaaS Platform

Remote (Europe) | Full-Time | Mid-Level

Salary not disclosed

Job Details

Requirements

1–3 years of hands-on experience in SRE, Platform, DevOps, or Infrastructure engineering.
Strong background in operating and scaling cloud infrastructure (Azure, GCP, or AWS).
Deep knowledge of Kubernetes and container orchestration in production.
Hands-on experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK).
Experience defining and operating SLOs and error budgets.
Strong programming skills in Go, Python, or Kotlin.
Hands-on experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform).
Familiarity with configuration management tools (e.g., Ansible, Chef).
Fluent English communication skills.
Willingness to participate in on-call rotations.

Responsibilities

Scale and optimize Azure cloud infrastructure and Kubernetes clusters.
Implement zero-downtime deployments and disaster recovery strategies.
Develop and maintain the LGTM (Loki, Grafana, Tempo, Mimir) observability stack.
Automate infrastructure using Infrastructure as Code (Pulumi in Go).
Drive reliability practices including CI/CD, incident management, and post-mortems.
Define and optimize platform SLOs and error budgets.
Participate in on-call rotations.