Site Reliability Engineer

Flip GmbH | SaaS Platform
Remote (Europe) | Full-Time | Mid-Level
Salary not disclosed

Job Details

Languages
English
Experience
1–3 years
Required Skills
Python, Kubernetes, Azure, Go, Grafana, Prometheus, Terraform

Requirements

  • 1–3 years of hands-on experience in SRE, Platform, DevOps, or Infrastructure engineering.
  • Strong background in operating and scaling cloud infrastructure (Azure, GCP, or AWS).
  • Deep knowledge of Kubernetes and container orchestration in production.
  • Hands-on experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK).
  • Experience defining and operating SLOs and error budgets.
  • Strong programming skills in Go, Python, or Kotlin.
  • Hands-on experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform).
  • Familiarity with configuration management tools (e.g., Ansible, Chef).
  • Fluent English communication skills.
  • Willingness to participate in on-call rotations.

Responsibilities

  • Scale and optimize Azure cloud infrastructure and Kubernetes clusters.
  • Implement zero-downtime deployments and disaster recovery strategies.
  • Develop and maintain the LGTM (Loki, Grafana, Tempo, Mimir) observability stack.
  • Automate infrastructure using Infrastructure as Code (Pulumi in Go).
  • Drive reliability practices including CI/CD, incident management, and post-mortems.
  • Define and optimize platform SLOs and error budgets.
  • Participate in on-call rotations.