Site Reliability Engineer
Flip GmbH · SaaS Platform
Remote (Europe) · Full-Time · Middle
Salary not disclosed
Job Details
- Languages: English
- Experience: 1–3 years
- Required Skills: Python, Kubernetes, Azure, Go, Grafana, Prometheus, Terraform
Requirements
- 1–3 years of hands-on experience in SRE, Platform, DevOps, or Infrastructure engineering.
- Strong background in operating and scaling cloud infrastructure (Azure, GCP, or AWS).
- Deep knowledge of Kubernetes and container orchestration in production.
- Hands-on experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK).
- Experience defining and operating SLOs and error budgets.
- Strong programming skills in Go, Python, or Kotlin.
- Hands-on experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform).
- Familiarity with configuration management tools (e.g., Ansible, Chef).
- Fluent English communication skills.
- Willingness to participate in on-call rotations.
Responsibilities
- Scale and optimize Azure cloud infrastructure and Kubernetes clusters.
- Implement zero-downtime deployments and disaster recovery strategies.
- Develop and maintain the LGTM (Loki, Grafana, Tempo, Mimir) observability stack.
- Automate infrastructure using Infrastructure as Code (Pulumi in Go).
- Drive reliability practices including CI/CD, incident management, and post-mortems.
- Define and optimize platform SLOs and error budgets.
- Participate in on-call rotations.