Site Reliability Engineer

Flip App
AI employee experience platform

Elsewhere in Europe, Full-Time, Mid-Level

Salary not disclosed

Job Details

Languages: English
Experience: 1–3 years
Required Skills: AWS, Python, GCP, Kotlin, Kubernetes, Azure, Go, Prometheus, CI/CD, Terraform, Ansible

1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus
Experience operating and scaling cloud infrastructure (Azure, GCP, AWS)
Deep knowledge of Kubernetes and container orchestration in production environments
Hands-on experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK) and familiarity with defining and operating SLOs and error budgets
Solid software development skills in Go (preferred, as our IaC runs on Pulumi in Go), Python, or Kotlin
Hands-on experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform) and configuration management tools (e.g., Ansible, Chef)
A collaborative mindset, strong communication skills, and business-fluent English
Willingness to participate in on-call rotations to ensure the reliability of our platform

Enable scaling: Expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and maximum availability, to support Flip's rapid global growth.
Ensure resilience & security: Design and implement zero-downtime deployments, rollback mechanisms, and disaster recovery strategies that keep our platform available around the clock.
Create observability: Further develop our LGTM stack (Loki, Grafana, Tempo, Mimir) to provide every team with the necessary visibility – and use it to define and optimize our SLOs.
Automate everything: Design, develop, and optimize Infrastructure as Code with Pulumi in Go to eliminate manual effort (toil) and offer our platform as a self-service for engineering teams.
Drive reliability practices: Promote CI/CD best practices, incident management, post-mortems, and developer experience across the entire engineering organization.
Shape our roadmap: Work with your squad and engineering leadership to define the direction of the platform – from scalable high-throughput systems and cost optimization to security posture and compliance.
