Site Reliability Engineer
Codesphere
Cloud Infrastructure
Remote (Germany), Full-Time, Mid-Level
Salary not disclosed
Job Details
Required Skills
- Kubernetes, Go, CI/CD, Terraform, Ansible
Requirements
- Proven experience in an SRE, DevOps, or platform engineering role with hands-on production ownership
- Strong knowledge of Kubernetes, Terraform, and Ansible
- Familiarity with Ceph or comparable distributed storage systems
- Experience with SLOs, SLIs, error budgets, and CI/CD pipeline design
- Degree in a relevant field or comparable qualification
- Calm, structured, and fast under pressure with strong debugging and incident response skills
- Good communicator, able to translate operational concerns into guidance for Dev teams
- Go development experience is a plus
Responsibilities
- Define and enforce SLOs, SLIs, and SLAs across production
- Monitor system health, plan capacity, and automate deployments, patching, and infrastructure provisioning
- Diagnose and resolve production incidents quickly, including 24/7 on-call participation
- Lead post-mortems and turn findings into prevention; maintain runbooks and escalation procedures
- Manage cloud infrastructure via IaC and own CI/CD pipeline design and maintenance
- Drive scalability, fault tolerance, disaster recovery, and security compliance
- Partner with Dev teams on production readiness, shift-left practices, and error budget management