Site Reliability Engineer

Codesphere | Cloud Infrastructure
Remote (Germany) | Full-Time | Middle
Salary not disclosed

Job Details

Required Skills
Kubernetes, Go, CI/CD, Terraform, Ansible

Requirements

  • Proven experience in an SRE, DevOps, or platform engineering role with hands-on production ownership
  • Strong knowledge of Kubernetes, Terraform, and Ansible
  • Familiarity with Ceph or comparable distributed storage systems
  • Experience with SLOs, SLIs, error budgets, and CI/CD pipeline design
  • Degree in a relevant field or comparable qualification
  • Calm, structured, and fast under pressure with strong debugging and incident response skills
  • Good communicator, able to translate operational concerns into guidance for development teams
  • Go development experience is a plus

Responsibilities

  • Define and enforce SLOs, SLIs, and SLAs across production
  • Monitor system health, plan capacity, and automate deployments, patching, and infrastructure provisioning
  • Diagnose and resolve production incidents quickly, including 24/7 on-call participation
  • Lead post-mortems and turn findings into prevention; maintain runbooks and escalation procedures
  • Manage cloud infrastructure via IaC and own CI/CD pipeline design and maintenance
  • Drive scalability, fault tolerance, disaster recovery, and security compliance
  • Partner with development teams on production readiness, shift-left practices, and error budget management