Senior Site Reliability Engineer

Germany, Contract, Senior
Salary not disclosed

Job Details

Experience
Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
Required Skills
AWS, Python, Go, Grafana, Prometheus, CI/CD, Terraform, CloudFormation

Requirements

  • Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
  • Strong hands-on expertise with AWS cloud services; experience with Azure or GCP is a plus.
  • Proven experience with Infrastructure as Code tools such as Terraform or CloudFormation.
  • Solid understanding of CI/CD pipelines and Git-based development workflows.
  • Experience defining and operating SLOs, SLIs, SLAs, and error budgets.
  • Strong experience with observability and monitoring tools including Prometheus, Grafana, ELK/EFK, OpenTelemetry, and distributed tracing solutions.
  • Programming or scripting experience with Python, Go, Bash, or PowerShell.
  • Good understanding of networking concepts including VPCs, VPNs, load balancers, and firewalls.
  • Knowledge of cloud security best practices and compliance frameworks.
  • Excellent troubleshooting, communication, and stakeholder management skills.
  • Ability to work effectively within global and cross-functional engineering environments.

Responsibilities

  • Lead reliability and operational excellence initiatives across critical production platforms and cloud services.
  • Define, monitor, and improve SLOs, SLIs, SLAs, availability targets, and error budgets.
  • Design and implement scalable, resilient, and secure cloud-native architectures.
  • Automate infrastructure provisioning and deployment processes using Infrastructure as Code and CI/CD best practices.
  • Build and maintain observability solutions including monitoring, logging, tracing, and alerting systems.
  • Drive incident response, troubleshooting, root cause analysis, and postmortem processes.
  • Improve operational maturity through automation, documentation, runbooks, and engineering best practices.
  • Collaborate closely with global engineering, security, product, and vendor teams.
  • Perform capacity planning, performance optimization, and system reliability analysis.
  • Mentor engineers and support knowledge sharing initiatives across technical teams.