Senior Site Reliability Engineer

Germany | Contract | Senior

Salary not disclosed

Job Details

Experience: Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
Required Skills: AWS, Python, Go, Grafana, Prometheus, CI/CD, Terraform, CloudFormation

Requirements

Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
Strong hands-on expertise with AWS cloud services; experience with Azure or GCP is a plus.
Proven experience with Infrastructure as Code tools such as Terraform or CloudFormation.
Solid understanding of CI/CD pipelines and Git-based development workflows.
Experience defining and operating SLOs, SLIs, SLAs, and error budgets.
Strong experience with observability and monitoring tools including Prometheus, Grafana, ELK/EFK, OpenTelemetry, and distributed tracing solutions.
Programming or scripting experience with Python, Go, Bash, or PowerShell.
Good understanding of networking concepts including VPCs, VPNs, load balancers, and firewalls.
Knowledge of cloud security best practices and compliance frameworks.
Excellent troubleshooting, communication, and stakeholder management skills.
Ability to work effectively within global and cross-functional engineering environments.

Responsibilities

Lead reliability and operational excellence initiatives across critical production platforms and cloud services.
Define, monitor, and improve SLOs, SLIs, SLAs, availability targets, and error budgets.
Design and implement scalable, resilient, and secure cloud-native architectures.
Automate infrastructure provisioning and deployment processes using Infrastructure as Code and CI/CD best practices.
Build and maintain observability solutions including monitoring, logging, tracing, and alerting systems.
Drive incident response, troubleshooting, root cause analysis, and postmortem processes.
Improve operational maturity through automation, documentation, runbooks, and engineering best practices.
Collaborate closely with global engineering, security, product, and vendor teams.
Perform capacity planning, performance optimization, and system reliability analysis.
Mentor engineers and support knowledge sharing initiatives across technical teams.
