Senior Site Reliability Engineer
Germany | Contract | Senior
Salary not disclosed
Job Details
- Experience
- Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
- Required Skills
- AWS, Python, Go, Grafana, Prometheus, CI/CD, Terraform, CloudFormation
Requirements
- Minimum 5 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
- Strong hands-on expertise with AWS cloud services; experience with Azure or GCP is a plus.
- Proven experience with Infrastructure as Code tools such as Terraform or CloudFormation.
- Solid understanding of CI/CD pipelines and Git-based development workflows.
- Experience defining and operating SLOs, SLIs, SLAs, and error budgets.
- Strong experience with observability and monitoring tools including Prometheus, Grafana, ELK/EFK, OpenTelemetry, and distributed tracing solutions.
- Programming or scripting experience with Python, Go, Bash, or PowerShell.
- Good understanding of networking concepts including VPCs, VPNs, load balancers, and firewalls.
- Knowledge of cloud security best practices and compliance frameworks.
- Excellent troubleshooting, communication, and stakeholder management skills.
- Ability to work effectively within global and cross-functional engineering environments.
Responsibilities
- Lead reliability and operational excellence initiatives across critical production platforms and cloud services.
- Define, monitor, and improve SLOs, SLIs, SLAs, availability targets, and error budgets.
- Design and implement scalable, resilient, and secure cloud-native architectures.
- Automate infrastructure provisioning and deployment processes using Infrastructure as Code and CI/CD best practices.
- Build and maintain observability solutions including monitoring, logging, tracing, and alerting systems.
- Drive incident response, troubleshooting, root cause analysis, and postmortem processes.
- Improve operational maturity through automation, documentation, runbooks, and engineering best practices.
- Collaborate closely with global engineering, security, product, and vendor teams.
- Perform capacity planning, performance optimization, and system reliability analysis.
- Mentor engineers and support knowledge-sharing initiatives across technical teams.