Staff Site Reliability Engineer

Fully remote work environment across Europe. Listing location: Germany. Full-Time. Staff.

Salary not disclosed

Job Details

Requirements

8–10 years of experience in SRE, DevOps, or Infrastructure Engineering.
Strong software engineering skills with Python or Go.
Deep expertise in distributed systems architecture.
Extensive experience with Kubernetes, Docker, and container orchestration.
Experience designing observability ecosystems using Prometheus, Grafana, Datadog, or OpenTelemetry.
Strong background in incident management and root cause analysis.
Hands-on experience with Infrastructure as Code tools like Terraform or Pulumi.
Excellent communication skills.
Proven leadership and mentoring experience.

Responsibilities

Design and implement comprehensive observability solutions.
Define, track, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Lead high-severity incident response efforts and conduct blameless post-mortems.
Build and maintain infrastructure automation and Infrastructure as Code using Terraform or Pulumi.
Develop self-healing systems to reduce operational overhead.
Optimize large-scale Kubernetes and cloud-native deployments.
Investigate and resolve complex distributed systems issues.
Review architectural designs for reliability and scalability.
Mentor engineers and establish reliability-focused engineering standards.
Build internal tools and automation using Python or Go.
