Site Reliability Engineer - Observability

Posted about 2 months ago

💎 Seniority level: Senior

📍 Location: Finland, Sweden, Germany, Denmark, Estonia

🔍 Industry: Technology and Delivery Services

🗣️ Languages: English

⏳ Experience: Proven experience in Software Engineering, SRE, or a similar role

🪄 Skills: AWS, Docker, Python, Elasticsearch, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD, Terraform, Ansible

Requirements:

Proven experience in Software Engineering, SRE, or a similar role with a focus on observability and reliability.
Strong foundation in computer science principles and engineering fundamentals.
Proficient in Go or Python, with experience building automation tools and software.
Hands-on experience with observability tooling such as Datadog, Prometheus, Mimir, Elasticsearch, Grafana, and Jaeger.
Expertise in cloud platforms like AWS, GCP, or Azure, managing infrastructure using Kubernetes and Docker.
Deep knowledge of building and maintaining reliable, high-performance, and scalable distributed systems.
Solid understanding of SRE principles, incident response, and designing fault-tolerant architectures.

Responsibilities:

Be responsible for building and improving our observability platform and tooling, used by all Wolt engineers.
Contribute to initiatives focused on architecting, building, and maintaining our observability stack.
Champion observability best practices, guiding and supporting other Woltians.
Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
Participate in the on-call rotation to address incidents and outages.
Help standardize observability resources by building tools and documentation.

Related Jobs

🔥 Site Reliability Engineer - Observability

Posted about 1 month ago

📍 Finland, Sweden, Germany, Denmark, Estonia

🧭 Full-Time

🔧 Requirements

Proven experience in Software Engineering, SRE, or a similar role with a focus on observability, reliability, and scaling large systems.
Experience with OpenTelemetry, which is a key foundation for much of the infrastructure and tooling the team is converging on.
Strong foundation in computer science principles and engineering fundamentals.
Proficient in software development, particularly in Go (preferred) or Python, with experience building automation tools and software for large-scale distributed systems.
Hands-on experience with observability tooling such as Datadog, Prometheus, Mimir, Elasticsearch, Grafana, Jaeger, and other tracing systems.
Expertise in cloud platforms like AWS, GCP, or Azure, with experience managing cloud infrastructure using Kubernetes and containers (Docker).
Deep knowledge of building and maintaining reliable, high-performance, and scalable distributed systems.
Solid understanding of SRE principles, incident response, and designing fault-tolerant architectures.
Experience with infrastructure-as-code tools like Terraform or Ansible for managing cloud environments.
Familiarity with CI/CD pipelines, automated testing, and continuous delivery practices.
Strong analytical and problem-solving skills, with experience troubleshooting complex distributed systems.
Excellent communication and collaboration skills, with the ability to work cross-functionally to enhance platform observability and reliability.
Experience working directly with development teams, with a willingness to dive into application code for observability-related topics.
Solid experience with Docker and Kubernetes, coupled with a strong foundation in Unix systems and networking concepts.

💡 Responsibilities

Be responsible for building and improving our observability platform and tooling, used by all Wolt engineers.
Contribute to initiatives focused on architecting, building, and maintaining our observability stack to efficiently handle increasing telemetry data with greater reliability.
Champion observability best practices, guiding and supporting other Woltians in this space.
Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
Apply expertise in SRE culture and practices to ensure observability has a meaningful impact on the business.
Participate in the on-call rotation to address incidents and outages, resolving reliability issues efficiently.
Help standardize observability resources by building tools and documentation that enhance productivity and developer experience.
Triage and resolve production issues within the observability scope.
Contribute to open-source efforts by sharing some of our internal tools with the broader community.

AWS, Docker, Python, Elasticsearch, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD, Terraform, Ansible
