📍 Finland, Sweden, Germany, Denmark, Estonia
🧭 Full-Time
- Proven experience in Software Engineering, SRE, or a similar role with a focus on observability, reliability, and scaling large systems.
- Experience with OpenTelemetry, which is a key foundation for much of the infrastructure and tooling the team is converging on.
- Strong foundation in computer science principles and engineering fundamentals.
- Proficiency in software development, particularly in Go (preferred) or Python, with experience building automation tools and software for large-scale distributed systems.
- Hands-on experience with observability tooling such as Datadog, Prometheus, Mimir, Elasticsearch, Grafana, Jaeger, and other tracing systems.
- Expertise in cloud platforms like AWS, GCP, or Azure, with experience managing cloud infrastructure using Kubernetes and containers (Docker).
- Deep knowledge of building and maintaining reliable, high-performance, and scalable distributed systems.
- Solid understanding of SRE principles, incident response, and designing fault-tolerant architectures.
- Experience with infrastructure-as-code tools like Terraform or Ansible for managing cloud environments.
- Familiarity with CI/CD pipelines, automated testing, and continuous delivery practices.
- Strong analytical and problem-solving skills, with experience troubleshooting complex distributed systems.
- Excellent communication and collaboration skills, with the ability to work cross-functionally to enhance platform observability and reliability.
- Experience working directly with development teams, with a willingness to dive into application code for observability-related topics.
- Solid experience with Docker and Kubernetes, coupled with a strong foundation in Unix systems and networking concepts.
- Be responsible for building and improving our observability platform and tooling, used by all Wolt engineers.
- Contribute to initiatives focused on architecting, building, and maintaining our observability stack to efficiently handle increasing telemetry data with greater reliability.
- Champion observability best practices, guiding and supporting other Woltians in this space.
- Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
- Apply expertise in SRE culture and practices to ensure observability has a meaningful impact on the business.
- Participate in the on-call rotation to address incidents and outages, resolving reliability issues efficiently.
- Help standardize observability resources by building tools and documentation that enhance productivity and developer experience.
- Triage and resolve production issues within the observability scope.
- Contribute to open-source efforts by sharing some of our internal tools with the broader community.
AWS, Docker, Python, Elasticsearch, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD, Terraform, Ansible
Posted about 1 month ago