📍 Finland, Sweden, Germany, Denmark, Estonia
🧭 Full-Time
🔍 Software Engineering
🏢 Company: Wolt - English
- Strong foundation in software engineering, with experience designing and building distributed systems.
- Proficiency in Go (preferred) or Python, with a focus on building automation and developer tooling.
- Experience building and maintaining observability platforms at scale.
- Hands-on experience architecting and maintaining an observability stack at scale, built on open-source tools such as Prometheus, Grafana, Elasticsearch, or similar.
- Solid understanding of modern observability frameworks such as OpenTelemetry (OTel), which forms the foundation of our next-generation observability platform and future strategy.
- Solid understanding of SRE principles, including incident response, fault-tolerant architecture, and service-level indicators and objectives (SLIs/SLOs).
- Comfort working in large-scale, distributed environments, with expertise in Kubernetes, container orchestration, and resolving issues in complex cloud-native systems.
- Familiarity with cloud platforms such as AWS (preferred), GCP, or Azure.
- Strong troubleshooting and problem-solving skills in complex systems.
- Excellent collaboration and communication skills.
- Design and develop scalable software solutions and tooling to improve observability and reliability across Wolt’s services, with a focus on empowering teams to monitor and debug effectively.
- Contribute to initiatives focused on architecting, building, and maintaining the observability stack so that it handles growing telemetry volumes efficiently and reliably.
- Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
- Contribute to and advocate for SRE principles to improve system availability, performance, and efficiency, ensuring that reliability is embedded across all layers of Wolt's services.
- Build and own tooling and frameworks that enable teams to improve reliability, optimize system performance, and manage incidents more effectively.
- Collaborate closely with engineering teams to implement observability best practices, integrate reliability tooling, and resolve complex production issues.
- Participate in on-call rotations for the observability domain and its systems, drive root cause analysis, and build automated detection and resolution tools to reduce mean time to recovery (MTTR).
- Document and share knowledge through guides, playbooks, and training sessions, while continuously improving the developer experience with self-service tooling and best practices.
AWS, Python, Cloud Computing, Elasticsearch, Kubernetes, Go, Grafana, Prometheus, REST API, CI/CD, Microservices, Software Engineering, Debugging
Posted 14 days ago