Observability Specialist
Deel (SaaS, HR platform)
EMEA, Full-Time, Middle
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: AWS, Kubernetes, Grafana, Prometheus, Terraform, GitHub Actions, Datadog, Helm
Requirements
- 5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments
- Strong experience with AWS services
- 5+ years of hands-on experience working with Kubernetes
- Solid knowledge of Kubernetes monitoring: metrics, logs, and traces for clusters and workloads, as well as alerting, SLOs, SLIs, and dashboards
- Proven experience operating and maintaining self-hosted monitoring stacks (advantage: Prometheus, Grafana, Mimir, Loki, Tempo)
- Experience designing or improving observability architectures at scale
- Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)
- Strong understanding of high availability, scalability, and fault-tolerant architectures
- Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization
- Ability to automate monitoring tasks using Infrastructure as Code and scripting (Terraform, Helm, etc.)
- Familiarity with CI/CD pipelines and integrating monitoring into deployment workflows (GitHub Actions is an advantage)
- Experience with capacity planning and performance tuning
Responsibilities
- Design, implement, and maintain scalable observability solutions for cloud-native environments
- Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads
- Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)
- Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)
- Improve observability architecture to support high availability, scalability, and fault tolerance
- Implement monitoring cost optimization strategies (log/trace sampling, retention policies, storage optimization)
- Automate observability infrastructure using Infrastructure as Code (Terraform, Helm, etc.)
- Integrate monitoring and alerting into CI/CD pipelines (GitHub Actions is an advantage)
- Support capacity planning and performance tuning initiatives
- Collaborate with DevOps, SRE, and Engineering teams to embed observability best practices
- Drive continuous improvement of monitoring standards, tooling, and reliability practices