Observability Specialist

Deel (SaaS, HR platform)
EMEA, Full-Time, Middle
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, Kubernetes, Grafana, Prometheus, Terraform, GitHub Actions, Datadog, Helm

Requirements

  • 5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments
  • Strong experience with AWS services
  • 5+ years of hands-on experience working with Kubernetes
  • Solid knowledge of Kubernetes monitoring: metrics, logs, and traces for clusters and workloads, plus alerting, SLOs, SLIs, and dashboards
  • Proven experience operating and maintaining self-hosted monitoring stacks (advantage: Prometheus, Grafana, Mimir, Loki, Tempo)
  • Experience designing or improving observability architectures at scale
  • Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)
  • Strong understanding of high availability, scalability, and fault-tolerant architectures
  • Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization
  • Ability to automate monitoring tasks using Infrastructure as Code and scripting (Terraform, Helm, etc.)
  • Familiarity with CI/CD pipelines and integrating monitoring into deployment workflows (GitHub Actions is an advantage)
  • Experience with capacity planning and performance tuning

Responsibilities

  • Design, implement, and maintain scalable observability solutions for cloud-native environments
  • Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads
  • Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)
  • Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)
  • Improve observability architecture to support high availability, scalability, and fault tolerance
  • Implement monitoring cost optimization strategies (log/trace sampling, retention policies, storage optimization)
  • Automate observability infrastructure using Infrastructure as Code (Terraform, Helm, etc.)
  • Integrate monitoring and alerting into CI/CD pipelines (GitHub Actions is an advantage)
  • Support capacity planning and performance tuning initiatives
  • Collaborate with DevOps, SRE, and Engineering teams to embed observability best practices
  • Drive continuous improvement of monitoring standards, tooling, and reliability practices