Observability Engineer

United States, Full-Time, Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Python, Java, Kubernetes, Go, Grafana, Prometheus, Linux, Datadog

Requirements

  • 5+ years of experience in SRE, platform engineering, or observability-focused roles.
  • Strong expertise with Prometheus, Grafana, and at least one commercial platform (Datadog, New Relic, or Splunk).
  • Deep understanding of OpenTelemetry, distributed tracing, and structured logging.
  • Strong programming skills in Go, Python, or Java.
  • Solid knowledge of SRE principles including SLOs and error budgets.
  • Experience operating Kubernetes or container-based environments.
  • Strong Linux, networking, and distributed systems fundamentals.
  • Strong communication skills.

Responsibilities

  • Design and operate large-scale observability platforms covering metrics, logs, traces, and synthetic monitoring.
  • Define and enforce observability standards including instrumentation practices and structured logging.
  • Build and maintain SLO/SLI frameworks, error budgets, and alerting systems.
  • Manage high-volume time-series and log storage systems for performance and cost efficiency.
  • Develop self-service tooling, dashboards, and reusable templates.
  • Improve incident response workflows through better alerting, runbooks, and analysis.