Observability Engineer
United States, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Python, Java, Kubernetes, Go, Grafana, Prometheus, Linux, Datadog
Requirements
- 5+ years of experience in SRE, platform engineering, or observability-focused roles.
- Strong expertise with Prometheus, Grafana, and at least one commercial platform (Datadog, New Relic, or Splunk).
- Deep understanding of OpenTelemetry, distributed tracing, and structured logging.
- Strong programming skills in Go, Python, or Java.
- Solid knowledge of SRE principles including SLOs and error budgets.
- Experience operating Kubernetes or container-based environments.
- Strong Linux, networking, and distributed systems fundamentals.
- Strong communication skills.
Responsibilities
- Design and operate large-scale observability platforms covering metrics, logs, traces, and synthetic monitoring.
- Define and enforce observability standards including instrumentation practices and structured logging.
- Build and maintain SLO/SLI frameworks, error budgets, and alerting systems.
- Manage high-volume time-series and log storage systems for performance and cost efficiency.
- Develop self-service tooling, dashboards, and reusable templates.
- Improve incident response workflows through better alerting, runbooks, and analysis.