Staff Engineer - DevOps / Observability Engineer

United States | Full-Time | Staff
Salary not disclosed

Job Details

Required Skills
AWS, CI/CD, Linux, DevOps

Requirements

  • Strong hands-on experience with New Relic, including APM, telemetry collection, dashboard creation, alerting, and observability optimization.
  • Proven expertise managing PagerDuty configurations, on-call schedules, escalation policies, alert routing, and incident response workflows.
  • Solid experience in DevOps, Site Reliability Engineering (SRE), or production operations environments.
  • Strong knowledge of cloud platforms, particularly AWS, and familiarity with infrastructure monitoring and cloud-native operations.
  • Experience working with CI/CD pipelines, automation tools, Linux systems, and scripting languages.
  • Strong troubleshooting and problem-solving skills with the ability to investigate complex production issues.
  • Understanding of logging, metrics, distributed tracing, and modern observability practices.
  • Excellent collaboration and communication skills with the ability to work effectively across cross-functional teams.
  • Ability to balance operational priorities while driving continuous improvement initiatives.

Responsibilities

  • Analyze, optimize, and maintain application observability frameworks, including telemetry, monitoring, and performance dashboards.
  • Improve monitoring visibility by refining metrics, logs, traces, and alerting mechanisms across production environments.
  • Review and optimize incident management workflows, escalation policies, and alert configurations to reduce noise and improve response efficiency.
  • Identify outdated monitoring assets and implement improvements aligned with current operational and business requirements.
  • Support production operations through proactive monitoring, troubleshooting, incident response, and root cause analysis.
  • Collaborate with engineering, infrastructure, and support teams to enhance system reliability, availability, and operational health.
  • Integrate monitoring and observability solutions into CI/CD pipelines and cloud infrastructure environments.
  • Recommend and implement industry best practices for observability, automation, reliability engineering, and operational excellence.