Staff Engineer - DevOps / Observability Engineer

United States | Full-Time | Staff

Salary not disclosed

Job Details

Requirements

Strong hands-on experience with New Relic, including APM, telemetry collection, dashboard creation, alerting, and observability optimization.
Proven expertise managing PagerDuty configurations, on-call schedules, escalation policies, alert routing, and incident response workflows.
Solid experience in DevOps, Site Reliability Engineering (SRE), or production operations environments.
Strong knowledge of cloud platforms, particularly AWS, and familiarity with infrastructure monitoring and cloud-native operations.
Experience working with CI/CD pipelines, automation tools, Linux systems, and scripting languages.
Strong troubleshooting and problem-solving skills with the ability to investigate complex production issues.
Understanding of logging, metrics, distributed tracing, and modern observability practices.
Excellent collaboration and communication skills with the ability to work effectively with cross-functional teams.
Ability to balance operational priorities while driving continuous improvement initiatives.

Responsibilities

Analyze, optimize, and maintain application observability frameworks, including telemetry, monitoring, and performance dashboards.
Improve monitoring visibility by refining metrics, logs, traces, and alerting mechanisms across production environments.
Review and optimize incident management workflows, escalation policies, and alert configurations to reduce noise and improve response efficiency.
Identify outdated monitoring assets and implement improvements aligned with current operational and business requirements.
Support production operations through proactive monitoring, troubleshooting, incident response, and root cause analysis.
Collaborate with engineering, infrastructure, and support teams to enhance system reliability, availability, and operational health.
Integrate monitoring and observability solutions into CI/CD pipelines and cloud infrastructure environments.
Recommend and implement industry best practices for observability, automation, reliability engineering, and operational excellence.
