Staff Engineer - DevOps / Observability Engineer
United States | Full-Time | Staff
Salary not disclosed
Job Details
- Required Skills: AWS, CI/CD, Linux, DevOps
Requirements
- Strong hands-on experience with New Relic, including APM, telemetry collection, dashboard creation, alerting, and observability optimization.
- Proven expertise managing PagerDuty configurations, on-call schedules, escalation policies, alert routing, and incident response workflows.
- Solid experience in DevOps, Site Reliability Engineering (SRE), or production operations environments.
- Strong knowledge of cloud platforms, particularly AWS, and familiarity with infrastructure monitoring and cloud-native operations.
- Experience working with CI/CD pipelines, automation tools, Linux systems, and scripting technologies.
- Strong troubleshooting and problem-solving skills with the ability to investigate complex production issues.
- Understanding of logging, metrics, distributed tracing, and modern observability practices.
- Excellent collaboration and communication skills, with the ability to work effectively with cross-functional teams.
- Ability to balance operational priorities while driving continuous improvement initiatives.
Responsibilities
- Analyze, optimize, and maintain application observability frameworks, including telemetry, monitoring, and performance dashboards.
- Improve monitoring visibility by refining metrics, logs, traces, and alerting mechanisms across production environments.
- Review and optimize incident management workflows, escalation policies, and alert configurations to reduce noise and improve response efficiency.
- Identify outdated monitoring assets and implement improvements aligned with current operational and business requirements.
- Support production operations through proactive monitoring, troubleshooting, incident response, and root cause analysis.
- Collaborate with engineering, infrastructure, and support teams to enhance system reliability, availability, and operational health.
- Integrate monitoring and observability solutions into CI/CD pipelines and cloud infrastructure environments.
- Recommend and implement industry best practices for observability, automation, reliability engineering, and operational excellence.