Senior AIOps Engineer, Incident Response


United States, core collaboration hours (9AM–2PM PT) | Full-Time | Senior

Salary: 215,000 - 280,000 USD per year


Job Details

Requirements

6–8 years of experience in production operations, SRE, or technical support engineering roles
Strong expertise in incident management, root cause analysis, and production troubleshooting
Experience working in DevOps, SDLC, and change management environments
Familiarity with tools such as Jira, Confluence, and modern observability platforms
Strong analytical mindset with the ability to identify trends and operational inefficiencies
Excellent communication skills for cross-functional collaboration with engineering and leadership teams
Bachelor’s degree in Computer Science, Engineering, or equivalent experience
Experience with cloud platforms such as AWS and distributed system architectures (bonus)
Exposure to AI/LLM systems, automation frameworks, or intelligent agents (strong advantage)

Responsibilities

Own production health monitoring, reliability processes, and operational support across critical services
Lead incident response, stakeholder communication, root cause analysis, and post-incident reviews
Identify recurring production issues and implement long-term fixes to reduce operational toil
Design and deploy AI-driven agents and automation workflows to streamline operational tasks
Collaborate with engineering, product, and AI orchestration teams to improve system resilience
Develop and maintain runbooks, operational documentation, and knowledge bases for human and AI use
Support observability, monitoring, and troubleshooting across distributed cloud environments
Participate in on-call rotations and continuously improve incident response readiness
