Senior AIOps Engineer, Incident Response
United States, core collaboration hours (9AM–2PM PT) | Full-Time | Senior
Salary: 215,000 - 280,000 USD per year
Job Details
- Experience: 6–8 years
- Required Skills: AWS, DevOps, Distributed Systems
Requirements
- 6–8 years of experience in production operations, SRE, or technical support engineering roles
- Strong expertise in incident management, root cause analysis, and production troubleshooting
- Experience working in DevOps, SDLC, and change management environments
- Familiarity with tools such as Jira, Confluence, and modern observability platforms
- Strong analytical mindset with the ability to identify trends and operational inefficiencies
- Excellent communication skills for cross-functional collaboration with engineering and leadership teams
- Bachelor’s degree in Computer Science or Engineering, or equivalent experience
- Experience with cloud platforms such as AWS and distributed system architectures (bonus)
- Exposure to AI/LLM systems, automation frameworks, or intelligent agents (strong advantage)
Responsibilities
- Own production health monitoring, reliability processes, and operational support across critical services
- Lead incident response, stakeholder communication, root cause analysis, and post-incident reviews
- Identify recurring production issues and implement long-term fixes to reduce operational toil
- Design and deploy AI-driven agents and automation workflows to streamline operational tasks
- Collaborate with engineering, product, and AI orchestration teams to improve system resilience
- Develop and maintain runbooks, operational documentation, and knowledge bases for human and AI use
- Support observability, monitoring, and troubleshooting across distributed cloud environments
- Participate in on-call rotations and continuously improve incident response readiness