Incident and Escalation Manager

Based in the United States | Full-Time | Manager
Salary not disclosed

Job Details

Experience
12+ years
Required Skills
Artificial Intelligence, Salesforce, Jira, Slack, Distributed Systems

Requirements

  • 12+ years of experience in Incident Management, Escalation Management, Problem Management, or Technical Operations in enterprise or high-tech environments.
  • Proven experience leading high-severity incidents and executive escalations in AI, HPC, or large-scale infrastructure ecosystems.
  • Strong technical understanding of complex distributed systems and ability to collaborate effectively with engineering teams under pressure.
  • Deep knowledge of ITIL frameworks, including Incident, Problem, Change, and Escalation Management practices.
  • Exceptional communication skills, with the ability to address both technical and executive-level audiences.
  • Strong analytical mindset with experience interpreting incident data, trends, and operational metrics.
  • Ability to operate in high-pressure, customer-facing situations with strong ownership and decision-making capabilities.
  • Experience working in global, 24/7 operational environments with on-call responsibilities.
  • Proven ability to influence cross-functional teams and senior stakeholders without direct authority.

Responsibilities

  • Lead and coordinate major incident response efforts for high-severity service disruptions impacting AI, HPC, and enterprise-scale environments.
  • Act as Incident Commander, driving structured triage, cross-functional collaboration, real-time decision-making, and service restoration activities.
  • Manage executive-level escalations, ensuring rapid resolution of critical customer issues and maintaining strong stakeholder alignment.
  • Provide clear, timely, and structured communication to executives, customers, and internal teams during major incidents.
  • Partner with engineering, support, product, and sales teams to resolve complex technical and service-related challenges.
  • Lead post-incident and escalation reviews (PIER), including root cause analysis and corrective action tracking.
  • Identify systemic issues and drive continuous improvement across incident, escalation, and problem management processes.
  • Contribute to the development of operational frameworks, governance models, and service reliability standards across global teams.