- Act as the final technical escalation for P1/P2 incidents, lead technical bridges, and deliver post-incident remediation plans.
- Perform deep-dive diagnostics across network, server, storage, virtualization, cloud, and backup platforms.
- Manage problem resolution for chronic issues, including RCA production and preventative action tracking in ServiceNow.
- Engineer and execute complex, high-risk changes aligned with CAB policies.
- Design and refine monitoring thresholds and dashboards to improve observability and reduce alert noise.
- Develop and maintain operational automation scripts using PowerShell, Python, or Bash and platform APIs.
- Author and maintain Tier 1/2 SOPs, runbooks, and knowledge articles to raise operational quality.
- Participate in 24/7 on-call rotations and after-hours maintenance.