- Analyze HPIs/CPIs to produce monitoring and observability recommendations.
- Triage and resolve enterprise-level incidents.
- Use modern system monitoring tools to improve enterprise reliability.
- Work with system and application owners to diagnose outages and recommend reliability improvements.
- Use hardware and software experience to strengthen VA systems.
- Partner with application owners to understand platform designs and operations.
- Collaborate with developers and identity/access teams on in-depth investigations.
- Use tools such as SolarWinds, Dynatrace, and Splunk to investigate reliability concerns.