- Define and enforce logging, tracing, and metrics standards across services
- Implement and maintain centralized telemetry pipelines and APM integrations
- Build reusable instrumentation libraries for core languages (Java, .NET, Node.js, Python)
- Establish dashboards and SLO/error budget alerts
- Ensure log/trace correlation and schema consistency
- Implement PII/secret redaction, retention, and cost optimization
- Collaborate with development teams to onboard services and ensure observability readiness
- Develop runbook templates, documentation, and training materials for engineering teams
- Audit alerts, reduce noise, and maintain alert quality standards
- Support incident response through tooling improvement and post-incident telemetry analysis