Principal Observability & Reliability Architect

Based in United States | Full-Time | Principal
Salary: Competitive On-Target Earnings (OTE) package including base salary and performance-based incentives, determined by experience and location.

Job Details

Experience
10+ years of experience in observability, platform operations, SRE, monitoring, APM, or related enterprise infrastructure domains, including 5+ years in architecture or technical leadership roles.
Required Skills
ServiceNow, Datadog

Requirements

  • 10+ years of experience in observability, platform operations, SRE, monitoring, APM, or related enterprise infrastructure domains, including 5+ years in architecture or technical leadership roles.
  • Strong hands-on expertise designing and implementing observability solutions across metrics, logs, traces, telemetry pipelines, and distributed systems in cloud and hybrid environments.
  • Deep understanding of telemetry governance frameworks, including data normalization, enrichment, routing, retention strategies, access control, and cost optimization.
  • Proven ability to define enterprise standards for dashboards, alerts, service tagging, naming conventions, RBAC, and operational maturity models.
  • Strong SRE background with practical experience implementing SLIs, SLOs, error budgets, incident response processes, and production reliability practices.
  • Experience integrating observability platforms with ITSM and operational tools such as ServiceNow, PagerDuty, Jira Service Management, or similar ecosystems.
  • Consulting or professional services experience with strong client-facing communication, workshop facilitation, estimation, and cross-functional leadership skills.
  • Ability to translate complex technical challenges into clear, actionable architecture and delivery plans for both technical and executive audiences.
  • Experience with platforms such as Datadog, Dynatrace, Splunk, Grafana, New Relic, Prometheus, or OpenTelemetry is highly desirable.
  • Familiarity with telemetry pipeline tools such as Kafka, Fluent Bit, OpenTelemetry Collector, or similar technologies is a strong plus.
  • Experience building reusable consulting assets such as reference architectures, accelerators, and governance frameworks is preferred.

Responsibilities

  • Lead discovery sessions, architecture workshops, and solution design activities across observability, reliability, telemetry, and operational intelligence programs for enterprise clients.
  • Design end-to-end observability architectures covering monitoring, logging, metrics, tracing, event correlation, alerting, telemetry pipelines, and platform integrations across hybrid and multi-cloud environments.
  • Define and enforce enterprise standards for telemetry governance, including naming conventions, tagging, RBAC, data quality, retention, sampling, cost optimization, and service ownership models.
  • Guide modernization initiatives such as tool consolidation, dashboard and alert rationalization, migration from legacy monitoring systems, and implementation of scalable observability platforms.
  • Establish and mature SRE practices including SLIs, SLOs, error budgets, production readiness reviews, and incident response frameworks to improve operational reliability.
  • Design integration patterns across ITSM, CMDB, event management, automation, and incident response platforms to ensure seamless operational workflows.
  • Support pre-sales and pursuit activities by shaping solution strategy, validating scope, developing estimates, and creating client-facing technical narratives.
  • Act as a senior escalation point during delivery, providing architecture governance, risk mitigation guidance, and technical oversight across engagements.
  • Develop reusable assets including reference architectures, playbooks, governance models, and accelerators while mentoring architects, consultants, and delivery teams.