Staff Infrastructure Engineer — Observability

Based in the United States, Full-Time, Staff
Salary: Competitive base salary range aligned with experience and location

Job Details

Experience
8+ years
Required Skills
Kubernetes, Go, Grafana, Prometheus, Terraform, Ansible

Requirements

  • 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering, or similar roles.
  • Strong hands-on expertise with Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry.
  • Experience designing and operating cloud-native systems in AWS or GCP environments.
  • Proficiency with Kubernetes-based production environments (EKS, GKE, or equivalent).
  • Strong infrastructure-as-code skills using Terraform and Ansible.
  • Experience building scalable, high-throughput distributed systems with a focus on reliability and cost efficiency.
  • Strong programming experience in Go or similar languages (Python, Java), with willingness to work in Go.
  • Experience leading technical architecture, mentoring engineers, and collaborating across product and platform teams.
  • Familiarity with secure or regulated environments, such as FedRAMP-authorized or other government-compliant systems, is highly valued.

Responsibilities

  • Own the design, architecture, and evolution of large-scale observability systems supporting distributed, cloud-native infrastructure.
  • Build and optimize telemetry platforms using tools such as Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry pipelines.
  • Architect scalable data ingestion, storage, and analysis systems for high-volume production environments.
  • Drive observability strategy across engineering teams, defining standards for monitoring, logging, and tracing.
  • Develop automation and self-service tooling to reduce operational overhead and improve engineering efficiency.
  • Lead reliability improvements across multi-cloud environments (AWS and GCP), balancing performance, cost, and resilience.
  • Own incident response, root-cause analysis, and continuous improvement of production observability systems.
  • Mentor engineers, lead technical design reviews, and elevate engineering best practices across teams.