Staff Infrastructure Engineer — Observability
Based in the United States | Full-Time | Staff
Salary: Competitive base salary range aligned with experience and location
Job Details
- Experience: 8+ years
- Required skills: Kubernetes, Go, Grafana, Prometheus, Terraform, Ansible
Requirements
- 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering, or similar roles.
- Strong hands-on expertise with Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry.
- Experience designing and operating cloud-native systems in AWS or GCP environments.
- Proficiency with Kubernetes-based production environments (EKS, GKE, or equivalent).
- Strong infrastructure-as-code skills using Terraform and Ansible.
- Experience building scalable, high-throughput distributed systems with a focus on reliability and cost efficiency.
- Strong programming experience in Go or another language such as Python or Java, with a willingness to work in Go.
- Experience leading technical architecture, mentoring engineers, and collaborating across product and platform teams.
- Familiarity with secure or regulated environments such as FedRAMP or government-compliant systems is highly valued.
Responsibilities
- Own the design, architecture, and evolution of large-scale observability systems supporting distributed, cloud-native infrastructure.
- Build and optimize telemetry platforms using tools such as Prometheus, Grafana, Thanos/Mimir/Cortex, and OpenTelemetry pipelines.
- Architect scalable data ingestion, storage, and analysis systems for high-volume production environments.
- Drive observability strategy across engineering teams, defining standards for monitoring, logging, and tracing.
- Develop automation and self-service tooling to reduce operational overhead and improve engineering efficiency.
- Lead reliability improvements across multi-cloud environments (AWS and GCP), balancing performance, cost, and resilience.
- Own incident response, root-cause analysis, and continuous improvement of production observability systems.
- Mentor engineers, lead technical design reviews, and elevate engineering best practices across teams.