Senior Site Reliability Engineer, Infrastructure

United States, Full-Time, Senior

Salary: 125,000 - 135,000 USD per year

Job Details

Experience: 5+ years
Required Skills: Grafana, Linux, Terraform

Requirements

5+ years of experience in site reliability engineering, platform engineering, or infrastructure engineering in production environments.
Strong hands-on experience building observability systems, including metrics, logs, alerting, and monitoring pipelines.
Familiarity with tools such as Grafana, Loki, Mimir, or similar observability platforms.
Working knowledge of datacenter hardware telemetry protocols such as Redfish, IPMI, and/or SNMP.
Strong Linux systems knowledge and experience operating production-grade infrastructure.
Experience with infrastructure-as-code tools such as Terraform, Ansible, Chef, or equivalent technologies.
Proven ability to collaborate across technical and operational teams in complex environments.
Strong communication skills and ability to translate operational needs into engineering solutions.

Responsibilities

Design and build observability pipelines for datacenter and provisioning infrastructure, including telemetry ingestion from systems such as Redfish, IPMI, SNMP, and OpenTelemetry.
Own the full observability stack, from data collection through storage, processing, visualization, and alerting using tools such as Grafana, Loki, and Mimir.
Develop dashboards, metrics, and alerting systems that provide actionable insights for datacenter operations, networking, systems, and provisioning teams.
Define and enforce standards for telemetry collection, observability design, and infrastructure monitoring across global environments.
Partner with cross-functional engineering and operations teams to translate operational needs into measurable signals and reliable monitoring systems.
Drive infrastructure-as-code practices for observability systems to ensure scalability, consistency, and maintainability.
Continuously improve system reliability, visibility, and operational efficiency across large-scale infrastructure environments.
