Senior Site Reliability Engineer, Infrastructure

Vultr - Cloud Infrastructure

Remote - United States | Full-Time | Senior

Salary: 125,000 - 135,000 USD per year

Job Details

Experience: 5+ years
Required Skills: Grafana, Linux, Terraform, Ansible

Requirements

5+ years of experience in site reliability, platform, or infrastructure engineering in a production environment.
Hands-on experience building and operating observability pipelines including metrics, logs, and alerting using Grafana, Loki, Mimir, or equivalent tooling.
Working knowledge of datacenter hardware telemetry protocols including Redfish, IPMI, and/or SNMP.
Strong Linux fundamentals and operational experience in production infrastructure environments.
Demonstrated experience with infrastructure-as-code and configuration management tooling (Terraform, Ansible, Chef, or similar).
Strong cross-functional communication skills and experience delivering tooling for operational stakeholder teams.

Responsibilities

Design and build the observability pipeline for datacenter infrastructure including CDUs, PDUs, bare metal servers, and provisioning workflows, collecting telemetry via Redfish, IPMI, SNMP, and OpenTelemetry.
Own the full stack from data collection through to visualization and alerting in Grafana, Loki, and Mimir.
Build dashboards and alerting that are actionable and meaningful for stakeholder teams including Datacenter Ops, SysAdmin, Network, and Provisioning.
Establish standards and patterns for how datacenter infrastructure telemetry is collected, stored, and visualized across Vultr's global footprint.
Partner closely with stakeholder teams to understand their operational needs and translate them into observable, measurable signals.
Drive infrastructure-as-code practices across the observability pipeline to ensure consistency, repeatability, and maintainability.

About the Company

Vultr
