Senior Site Reliability Engineer, Infrastructure
Vultr | Cloud Infrastructure
Remote - United States | Full-Time | Senior
Salary: 125,000 - 135,000 USD per year
Job Details
- Experience: 5+ years
- Required Skills: Grafana, Linux, Terraform, Ansible
Requirements
- 5+ years of experience in site reliability, platform, or infrastructure engineering in a production environment.
- Hands-on experience building and operating observability pipelines including metrics, logs, and alerting using Grafana, Loki, Mimir, or equivalent tooling.
- Working knowledge of datacenter hardware telemetry protocols including Redfish, IPMI, and/or SNMP.
- Strong Linux fundamentals and operational experience in production infrastructure environments.
- Demonstrated experience with infrastructure-as-code and configuration management tooling (Terraform, Ansible, Chef, or similar).
- Strong cross-functional communication skills and experience delivering tooling for operational stakeholder teams.
Responsibilities
- Design and build the observability pipeline for datacenter infrastructure, including CDUs (coolant distribution units), PDUs (power distribution units), bare-metal servers, and provisioning workflows, collecting telemetry via Redfish, IPMI, SNMP, and OpenTelemetry.
- Own the full stack, from data collection through storage, visualization, and alerting in Grafana, Loki, and Mimir.
- Build dashboards and alerting that are actionable and meaningful for stakeholder teams including Datacenter Ops, SysAdmin, Network, and Provisioning.
- Establish standards and patterns for how datacenter infrastructure telemetry is collected, stored, and visualized across Vultr's global footprint.
- Partner closely with stakeholder teams to understand their operational needs and translate them into observable, measurable signals.
- Drive infrastructure-as-code practices across the observability pipeline to ensure consistency, repeatability, and maintainability.