Senior Site Reliability Engineer – Compute Platforms

This role is fully remote for candidates who reside outside the 50-mile radius of our San Ramon office.
Full-Time, Senior
Salary: 82,300 - 228,800 USD per year

Job Details

Experience: 6+ years
Required Skills: Python, Bash, Kubernetes, Terraform, Ansible

Requirements

  • 6+ years of experience in infrastructure engineering, platform engineering, or DevOps
  • Proven experience designing and automating bare metal compute environments at scale
  • Strong hands-on experience with PXE boot, network-based OS provisioning, and automated server imaging
  • Experience implementing or supporting Bare Metal as a Service (BMaaS) platforms
  • Practical experience using Redfish APIs
  • Deep expertise with Ubuntu Linux in enterprise environments
  • Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)
  • Experience designing and deploying production-grade Kubernetes clusters
  • Strong background with enterprise compute hardware platforms (Cisco UCS, Dell PowerEdge, Supermicro, HPE)
  • Proficiency with Infrastructure as Code tools (Terraform, Ansible)
  • Experience building or supporting CI/CD pipelines
  • Strong scripting skills in Python and Bash
  • Bachelor’s degree in computer science or equivalent professional experience

Responsibilities

  • Lead the architecture and design of enterprise compute and hypervisor platform solutions
  • Define standards and automation frameworks for bare metal provisioning and lifecycle management
  • Design and implement Bare Metal as a Service (BMaaS) capabilities
  • Architect and design Kubernetes platforms on bare metal
  • Architect and validate automated deployments of operating systems and hypervisors
  • Design and maintain PXE-based provisioning environments leveraging Redfish APIs
  • Develop Infrastructure as Code using Ansible, Terraform, Helm, and Git
  • Implement CI/CD pipelines for infrastructure updates
  • Design automated workflows for server build and firmware lifecycle management
  • Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux