Senior Site Reliability Engineer – Compute Platforms
This role is fully remote for candidates who reside outside the 50-mile radius of our San Ramon office.
Full-Time, Senior
Salary: 82,300 - 228,800 USD per year
Job Details
- Experience: 6+ years
- Required Skills: Python, Bash, Kubernetes, Terraform, Ansible
Requirements
- 6+ years of experience in infrastructure engineering, platform engineering, or DevOps
- Proven experience designing and automating bare metal compute environments at scale
- Strong hands-on experience with PXE boot, network-based OS provisioning, and automated server imaging
- Experience implementing or supporting Bare Metal as a Service (BMaaS) platforms
- Practical experience using Redfish APIs
- Deep expertise with Ubuntu Linux in enterprise environments
- Strong hands-on experience with KVM hypervisors (SUSE Harvester, OpenStack)
- Experience designing and deploying production-grade Kubernetes clusters
- Strong background with enterprise compute hardware platforms (Cisco UCS, Dell PowerEdge, Supermicro, HPE)
- Proficiency with Infrastructure as Code tools (Terraform, Ansible)
- Experience building or supporting CI/CD pipelines
- Strong scripting skills in Python and Bash
- Bachelor’s degree in computer science or equivalent professional experience
Responsibilities
- Lead the architecture and design of enterprise compute and hypervisor platform solutions
- Define standards and automation frameworks for bare metal provisioning and lifecycle management
- Design and implement Bare Metal as a Service (BMaaS) capabilities
- Architect and design Kubernetes platforms on bare metal
- Architect and validate automated deployments of operating systems and hypervisors
- Design and maintain PXE-based provisioning environments leveraging Redfish APIs
- Develop Infrastructure as Code using Ansible, Terraform, Helm, and Git
- Implement CI/CD pipelines for infrastructure updates
- Design automated workflows for server build and firmware lifecycle management
- Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux