Software Engineer, Compute Infrastructure
United States, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 7+ years
- Required Skills: Kubernetes, Go, Rust, Distributed Systems
Requirements
- 7+ years of experience building and operating large-scale distributed systems or cloud infrastructure platforms
- Deep expertise with Kubernetes or similar container orchestration systems in production environments
- Strong programming skills in Go, Rust, or similar systems-level languages used for infrastructure development
- Proven experience designing, debugging, and operating complex distributed systems at scale
- Strong understanding of infrastructure tradeoffs involving performance, reliability, scalability, and cost efficiency
- Experience executing high-risk infrastructure changes or upgrades with minimal downtime
- Hands-on experience with observability, incident response, and production on-call responsibilities
- Experience with virtualization technologies such as Firecracker, gVisor, or Kata Containers (preferred)
- Familiarity with Linux internals, eBPF, kernel tuning, or low-level system optimization (preferred)
- Experience improving container startup performance, resource isolation, or multi-tenant security (preferred)
Responsibilities
- Own and evolve core compute infrastructure across multiple cloud providers, regions, and data centers, ensuring scalability and reliability at global scale
- Design and build platform capabilities that improve service performance, availability, deployment flexibility, and fault tolerance across distributed systems
- Investigate and resolve complex infrastructure issues spanning Kubernetes clusters, control planes, data planes, and underlying kernel-level systems
- Improve system efficiency and performance through profiling, benchmarking, experimentation, and continuous tuning of infrastructure components
- Develop and maintain infrastructure automation, including cluster provisioning, configuration, testing, upgrades, and lifecycle management
- Contribute to the design and implementation of orchestration systems, controllers, and scheduling logic using systems programming languages such as Go or Rust
- Participate in on-call rotations and incident response, improving observability, reliability, and operational maturity of the platform
- Collaborate with engineering teams across the organization to ensure a stable, secure, and predictable compute environment