Software Engineer, Compute Infrastructure

United States | Full-Time | Senior

Salary not disclosed

Job Details

Requirements

7+ years of experience building and operating large-scale distributed systems or cloud infrastructure platforms
Deep expertise with Kubernetes or similar container orchestration systems in production environments
Strong programming skills in Go, Rust, or similar systems-level languages used for infrastructure development
Proven experience designing, debugging, and operating complex distributed systems at scale
Strong understanding of infrastructure tradeoffs involving performance, reliability, scalability, and cost efficiency
Experience executing high-risk infrastructure changes or upgrades with minimal downtime
Hands-on experience with observability, incident response, and production on-call responsibilities
Experience with virtualization technologies such as Firecracker, gVisor, or Kata Containers (preferred)
Familiarity with Linux internals, eBPF, kernel tuning, or low-level system optimization (preferred)
Experience improving container startup performance, resource isolation, or multi-tenant security (preferred)

Responsibilities

Own and evolve core compute infrastructure across multiple cloud providers, regions, and data centers, ensuring scalability and reliability at global scale
Design and build platform capabilities that improve service performance, availability, deployment flexibility, and fault tolerance across distributed systems
Investigate and resolve complex infrastructure issues spanning Kubernetes clusters, control planes, data planes, and underlying kernel-level systems
Improve system efficiency and performance through profiling, benchmarking, experimentation, and continuous tuning of infrastructure components
Develop and maintain infrastructure automation, including cluster provisioning, configuration, testing, upgrades, and lifecycle management
Contribute to the design and implementation of orchestration systems, controllers, and scheduling logic using systems programming languages such as Go or Rust
Participate in on-call rotations and incident response, improving observability, reliability, and operational maturity of the platform
Collaborate with engineering teams across the organization to ensure a stable, secure, and predictable compute environment
