Software Engineer, Compute Infrastructure

United States | Full-Time | Senior
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
Kubernetes, Go, Rust, Distributed Systems

Requirements

  • 7+ years of experience building and operating large-scale distributed systems or cloud infrastructure platforms
  • Deep expertise with Kubernetes or similar container orchestration systems in production environments
  • Strong programming skills in Go, Rust, or similar systems-level languages used for infrastructure development
  • Proven experience designing, debugging, and operating complex distributed systems at scale
  • Strong understanding of infrastructure tradeoffs involving performance, reliability, scalability, and cost efficiency
  • Experience executing high-risk infrastructure changes or upgrades with minimal downtime
  • Hands-on experience with observability, incident response, and production on-call responsibilities
  • Experience with virtualization technologies such as Firecracker, gVisor, or Kata Containers (preferred)
  • Familiarity with Linux internals, eBPF, kernel tuning, or low-level system optimization (preferred)
  • Experience improving container startup performance, resource isolation, or multi-tenant security (preferred)

Responsibilities

  • Own and evolve core compute infrastructure across multiple cloud providers, regions, and data centers, ensuring scalability and reliability at global scale
  • Design and build platform capabilities that improve service performance, availability, deployment flexibility, and fault tolerance across distributed systems
  • Investigate and resolve complex infrastructure issues spanning Kubernetes clusters, control planes, data planes, and underlying kernel-level systems
  • Improve system efficiency and performance through profiling, benchmarking, experimentation, and continuous tuning of infrastructure components
  • Develop and maintain infrastructure automation, including cluster provisioning, configuration, testing, upgrades, and lifecycle management
  • Contribute to the design and implementation of orchestration systems, controllers, and scheduling logic using systems programming languages such as Go or Rust
  • Participate in on-call rotations and incident response, improving observability, reliability, and operational maturity of the platform
  • Collaborate with engineering teams across the organization to ensure a stable, secure, and predictable compute environment