Staff Software Engineer: Reliability, Performance

Feldera | Database Software
US, EU or India | Full-Time | Staff
Salary not disclosed

Job Details

Required Skills
Docker, Python, Kubernetes, Rust, CI/CD, Linux, GitHub Actions, Distributed Systems

Requirements

  • Strong background in systems engineering, performance testing, or site reliability engineering
  • Fluency in Python
  • Fluency in Linux fundamentals
  • Experience with distributed systems and database concepts (consistency, fault tolerance, transactions)
  • Experience with CI/CD pipeline engineering
  • Hands-on experience running large-scale and long-running workloads, preferably in a cloud-native environment
  • Curiosity, rigor, and the ability to design experiments that simulate messy real-world conditions
  • Rust experience is strongly valued
  • Experience with GitHub Actions
  • Experience with Docker
  • Experience with Kubernetes

Responsibilities

  • Design and run long-lived workloads that mimic production environments, including sustained load, skewed data distributions, and upgrade workflows
  • Build metrics and dashboards to continuously measure throughput, latency, and resource efficiency, and use these insights to guide system improvements
  • Run experiments involving node failures, crashes, network partitions, resource contention and rolling upgrades – validating correctness and resilience under stress
  • Own and evolve our CI/CD pipelines to make them faster, more reliable, and more reflective of production conditions
  • Ensure that every change is validated under meaningful workloads before it ships
  • Work closely with our systems engineers to pinpoint bottlenecks, identify regressions, and improve reliability mechanisms