Staff Software Engineer: Reliability, Performance
Feldera · Database Software
US, EU or India · Full-Time · Staff
Salary not disclosed
Job Details
Required Skills
- Docker, Python, Kubernetes, Rust, CI/CD, Linux, GitHub Actions, Distributed Systems
Requirements
- Strong background in systems engineering, performance testing, or site reliability engineering
- Fluency in Python
- Fluency in Linux fundamentals
- Experience with distributed systems and database concepts (consistency, fault tolerance, transactions)
- Experience with CI/CD pipeline engineering
- Hands-on experience running large-scale and long-running workloads, preferably in a cloud-native environment
- Curiosity, rigor, and the ability to design experiments that simulate messy real-world conditions
- Rust experience is strongly valued
- Experience with GitHub Actions
- Experience with Docker
- Experience with Kubernetes
Responsibilities
- Design and run long-lived workloads that mimic production environments, including sustained load, skewed data distributions, and upgrade workflows
- Build metrics and dashboards to continuously measure throughput, latency, and resource efficiency, and use these insights to guide system improvements
- Run experiments involving node failures, crashes, network partitions, resource contention, and rolling upgrades, validating correctness and resilience under stress
- Own and evolve our CI/CD pipelines to make them faster, more reliable, and more reflective of production conditions
- Ensure that every change is validated under meaningful workloads before it ships
- Work closely with our systems engineers to pinpoint bottlenecks, identify regressions, and improve reliability mechanisms