Staff Site Reliability Engineer - Storage

Fully remote, distributed team across France | Full-Time | Staff

Salary not disclosed

Job Details

Strong, hands-on experience operating distributed infrastructure and stateful systems at scale in production
Experience with Kafka (MSK)
Experience with Redis (ElastiCache)
Mastery of core reliability fundamentals: disaster recovery (DR) planning, incident management, observability, and capacity planning
Track record of treating infrastructure as a product
Experience building automation (IaC), tooling, or DBaaS-like solutions
A high degree of rigor and attention to detail
Ability to navigate complex, evolving production environments independently and make safe decisions
Ability to act as a trusted partner and translate complex infrastructure constraints into clear guidance

Assess the resilience maturity of current Kafka and Redis stacks, identify key risks, and propose an improvement roadmap.
Deliver concrete improvements to disaster recovery (DR) readiness, safe upgrades, alerting, and capacity planning.
Act as an internal consultant for backend and product engineering teams, leading design reviews and providing guidance.
Respond to and lead high-severity incidents on critical stateful infrastructure, mitigating impact and communicating clearly.
Drive a platform engineering mindset by building automation, tooling, and APIs to improve developer experience.
