Staff Site Reliability Engineer - Storage
Fully remote, distributed team across France | Full-Time | Staff
Salary not disclosed
Job Details
- Required Skills: AWS, PostgreSQL, Kafka, Kubernetes, Redis, Terraform
Requirements
- Strong, hands-on experience operating distributed infrastructure and stateful systems at scale in production
- Experience with Kafka (MSK)
- Experience with Redis (ElastiCache)
- Mastery of core reliability fundamentals: disaster recovery (DR) planning, incident management, observability, and capacity planning
- Track record of treating infrastructure as a product
- Experience building automation (IaC), tooling, or DBaaS-like solutions
- High rigor and a detail-oriented approach
- Capability to independently navigate complex, evolving production environments and make safe decisions
- Ability to act as a trusted partner and translate complex infrastructure constraints into clear guidance
Responsibilities
- Assess the resilience maturity of current Kafka and Redis stacks, identify key risks, and propose an improvement roadmap.
- Deliver concrete improvements on disaster recovery (DR) readiness, safe upgrades, alerting, and capacity planning.
- Act as an internal consultant for backend and product engineering teams, leading design reviews and providing guidance.
- Respond to and lead high-severity incidents on critical stateful infrastructure, mitigating impact and communicating clearly.
- Drive a platform engineering mindset by building automation, tooling, and APIs to improve developer experience.