Staff Site Reliability Engineer - Storage

Fully remote, distributed team across France · Full-Time · Staff
Salary not disclosed

Job Details

Required Skills
AWS, PostgreSQL, Kafka, Kubernetes, Redis, Terraform

Requirements

  • Strong, hands-on experience operating distributed infrastructure and stateful systems at scale in production
  • Experience with Kafka (MSK)
  • Experience with Redis (ElastiCache)
  • Mastery of core reliability fundamentals: disaster recovery (DR) planning, incident management, observability, and capacity planning
  • Track record of treating infrastructure as a product
  • Experience building automation (IaC), tooling, or DBaaS-like solutions
  • A rigorous, detail-oriented approach
  • Ability to navigate complex, evolving production environments independently and make safe decisions
  • Ability to act as a trusted partner and translate complex infrastructure constraints into clear guidance

Responsibilities

  • Assess the resilience maturity of current Kafka and Redis stacks, identify key risks, and propose an improvement roadmap.
  • Deliver concrete improvements on disaster recovery (DR) readiness, safe upgrades, alerting, and capacity planning.
  • Act as an internal consultant for backend and product engineering teams, leading design reviews and providing guidance.
  • Respond to and lead high-severity incidents on critical stateful infrastructure, mitigating impact and communicating clearly.
  • Drive a platform engineering mindset by building automation, tooling, and APIs to improve developer experience.