Senior Site Reliability Engineer

Moniepoint (Fintech)
Remote, Nigeria | Full-Time | Senior
Salary not disclosed

Job Details

Experience
Minimum of 5 years of experience
Required Skills
PostgreSQL, Python, GCP, Java, Kubernetes, Go, Rust, Distributed Systems

Requirements

  • Minimum of 5 years of experience in SRE or Backend Engineering.
  • Ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
  • Deep understanding of distributed systems architecture and design patterns.
  • Strong command of microservices fundamentals and event-driven architectures.
  • Extensive experience with Google Cloud Platform (GCP) or a comparable cloud provider (AWS or Azure).
  • Proficiency in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster and infrastructure issues.
  • Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz.
  • Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL).
  • Familiarity with streaming and messaging platforms (e.g., Kafka, RabbitMQ) in high-throughput environments.

Responsibilities

  • Participate in on-call rotations as the primary technical lead.
  • Act as Incident Commander during high-severity incidents: opening war rooms, coordinating cross-functional teams, and providing clear status updates.
  • Instrument code to expose high-cardinality metrics and distributed traces.
  • Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
  • Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms.
  • Partner with Product Engineering teams during the design phase to ensure new services are built using established reliability, scalability, and observability patterns.
  • Analyze system performance and traffic patterns to model future capacity needs.
  • Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.