Senior Site Reliability Engineer
Moniepoint | Financial Technology
Remote, India | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: Minimum of 5 years of experience in SRE or Backend Engineering
- Required Skills: Python, GCP, Java, Kafka, Kubernetes, Go, Rust, Distributed Systems
Requirements
- Minimum of 5 years of experience in SRE or Backend Engineering.
- Strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
- Deep understanding of distributed systems architecture and design patterns.
- Proficiency in microservices fundamentals and event-driven architectures.
- Extensive experience with Google Cloud Platform (GCP) or a comparable cloud provider such as AWS or Azure.
- Proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
- Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz.
- Familiarity with operating and tuning production data stores such as PostgreSQL or MySQL.
- Experience working with streaming platforms such as Kafka or RabbitMQ in high-throughput environments.
Responsibilities
- Participate in on-call rotations as the primary technical lead and act as Incident Commander during high-severity incidents.
- Instrument code to expose high-cardinality metrics and distributed traces.
- Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
- Write high-quality, production-ready code in Java, Go, or Python to build internal tooling, automation, and self-healing mechanisms.
- Partner with Product Engineering teams during the design phase to implement reliability, scalability, and observability patterns.
- Analyze system performance and traffic patterns to model capacity needs.
- Conduct load testing and chaos engineering experiments to verify system resilience.