Sr Site Reliability Engineer
India | Full-Time | Senior
Salary: 5,000,000 - 10,000,000 INR per year
Job Details
- Experience: 5–8 years
- Required Skills: Kafka, Kubernetes, ClickHouse, Go, CI/CD, Distributed Systems
Requirements
- 5–8 years of experience in SRE, infrastructure, platform engineering, or backend systems roles
- Deep hands-on expertise with Kubernetes in production-scale environments
- Strong understanding of distributed systems, failure modes, performance tuning, and capacity planning
- Experience working with high-scale data systems (ClickHouse, Kafka, or similar)
- Proficiency in at least one programming language (Go strongly preferred)
- Familiarity with observability concepts (metrics, logs, and traces) and tools such as OpenTelemetry
- Strong problem-solving skills with the ability to debug complex production issues
- Excellent communication skills with the ability to write clear documentation and runbooks
- Experience in fast-paced, high-ownership, remote-first environments
Responsibilities
- Design, operate, and improve large-scale Kubernetes infrastructure including upgrades, scaling, networking, and multi-tenancy
- Ensure system reliability through strong SRE practices including SLOs, SLIs, error budgets, incident response, and on-call optimization
- Scale and maintain high-throughput ingestion pipelines handling petabyte-scale observability data
- Operate, tune, and optimize data systems such as ClickHouse for performance, cost efficiency, and reliability
- Build automation and tooling using infrastructure-as-code and CI/CD to improve deployment and operational efficiency
- Monitor, debug, and resolve complex production issues across distributed systems
- Improve observability of the platform itself using modern monitoring, logging, and tracing practices