Sr Site Reliability Engineer

India | Full-Time | Senior

Salary: 5,000,000 - 10,000,000 INR per year

Job Details

Requirements

5–8 years of experience in SRE, infrastructure, platform engineering, or backend systems roles
Deep hands-on expertise with Kubernetes in production-scale environments
Strong understanding of distributed systems, failure modes, performance tuning, and capacity planning
Experience working with high-scale data systems (ClickHouse, Kafka, or similar)
Proficiency in at least one programming language (Go strongly preferred)
Familiarity with observability concepts (metrics, logs, and traces) and tools such as OpenTelemetry
Strong problem-solving skills with the ability to debug complex production issues
Excellent communication skills with the ability to write clear documentation and runbooks
Experience in fast-paced, high-ownership, remote-first environments

Responsibilities

Design, operate, and improve large-scale Kubernetes infrastructure including upgrades, scaling, networking, and multi-tenancy
Ensure system reliability through strong SRE practices including SLOs, SLIs, error budgets, incident response, and on-call optimization
Scale and maintain high-throughput ingestion pipelines handling petabyte-scale observability data
Operate, tune, and optimize data systems such as ClickHouse for performance, cost efficiency, and reliability
Build automation and tooling using infrastructure-as-code and CI/CD to improve deployment and operational efficiency
Monitor, debug, and resolve complex production issues across distributed systems
Improve observability of the platform itself using modern monitoring, logging, and tracing practices
