Sr Site Reliability Engineer
SigNoz | Observability Platform
India | Full-Time | Senior
Salary: ₹50L – ₹70L
Job Details
- Experience: 5–8 years
- Required Skills: Kafka, Kubernetes, ClickHouse, Go, CI/CD, Distributed Systems
Requirements
- 5–8 years of experience in SRE, infrastructure, or platform/backend roles.
- Deep, practical Kubernetes experience including resource tuning, autoscaling, networking, and stateful workloads.
- Strong grasp of distributed systems failure modes, performance debugging, and capacity planning.
- Proficiency in coding for automation, with Golang preferred.
- Strong communication skills for writing technical runbooks and documentation.
- Ability to work in a high-ownership, fast-moving, remote-first environment.
- Interest in open source, with bonus points for prior contributions.
- Working knowledge of ClickHouse (plus).
- Familiarity with OpenTelemetry and high-throughput data ingest pipelines (plus).
- Experience with Kafka (plus).
- Background in Series B+ startup platform or infrastructure teams (plus).
Responsibilities
- Manage the reliability of the SigNoz cloud platform through SLOs/SLIs, error budgets, and incident response.
- Scale the ingest path to ensure robustness during bursts while maintaining data freshness.
- Handle SaaS autoscaling and capacity planning across a petabyte-scale system.
- Operate and tune ClickHouse and the data layer for performance and cost.
- Maintain Kubernetes infrastructure, including cluster operations, upgrades, multi-tenancy, and automation.
- Implement observability for SigNoz using our own product.
- Manage infrastructure-as-code, CI/CD, and internal tooling.