Sr Site Reliability Engineer

SigNoz, Observability Platform

India, Full-Time, Senior

Salary: ₹50L – ₹70L

Job Details

Requirements

5–8 years of experience in SRE, infrastructure, or platform/backend roles.
Deep, practical Kubernetes experience including resource tuning, autoscaling, networking, and stateful workloads.
Strong grasp of distributed systems failure modes, performance debugging, and capacity planning.
Proficiency in coding for automation, with Golang preferred.
Strong communication skills for writing technical runbooks and documentation.
Ability to work in a high-ownership, fast-moving, remote-first environment.
Interest in open source, with bonus points for prior contributions.
Working knowledge of ClickHouse (plus).
Familiarity with OpenTelemetry and high-throughput data ingest pipelines (plus).
Experience with Kafka (plus).
Background in Series B+ startup platform or infrastructure teams (plus).

Responsibilities

Manage the reliability of the SigNoz cloud platform through SLOs/SLIs, error budgets, and incident response.
Scale the ingest path to ensure robustness during bursts while maintaining data freshness.
Drive SaaS autoscaling and capacity planning across a petabyte-scale system.
Operate and tune ClickHouse and the data layer for performance and cost.
Maintain Kubernetes infrastructure, including cluster operations, upgrades, multi-tenancy, and automation.
Implement observability for SigNoz using our own product.
Manage infrastructure-as-code, CI/CD, and internal tooling.
