Sr Site Reliability Engineer

SigNoz (Observability Platform)
India, Full-Time, Senior
Salary: ₹50L – ₹70L

Job Details

Experience
5–8 years
Required Skills
Kafka, Kubernetes, ClickHouse, Go, CI/CD, Distributed Systems

Requirements

  • 5–8 years of experience in SRE, infrastructure, or platform/backend roles.
  • Deep, practical Kubernetes experience including resource tuning, autoscaling, networking, and stateful workloads.
  • Strong grasp of distributed systems failure modes, performance debugging, and capacity planning.
  • Proficiency in coding for automation, with Golang preferred.
  • Strong communication skills for writing technical runbooks and documentation.
  • Ability to work in a high-ownership, fast-moving, remote-first environment.
  • Interest in open source, with bonus points for prior contributions.
  • Working knowledge of ClickHouse (plus).
  • Familiarity with OpenTelemetry and high-throughput data ingest pipelines (plus).
  • Experience with Kafka (plus).
  • Background in Series B+ startup platform or infrastructure teams (plus).

Responsibilities

  • Manage the reliability of the SigNoz cloud platform through SLOs/SLIs, error budgets, and incident response.
  • Scale the ingest path to ensure robustness during bursts while maintaining data freshness.
  • Own autoscaling and capacity planning for the SaaS platform, a petabyte-scale system.
  • Operate and tune ClickHouse and the data layer for performance and cost.
  • Maintain Kubernetes infrastructure, including cluster operations, upgrades, multi-tenancy, and automation.
  • Implement observability for SigNoz using our own product.
  • Manage infrastructure-as-code, CI/CD, and internal tooling.