Sr. Production Engineer

Zscaler
Cloud Infrastructure
Remote - California, USA; San Jose, California, USA
Full-Time
Senior
Salary: 119,000 - 170,000 USD per year

Job Details

Experience
3-5+ years
Required Skills
AWS, Python, GCP, C++, Go, Grafana, Prometheus, Linux

Requirements

  • 3-5+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Deep expertise in programming (e.g., Python, Go, or C/C++)
  • Strong background in networking protocols, Linux/RHEL systems, and distributed architecture
  • Experience in high-stakes incident management and participation in a 24/7 on-call rotation
  • Proficiency in leveraging ITIL frameworks and incident data to drive service maturity

Responsibilities

  • Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments
  • Drive an automation-first culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
  • Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry)
  • Define SLIs/SLOs and establish error budgets
  • Act as a lead Incident Commander (TDO on-call)
  • Develop response playbooks and conduct deep-dive post-incident analyses
  • Work with Engineering and partner teams to conduct operability reviews