Sr. Production Engineer
Zscaler
Cloud Infrastructure
Remote - California, USA; San Jose, California, USA
Full-Time
Senior
Salary: 119,000 - 170,000 USD per year
Job Details
- Experience: 3-5+ years
- Required Skills: AWS, Python, GCP, C++, Go, Grafana, Prometheus, Linux
Requirements
- 3-5+ years of experience managing reliability, scalability, and availability for large-scale production services
- Deep expertise in programming (e.g., Python, Go, or C/C++)
- Strong background in networking protocols, Linux/RHEL systems, and distributed architecture
- Experience in high-stakes incident management and participation in a 24/7 on-call rotation
- Proficiency in leveraging ITIL frameworks and incident data to drive service maturity
Responsibilities
- Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments
- Drive an automation-first culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
- Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry)
- Define SLIs/SLOs and establish error budgets
- Act as a lead Incident Commander (TDO on-call)
- Develop response playbooks and conduct deep-dive post-incident analyses
- Work with Engineering and partner teams to conduct operability reviews