Sr. Staff Production Engineer

Zscaler, Cybersecurity

California, USA, Full-Time, Staff

Salary: 140,000 - 200,000 USD per year

Job Details

8+ years of experience managing reliability, scalability, and availability for large-scale production services
Deep expertise in programming (e.g., Python, Go, or C/C++)
Strong background in networking protocols
Strong background in Linux/FreeBSD systems
Strong background in distributed architecture
Experience in high-stakes incident management
Willingness to participate in a 24/7 on-call rotation
Proficiency in leveraging ITIL frameworks
Proficiency in using incident data to drive service maturity through systematic problem management and technical operability reviews
Extensive experience with public cloud (AWS, Azure, GCP) (Preferred)
Experience with Infrastructure-as-Code (Ansible, Terraform) (Preferred)
Experience with chaos engineering and disaster recovery planning at scale (Preferred)
Expertise in global routing (BGP) (Preferred)
Expertise in traffic tunneling (GRE, IPSec) (Preferred)
Deep understanding of L7 proxy architectures (HAProxy) (Preferred)
Deep understanding of DNS at scale (Preferred)
Deep understanding of OS networking stack internals (Preferred)

Provide technical vision and hands-on execution to drive an "automation-first" culture
Advance observability and architectural standards to reduce Mean Time to Mitigate (MTTM)
Shape the scalability of globally distributed, multi-cloud infrastructure
Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
Write code (Python/Go) to eliminate manual toil and build self-healing systems
Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
Collaborate with Engineering and partner teams to conduct operability reviews
