Principal Production Engineer

Zscaler
Cybersecurity
Remote - California, USA; San Jose, California, USA
Full-Time
Principal
Salary: 164,500 - 235,000 USD per year

Job Details

Experience
10+ years
Required Skills
AWS, Python, GCP, Kubernetes, Go, Grafana, Prometheus, Linux, Terraform

Requirements

  • 10+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Deep expertise in programming (e.g., Python, Go, or C/C++)
  • Strong background in networking protocols, Linux/RHEL systems, and distributed architecture
  • Experience in high-stakes incident management and participation in a 24/7 on-call rotation
  • Proficiency in applying ITIL frameworks and using incident data
  • Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform, Helm, Temporal)
  • Expertise in global routing (BGP), traffic tunneling (GRE, IPSec), L7 proxy (HAProxy), and DNS at scale

Responsibilities

  • Design and implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments
  • Drive an automation-first culture by writing code to eliminate manual toil and build self-healing systems
  • Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry)
  • Define SLIs/SLOs and establish error budgets
  • Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct post-incident analyses
  • Work with Engineering and partner teams to conduct operability reviews