- Drive the stability and reliability of Epic's GCP infrastructure—setting and tracking SLOs/SLIs, reducing toil, and engineering out recurring sources of instability
- Build and operate Epic's GCP infrastructure for high availability, scalability, and cost efficiency
- Manage and harden our Docker and GKE container platform
- Maintain and improve CI/CD pipelines
- Own and evolve the observability stack—metrics, logs, traces, dashboards, and alerts
- Write and maintain Terraform to codify infrastructure
- Champion platform security best practices
- Support compliance-aware infrastructure practices—vulnerability management, access reviews, audit-evidence flows, and incident-response readiness
- Lead by example in a frequent on-call rotation; drive incident response and blameless post-mortems
Docker, Python, Bash