- Own production reliability outcomes for services (availability, scalability, cost-efficiency, and security) from early design through implementation and operations
- Architect, deploy, and secure AWS cloud infrastructure following best practices
- Lead incident response and root cause analyses (RCAs), and run post-incident reviews
- Build and maintain “everything as code” infrastructure using Terraform/CloudFormation
- Operate, scale, and evolve the container platform (Kubernetes/EKS/ECS)
- Design, implement, and continuously improve CI/CD pipelines to enable fast, safe, and repeatable releases
- Integrate security tools, practices, and controls into CI/CD and DevOps workflows
- Establish and mature observability practices (metrics, logs, traces, dashboards, alerting)
- Drive automation to eliminate operational toil
- Promote an AI-first engineering culture and mentor peers
AWS, Python, Bash