- Design and implement highly available
- Scalable infrastructure across AWS
- Azure
- GCP
- Bare-metal environments
- Drive an 'automation-first' culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
- Implement and maintain observability
- Define SLIs/SLOs
- Establish error budgets
- Act as a lead Incident Commander
- Develop response playbooks
- Conduct deep-dive post-incident analyses.
AWSPythonGCP+5 more