- Define and maintain SLOs, SLIs, and error budgets, along with observability through metrics, logs, traces, and alerts.
- Build repeatable, self-service infrastructure through infrastructure-as-code and CI/CD pipelines.
- Own rollouts end to end, including progressive delivery, canaries, migrations, and rollbacks.
- Operate and performance-tune node fleets, validators, RPC, and indexing services.
- Lead incident response and on-call, conduct postmortems, and harden platforms.
- Partner with product teams to design and operate production-ready services.
Docker, Kubernetes, TypeScript, +5 more