Contribute to the production engineering strategy for Upbound Cloud, ensuring high availability, scalability, and efficiency. Own reliability metrics (uptime, latency, and error budgets) and champion service-level objectives (SLOs) across teams. Design and implement automation for provisioning, observability, and incident response. Collaborate with development teams to build reliability into the software lifecycle. Operate and improve multi-tenant, Kubernetes-based systems that use Crossplane. Drive incident management by leading blameless postmortems, root cause analyses, and systemic remediation efforts. Mentor engineers in production engineering practices.