- Experience operating production cloud services at scale: monitoring, alerting, incident response, post-mortems, and continuous improvement of service reliability
- Strong debugging skills across distributed systems, including experience with observability tools (Prometheus, Grafana, OpenTelemetry, distributed tracing)
- Experience building and operating controllers that interact with the Kubernetes API server
- Comfortable working directly with customers to understand, reproduce, and resolve complex technical issues
- Take responsibility and ownership for solving problems
- Demonstrate excellence in your work
- Empathy for customers
- Clear communication and effective collaboration