5+ years running production workloads on AWS (or GCP/Azure) with infrastructure-as-code (Terraform/CDK/CloudFormation) Hands-on experience operating container orchestration (ECS, EKS, Kubernetes, Nomad, etc.) and designing blue/green or canary rollouts Depth in at least two of our core datastores (Postgres, MongoDB, Kafka) including backup/restore, upgrades, and performance tuning Fluency with CI/CD pipelines (we use Buildkite + GitHub Actions) and a knack for automating everything with shell, Python, or TypeScript Proven track record setting up monitoring/alerting in Datadog, Prometheus, or similar, with clear SLO/SLA ownership Strong grasp of linux networking, load balancing (Cloudflare/ELB), and CDN/edge-security concepts Excellent incident-management and root-cause analysis skills; able to write crisp RCAs and follow through on action items Passion for customer-centric thinking, rapid iteration, and continuous learning