Design, secure, and maintain cloud infrastructure for production SaaS and ML workloads on AWS and/or GCP
Build and operate scalable containerized applications using Kubernetes, Helm, and Docker
Develop and manage infrastructure-as-code solutions using Terraform, Bash, and Python
Work directly with customers and internal teams to meet security, compliance, and reliability requirements (SOC 2, HIPAA, GDPR)
Improve observability, reliability, and on-call processes, including SLOs/SLAs and incident response
Automate CI/CD workflows with tools such as GitHub Actions and Spacelift
Contribute code (Python, Node.js) to product features and platform infrastructure
Identify and act on cost-optimization opportunities across the tech stack