Develop and maintain core Python platform for request routing, AI workload orchestration, GPU server capacity management, and observability. Develop and maintain infrastructure layer using Terraform and cloud provider APIs for managing GPU workers. Own and operate underlying platform technologies (e.g., K8s, Prometheus, DataDog). Architect and implement solutions impacting service performance and availability. Collaborate with core engineering team to design and build new infrastructure systems. Contribute to the vision and foundation for future infrastructure development. Shape technical direction and infrastructure best practices.