Senior Technical Product Manager, GPU Orchestration


Vultr | Cloud Infrastructure

Remote - United States | Full-Time | Senior

Salary: 130,000 - 165,000 USD per year


Job Details

Requirements

7+ years of product management experience in cloud infrastructure, container orchestration, HPC, or developer platforms
Deep understanding of Kubernetes, Slurm, or similar orchestration and scheduling systems, including GPU scheduling, resource management, and multi-tenant isolation
Experience defining product strategy and roadmaps for platform or infrastructure products at scale
Strong technical background, with the ability to engage with engineering on cluster lifecycle, control plane reliability, API design, and distributed systems
Experience with AI/ML infrastructure, including training workloads, inference serving, and GPU resource optimization
Track record of shipping developer- and operator-facing products with measurable impact on reliability, adoption, or operational efficiency
Experience working across cross-functional teams (engineering, design, marketing, sales) in a fast-paced environment
Excellent written and verbal communication skills
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)

Responsibilities

Define and execute the roadmap for managed Kubernetes, managed Slurm services, SUNK, and Run:ai integration
Own the end-to-end cluster lifecycle, including provisioning, configuration, upgrades, scaling, high availability, and decommissioning
Establish scheduling and resource management capabilities for GPU workloads, including quotas, fair-share policies, multi-tenant isolation, and priority handling
Drive integration between orchestration services and core infrastructure components, including networking, storage, identity, observability, and billing systems
Define service-level objectives for control plane reliability, job scheduling latency, cluster availability, and upgrade stability
Design APIs, CLI tooling, and UI workflows that enable self-service cluster management and workload operations
Partner with customer-facing teams to understand training, inference, and HPC use cases, translating real workload requirements into product capabilities
Monitor industry trends in container orchestration, HPC scheduling, distributed systems, and AI infrastructure to inform product direction
