Staff Technical Program Manager (TPM) – Infrastructure and Node Delivery

Posted 2 months agoViewed
161000 - 237000 USD per year
United States, EuropeFull-TimeAI Hyperscaler
Company:CoreWeave
Location:United States, Europe
Languages:English
Seniority level:Staff, 10+ years
Experience:10+ years
Skills:
LeadershipProject ManagementPythonSoftware DevelopmentCloud ComputingKubernetesJiraCross-functional Team LeadershipGrafanaPrometheusRelease ManagementAgile methodologiesLinuxDevOpsProblem SolvingConfluenceCross-functional collaborationRisk Management
Requirements:
Bachelor's degree in Electrical Engineering, Computer Engineering, or a related technical field. 10+ years of experience in technical program management in GPU provisioning, fleet management, or large-scale compute infrastructure. Background in observability, monitoring, or telemetry systems (e.g., Prometheus, Grafana, OpenTelemetry). Hands-on experience coordinating NPI or GTM readiness for compute products. Technical understanding of system software orchestration and hardware/software integration. Solid understanding of hardware and fleet development lifecycles. Proven ability to lead cross-functional teams, influence without direct authority, and drive consensus in a fast-paced environment. Exceptional communication, interpersonal, and presentation skills. Proficiency in program management tools (e.g., Jira, Confluence, Sheet).
Responsibilities:
Ensure infrastructure and system software are production-ready for new hardware and compute platforms. Drive end-to-end programs spanning GPU provisioning, at-scale deployments, Fleet NPI readiness, and vendor management. Coordinate with hardware compute engineering, Fleet teams, and external vendors to maintain service reliability, enforce SLAs, and lead incident response efforts. Partner with engineering teams to improve monitoring, telemetry, and fleet observability for proactive performance management. Define and track metrics around GPU fleet health, performance, and reliability. Run post-incident reviews and drive action items that enhance system reliability and prevent regressions. Collaborate with internal customers to collect feedback, enable adoption of core infrastructure platforms, and refine onboarding experiences. Work closely with Product, Infrastructure, Platform Engineering, Vendor, and Customer Experiences to align on roadmap priorities and customer delivery timelines. Communicate program status, risks, and critical decisions to senior leadership and executive stakeholders.
About the Company
CoreWeave
Cloud Computing
View Company Profile
Similar Jobs:
Posted 10 months ago
United StatesFull-TimeData as a Service
(Associate) Technical Program Manager (ATPM / TPM) - Remote
Company:
Posted 2 months ago
Denver, Colorado, New York, New York, Dallas, Texas, Lisbon, PortugalFull-TimeHealthcare Technology
Staff Technical Program Manager
Company:Cleerly
Posted 3 months ago
United StatesFull-TimeMental Healthcare
Staff Technical Program Manager
Company:Headway