Senior Software Engineer - Fleet Management

Nscale
AI Infrastructure
Join our thriving remote-first team. Geography is no barrier to impact or connection.
Full-Time, Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Python, Distributed Systems

Requirements

  • 5+ years software engineering experience building and operating production systems
  • Focus on infrastructure automation or workflow tooling
  • Strong proficiency in Python
  • Experience building distributed systems at scale
  • Expertise in systems design tradeoffs
  • Proficiency with AI tools such as Claude or Cursor
  • Track record of taking automation systems from ambiguous requirements to operational production systems
  • Hands-on day-2 operations experience (monitoring, incident response, performance optimisation)
  • Ability to work independently in a fast-paced environment

Responsibilities

  • Build workflow automation systems for GPU node and network switch lifecycle management at scale
  • Design foundational platform components with established software patterns that others build on
  • Implement device provisioning, burn-in testing, network configuration, and hardware health validation workflows
  • Integrate with datacenter infrastructure management systems, cloud orchestration platforms, and bare metal provisioning tools
  • Build distributed workflow orchestration systems to coordinate complex automation tasks across the fleet
  • Drive technical strategy for reliability, observability, incident response, and operational excellence
  • Partner with Infrastructure, Platform, and SRE teams to automate hardware lifecycle operations
  • Use AI tools to accelerate delivery while maintaining architectural coherence