Lightning AI

Private Company

Open Positions: 1

Remote within the U.S. | Full-Time | AI/ML, HPC
  • Operate and scale distributed storage systems, including VAST and S3-compatible object storage (e.g., Ceph)
  • Improve performance, reliability, and efficiency of storage systems supporting large-scale AI/ML workloads
  • Troubleshoot complex storage and data path issues across hardware and software layers
  • Optimize storage performance to support high-throughput, low-latency AI training and inference workloads
  • Build and maintain automation for provisioning, managing, and monitoring storage infrastructure
  • Develop Python-based tools and workflows to reduce manual operational overhead
  • Manage and operate Linux-based systems in production, including bare-metal environments
  • Support capacity planning, utilization tracking, and forecasting for storage systems
  • Leverage monitoring and telemetry to diagnose issues and improve system performance and reliability
  • Work closely with Infrastructure Engineering, Network Engineering, and Platform teams to integrate storage into the broader platform
Python, Linux
