Former or current Linux / systems / network administrator comfortable living in the shell
5+ years of experience in DevOps/SRE/Platform/Infrastructure roles running production systems
Deep familiarity with Linux as a daily driver, including shell scripting
Strong experience with workload management, containerization, and orchestration (Slurm, Docker, Kubernetes) in production environments
Solid understanding of CI/CD tools and workflows
Hands-on cloud infrastructure experience (AWS, GCP, Azure)
Proficiency with infrastructure as code (Terraform, CloudFormation, or similar)
Experience with monitoring and logging stacks (Grafana, Prometheus, Loki, CloudWatch, or equivalents)
Familiarity with ML pipeline and experiment orchestration tools
Solid programming skills in Python
Ability to read and debug code that uses common ML libraries (PyTorch, TensorFlow)