Senior AI/HPC Storage Engineer

Posted 2024-10-12

💎 Seniority level: Senior (minimum of 7 years)

📍 Location: United States, Canada, United Kingdom

💸 Salary: $160,000 - $182,000 per year

🔍 Industry: TechBio

🏢 Company: Recursion

🗣️ Languages: English

⏳ Experience: A minimum of 7 years

🪄 Skills: AWS, Docker, Leadership, Python, Bash, Cloud Computing, GCP, Kubernetes, Communication Skills, Collaboration, CI/CD

Requirements:
  • A minimum of 7 years of experience in managing data storage infrastructure, preferably within global BioPharma organizations.
  • In-depth knowledge of distributed/parallel file systems (IBM Storage Scale, formerly GPFS), multi-tier file storage (NAS), hybrid object storage (MinIO), and storage access and data transfer networking protocols.
  • Extensive experience designing, deploying, testing, supporting, and troubleshooting complex Linux-based computing and data storage environments.
  • Python programming and Bash scripting experience.
  • In-depth hands-on experience in provisioning, configuring, and managing infrastructure through modern CI/CD techniques, GitOps, Infrastructure as Code (IaC) and cloud automation principles.
  • Solid experience with software-defined infrastructure and cloud computing platforms, including Kubernetes, GCP, AWS, and others.
  • Practical knowledge of resource management and job scheduling using Slurm and Kubernetes. Knowledge of container technologies like Apptainer and Docker.
  • Strong verbal and written communication skills for effective documentation and collaboration.
  • Prior experience mentoring, guiding, and cross-training team members.
Responsibilities:
  • You will be responsible for designing, implementing, testing, maintaining, and optimizing our data storage infrastructure and services, utilizing an Infrastructure as Code approach across both on-premises and public cloud environments.
  • You will drive innovation across all storage tiers within our AI/HPC infrastructure, ensuring we deliver a scalable and effective data platform to support our mission.
  • By developing scripts and workflows, you will automate and verify storage infrastructure provisioning and dynamic reconfiguration, enhancing support for our AI/HPC storage environments.
  • Your role also includes researching, deploying, and optimizing accessibility, performance, security, and data lifecycle management policies.
  • You will regularly assess the health and operational performance of our storage platforms against established metrics, with a focus on meeting and exceeding operational service level objectives.