Systems Engineer, HPC (APAC)

Singapore | Full-Time | Mid-level
Salary not disclosed

Job Details

Required Skills
Python, Bash, Linux, Terraform, Networking, Ansible

Requirements

  • Strong Linux systems administration experience.
  • Experience working in large-scale environments such as HPC clusters or cloud infrastructure.
  • Experience with job schedulers (e.g., Slurm).
  • Solid troubleshooting skills across systems, hardware, and networks.
  • Proficiency with automation tooling (e.g., Ansible, Terraform).
  • Proficiency with scripting languages (e.g., Python, Bash).
  • Experience with containers and orchestration (e.g., Kubernetes) is a plus.
  • Experience with storage systems (e.g., Ceph, Lustre, NFS) is a plus.
  • Networking fundamentals (Ethernet and InfiniBand).
  • GPU or AI/ML infrastructure experience is a plus.

Responsibilities

  • Operate and maintain large-scale Linux environments across bare metal, clusters, and cloud.
  • Monitor system health, troubleshoot incidents, and ensure high availability.
  • Support production and research workloads across multiple environments.
  • Scale clusters to hundreds or thousands of nodes, and manage petabyte-scale storage.
  • Automate operational tasks using Python, Bash, Ansible, or Terraform.
  • Contribute to system design and architecture decisions.
  • Act as a bridge between users and infrastructure teams.