Systems Engineer, HPC (APAC)
Singapore, Full-Time, Middle
Salary not disclosed
Job Details
Required Skills
- Python, Bash, Linux, Terraform, Networking, Ansible
Requirements
- Strong Linux systems administration experience.
- Experience working in large-scale environments such as HPC clusters or cloud infrastructure.
- Experience with job schedulers (e.g., Slurm).
- Solid troubleshooting skills across systems, hardware, and networks.
- Proficiency with automation tooling (e.g., Ansible, Terraform).
- Proficiency with scripting languages (e.g., Python, Bash).
- Experience with containers and orchestration (e.g., Kubernetes) is a plus.
- Experience with storage systems (e.g., Ceph, Lustre, NFS) is a plus.
- Networking fundamentals (Ethernet and InfiniBand).
- GPU or AI/ML infrastructure experience is a plus.
Responsibilities
- Operate and maintain large-scale Linux environments across bare metal, clusters, and cloud.
- Monitor system health, troubleshoot incidents, and ensure high availability.
- Support production and research workloads across multiple environments.
- Scale clusters to hundreds or thousands of nodes and manage petabyte-scale storage.
- Automate operational tasks using Python, Bash, Ansible, or Terraform.
- Contribute to system design and architecture decisions.
- Act as a bridge between users and infrastructure teams.