
We're looking for an experienced HPC Systems Engineer (or Senior HPC Systems Engineer, depending on experience) to support and operate large-scale Linux-based high-performance computing (HPC), storage, and networking environments. This role supports research scientists, academic users, and enterprise workloads, ensuring reliable, secure, and high-performance HPC operations.
Key accountabilities:
. HPC Systems Operations: Administer, operate, and maintain Linux-based HPC clusters, including compute, storage, and high-speed networking
. Manage and support:
o HPC job schedulers (e.g. Slurm, PBS Pro, LSF)
o Parallel file systems (Lustre, GPFS/Spectrum Scale, BeeGFS)
o Cluster management and provisioning tools
. Perform system monitoring, patching, upgrades, and capacity planning.
. Troubleshoot and resolve hardware, software, OS, and network issues across HPC environments
. Participate in on-call or escalation support rotations as needed
. Work with our software engineer to support our AI/DL applications, and with our desktop engineer to resolve user problems as required
. Provide advice and guidance to researchers on HPC application development, debugging, optimization, and parallelization
. Deliver HPC user training sessions and contribute to documentation and best-practice guides
Desired Experience, Knowledge & Attributes
. Bachelor's degree in Computer Science, Engineering, or a related field
. Preferably at least 5 years' experience with large-scale HPC systems
. Strong hands-on experience with:
o Linux operating systems (RHEL, Rocky, SUSE)
o HPC schedulers and resource managers
o Parallel file systems
. Understanding of HPC performance tuning and optimization techniques.
. Exposure to the following will be an added advantage:
o HPC code optimization and parallelization
o Languages and libraries: Fortran, C, C++, OpenMP, MPI
o Linux operating systems
o Knowledge of numerical simulation applications, such as climate research, weather forecasting, and aeronautics simulation
Job ID: 137120349