Fujitsu is seeking an experienced HPC Systems Engineer (or Senior HPC Systems Engineer, depending on experience) to support and operate large-scale Linux-based high-performance computing (HPC), storage, and networking environments. This role supports research scientists, academic users, and enterprise workloads, ensuring reliable, secure, and high-performance HPC operations.
Key Responsibilities:
- HPC Systems Operations: Administer, operate, and maintain Linux-based HPC clusters, including compute, storage, and high-speed networking
- Manage and support:
  - HPC job schedulers (e.g., Slurm, PBS Pro, LSF)
  - Parallel file systems (Lustre, GPFS/Spectrum Scale, BeeGFS)
  - Cluster management and provisioning tools
- Perform system monitoring, patching, upgrades, and capacity planning
- Troubleshoot and resolve hardware, software, OS, and network issues across HPC environments
- Participate in on-call or escalation support rotations as needed
- Work with our software engineer to support AI/DL applications, and with our desktop engineer to resolve user problems as required
- Provide advice and guidance to researchers on HPC application development, debugging, optimization, and parallelization
- Deliver HPC user training sessions and contribute to documentation and best-practice guides
Job Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field
- Preferably at least 5 years of experience with large-scale HPC systems
- Strong hands-on experience with:
  - Linux operating systems (RHEL, Rocky, SUSE)
  - HPC schedulers and resource managers
  - Parallel file systems
- Understanding of HPC performance tuning and optimization techniques
- Exposure to the following will be an added advantage:
  - HPC code optimization and parallelization
  - Languages and libraries: Fortran, C, C++, OpenMP, MPI
  - Linux operating systems
  - Knowledge of numerical simulation applications in areas such as climate research, weather forecasting, and aeronautics
*Only shortlisted candidates will be notified.