We are seeking an experienced HPC Systems Engineer to support and operate large-scale Linux-based high-performance computing (HPC), storage, and networking environments. The role supports research, academic, and enterprise workloads, ensuring stable, secure, and high-performance HPC operations.
Responsibilities
- Administer, operate, and maintain Linux-based HPC clusters, including compute, storage, and high-speed networking infrastructure
- Manage and support HPC schedulers and resource managers (e.g. Slurm, PBS Pro, LSF)
- Support and maintain parallel file systems such as Lustre, GPFS/Spectrum Scale, and BeeGFS
- Perform system monitoring, patching, upgrades, and capacity planning
- Troubleshoot and resolve hardware, operating system, software, and network issues across HPC environments
- Participate in on-call or escalation support rotations as required
- Work closely with software engineers, AI/DL teams, and desktop support teams to assist users and their applications
- Provide advice and guidance to researchers on HPC application development, debugging, optimization, and parallelization
- Deliver HPC user training sessions and contribute to system documentation and best-practice guides
Requirements
- Preferably at least 5 years of experience supporting large-scale HPC systems
- Strong hands-on experience with:
  - Linux operating systems (RHEL, Rocky Linux, SUSE)
  - HPC schedulers and resource management tools
  - Parallel file systems
- Good understanding of HPC performance tuning and optimization techniques
- Experience with HPC code optimization and parallel programming
- Exposure to AI/deep learning workloads in HPC environments
Please send your detailed resume in MS Word format to [Confidential Information], including:
- Education level
- Work experience
- Details of each previous employment
- Reason for leaving each position
- Last drawn salary
- Expected salary
- Date of availability