kuberox technologies

High-Performance Computing (HPC) Engineer - Consultant

5-10 Years

Job Description

Job Summary

We are seeking a skilled High-Performance Computing (HPC) Engineer with 5–10 years of experience to design, deploy, manage, and optimize HPC cluster environments. The ideal candidate will have hands-on experience with cluster scheduling, monitoring, performance tuning, and supporting scientific or engineering workloads in Linux-based environments.

Key Responsibilities

  • Design, deploy, and maintain HPC cluster infrastructure to ensure high availability and performance.
  • Manage and configure job scheduling systems such as PBS and SLURM.
  • Implement and maintain monitoring solutions using Grafana, Nagios, Prometheus, and Ganglia.
  • Administer cluster management tools including Bright Cluster Manager, xCAT, and Puppet for infrastructure automation.
  • Configure and troubleshoot high-speed networking technologies including InfiniBand and Gigabit Ethernet.
  • Perform system performance analysis, profiling, and debugging using tools like Intel VTune, Valgrind, and gprof.
  • Provide application support for scientific and engineering workloads using GNU, Intel, and CUDA compilers, as well as the Intel Math Kernel Library (MKL).
  • Manage Proxmox virtualization environments and administer license management tools such as FlexLM.
  • Configure and maintain storage solutions including parallel file systems and enterprise object storage platforms.
  • Ensure system security, patching, and compliance in Red Hat Enterprise Linux environments.
  • Collaborate with research, engineering, and IT teams to optimize workloads and resource utilization.
  • Document system architecture, processes, and troubleshooting guides.

Required Skills & Qualifications

  • 5–10 years of experience in HPC systems administration or engineering.
  • Strong experience with job schedulers such as PBS and SLURM.
  • Hands-on experience with monitoring tools: Grafana, Nagios, Prometheus, Ganglia.
  • Expertise in cluster management tools like Bright Cluster Manager, xCAT, and Puppet.
  • Solid understanding of HPC networking, including InfiniBand and Ethernet.
  • Experience with performance profiling and debugging tools (Intel VTune, Valgrind, gprof).
  • Familiarity with compilers and libraries: GNU, Intel, and CUDA compilers; Intel MKL.
  • Experience with virtualization platforms like Proxmox and license management (FlexLM).
  • Knowledge of storage technologies: parallel file systems (e.g., Lustre, GPFS) and object storage.
  • Strong Linux administration skills, specifically Red Hat Enterprise Linux.
  • Scripting skills (Bash, Python, or similar) for automation and troubleshooting.

Job ID: 147265851