High Performance Compute HPC Lead / HPC Architect (Linux, GPU, Cloud HPC)

8-10 Years
SGD 6,000 - 10,000 per month

This job is no longer accepting applications

  • Posted a month ago

Job Description

HPC Lead / HPC Architect (Linux, GPU, Cloud HPC) - 8+ Years Experience

Role Overview

We are looking for an experienced HPC Lead / HPC Architect to design, implement, and manage large-scale High Performance Computing (HPC) environments. This role will drive architecture, performance optimization, and transformation initiatives across compute (CPU/GPU), storage, and high-speed networking, supporting AI/ML, research, and enterprise workloads.

Key Responsibilities

  • Lead architecture, design, and deployment of HPC clusters (CPU & GPU computing environments)
  • Own the end-to-end HPC lifecycle: design, build, deployment, operations, and optimization
  • Define and manage job scheduling systems (Slurm, PBS, LSF) and workload orchestration
  • Drive performance tuning, benchmarking, and optimization for compute-intensive workloads (AI/ML, simulation, analytics)
  • Architect and manage high-performance storage systems (Lustre, GPFS/IBM Spectrum Scale, BeeGFS)
  • Design and implement low-latency, high-throughput networking (InfiniBand, RDMA, high-speed Ethernet)
  • Lead hybrid and cloud HPC integration (AWS, Azure, GCP HPC solutions)
  • Build and maintain automation frameworks (Ansible, Terraform, Infrastructure as Code, scripting)
  • Implement monitoring, observability, logging, and capacity planning (Prometheus, Grafana, ELK, AIOps tools)
  • Ensure security, compliance, and identity and access management (IAM, LDAP/AD)
  • Collaborate with research teams, data scientists, application owners, and business stakeholders
  • Mentor and lead HPC engineers, system administrators, and infrastructure teams
  • Drive innovation in GPU computing, AI/ML infrastructure, and advanced automation

Required Skills & Experience

  • 8+ years of experience in HPC, Infrastructure Engineering, or Linux System Engineering
  • Proven expertise in:
      • HPC cluster architecture, deployment, and operations
      • Linux/Unix systems administration at scale (RHEL, CentOS, Ubuntu)
      • CPU and GPU computing environments (NVIDIA GPU, CUDA preferred)
  • Strong hands-on experience with:
      • Job schedulers (Slurm, PBS, LSF)
      • High-performance distributed storage (Lustre, GPFS, BeeGFS)
      • Networking (TCP/IP, DNS, InfiniBand, RDMA, low-latency fabrics)
  • Experience in automation and scripting (Python, Bash/Shell, Ansible, Terraform)
  • Knowledge of cloud HPC architectures (AWS ParallelCluster, Azure CycleCloud, GCP HPC)

Preferred / Nice-to-Have Skills

  • Experience with AI/ML infrastructure, deep learning workloads, or research computing
  • Familiarity with containerization and orchestration (Docker, Kubernetes for HPC workloads)
  • Exposure to observability platforms, AIOps, and predictive monitoring
  • Experience in large-scale enterprise, research institutes, or university HPC environments
  • Knowledge of DevOps / Platform Engineering practices

Leadership & Profile

  • Strong stakeholder management and cross-functional collaboration skills
  • Ability to translate business or research requirements into scalable HPC architecture
  • Proven experience in team leadership, mentoring, and technical decision-making
  • Strategic mindset with hands-on technical depth in HPC systems and infrastructure

More Info

Job ID: 146053803
