DSO National Laboratories

AI / High-Performance Computing Engineer

2-4 Years
  • Posted a day ago

Job Description

In this role, you will:

  • Participate in the full lifecycle of HPC cluster operations, from system bring-up and bring-down through workload characterisation and optimisation to the rollout of new AI and software services.
  • Design & operate a GPU orchestration layer with high availability and utilisation for AI training, inference and other scientific workloads.
  • Partner with other DSO engineers to design standards, automate operations, and translate research code into performant workloads on distributed systems.
  • Maintain hardware infrastructure, distributed storage, high-speed networking and supporting IT infrastructure, and support their maintenance and upgrades.

Job Requirements

  • Degree in Computer Science & Engineering / Software Engineering / Artificial Intelligence or any other related field
  • Minimum of 2 years of experience in IT infrastructure or a related field. More experienced candidates may be considered for a senior role.
  • Strong proficiency in Linux environments, computer architecture, and Python / Bash scripting for tooling and automation.
  • Working proficiency in Kubernetes container orchestration and infrastructure provisioning/management software (e.g. Ansible, Terraform) for fleet automation.
  • Experience with NVIDIA GPUs, GitOps, Infra CI/CD, networking protocols, and other AI infrastructure technologies will be advantageous.
  • Strong written and verbal communication skills to lead vendor and cross-functional engagements as well as performance analysis and troubleshooting initiatives.

More Info

Job ID: 150612583