MICHAEL PAGE INTERNATIONAL PTE LTD

Senior DevOps Engineer - AI/ML

3-6 Years
SGD 8,000 - 12,000 per month
  • Posted a day ago

Job Description

  • Promotes internal mobility and encourages continuous development
  • Offers exposure to diverse projects and technologies

About Our Client

A global leader headquartered in Singapore, renowned for innovative solutions, robust infrastructure, and digital transformation.

Job Description

  • Design, deploy, and operate scalable GPU clusters supporting AI, ML, and HPC workloads across on-prem and cloud environments
  • Automate GPU resource provisioning, scheduling, and lifecycle management using Kubernetes, IaC, and scripting
  • Build, manage, and optimize CI/CD pipelines for GPU-accelerated applications and AI models
  • Monitor and ensure GPU cluster health, performance, capacity, and availability using modern observability tools
  • Troubleshoot and optimize system-level components including Linux, Kubernetes, Slurm, GPU drivers, CUDA, and high-speed networking
  • Implement performance tuning, benchmarking, and security best practices for multi-tenant GPUaaS platforms
  • Collaborate with cross-functional teams to support users, resolve issues, and continuously improve AI and HPC infrastructure

The Successful Applicant

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical discipline
  • Strong Linux system administration experience across Ubuntu, CentOS, Rocky Linux, or similar distributions
  • Hands-on experience with DevOps and infrastructure tools including Kubernetes, Terraform, Ansible, and CI/CD platforms
  • Solid understanding of automation, CI/CD, monitoring, and operational best practices in production environments
  • Proficiency in scripting and automation using Python, Bash, or similar languages
  • Experience with, or working knowledge of, cloud platforms (IaaS/PaaS), GPU architecture, and AI frameworks such as TensorFlow or PyTorch
  • Strong problem-solving, communication, and collaboration skills, with the ability to work effectively across engineering and operations teams

What's on Offer

The client is a growing firm with a tightly knit team. The successful candidate will contribute to a high-performing team and have the autonomy to make certain decisions for it.

Contact

Winson Low (Lic No: R22106039 / EA No: 18C9065)

Quote job ref

JN-012026-6914907

Phone number

+65 6416 9865

More Info

Job ID: 138500933