Operation Manager - Data Center (Physical Infrastructure)

MICHAEL PAGE INTERNATIONAL PTE LTD

Singapore

8-11 Years

SGD 9,000 - 10,000 per month

Posted 17 hours ago

Job Description

Global exposure and opportunities to work on cross-border projects
High leadership visibility and impact on business outcomes

About Our Client

A global leader headquartered in Singapore, renowned for innovative solutions, robust infrastructure, and driving digital transformation.

Job Description

Serve as the overall lead and the point of accountability for end-to-end GPUaaS and data centre operations, including operational reporting.
Oversee day-to-day platform and facility operations across GPU hardware, networking, environmental systems, security controls, and supporting software.
Lead and coordinate internal operations teams, vendors, and consultants during routine activities as well as critical incidents.
Partner with engineering and external stakeholders to deliver platform upgrades and data centre improvement initiatives.
Develop, review, and refine operational processes to maintain platform stability across compute, power, cooling, and infrastructure components.
Take charge of major incidents, drive root cause analysis, and ensure clear, timely updates to customers and stakeholders.
Provide regular updates to management on operational performance, risks, and improvement plans.
Ensure incidents are triaged and escalated appropriately based on severity, business impact, and SLA/SLO commitments.
Build, lead, and motivate a strong operations team with a focus on accountability and continuous improvement.
Set clear performance expectations, coach team members, and support ongoing professional development.
Oversee security incident management and uphold security and compliance standards within the GPUaaS environment.
Stay current with industry security developments and implement safeguards to protect customer workloads and platform integrity.
Support scheduled maintenance activities and participate in on-call duties when required.

The Successful Applicant

Bachelor's degree in Computer Science, Information Technology, or a related field.
At least 8 years of experience in data centre operations, with a minimum of 3 years in a leadership capacity.
Solid understanding of data centre infrastructure, including servers, networking, storage, and both physical and cybersecurity controls.
Practical experience with electrical and mechanical systems, facilities management, and preventive maintenance practices.
Demonstrated ability to lead teams and manage vendors effectively.
Strong organisational skills with the ability to adapt to evolving operational demands.
Hands-on experience with Linux and hypervisor administration in GPU or GPUaaS environments.
Strong analytical and troubleshooting skills, with a proactive approach to performance optimisation and system reliability.
Working knowledge of storage technologies, including capacity planning, troubleshooting, and data protection strategies.
Experience managing GPU infrastructure, including configuration, monitoring, and performance tuning.
Familiarity with liquid cooling technologies used in high-density GPU environments.
Understanding of GPU cluster architectures and AI/HPC environments, including collective communications (e.g. NCCL, RDMA), high-performance networking (e.g. InfiniBand), and containerised or orchestrated platforms supporting AI and HPC workloads.

What's on Offer

Our client is a growing firm with a tightly knit team. The successful candidate will have the chance to contribute to a high-performing team while having the autonomy to make certain decisions for the team.

Contact

Winson Low (Lic No: R22106039 / EA No: 18C9065)

Quote job ref

JN-032026-6959635

Phone number

+65 6416 9865