Oversee daily IT infrastructure and systems operations across the local office and the Singapore Data Centre
Lead operations in mission-critical production environments with strict uptime, performance, and compliance requirements
Manage enterprise-level data centre infrastructure, including servers, storage, monitoring systems, and network environments
Ensure high availability, reliability, and performance optimisation of critical systems
Lead incident response and conduct root cause analysis (RCA) for high-severity incidents, implementing preventive and corrective measures
Support deployment and ongoing operations of AI training and inference platforms, including GPU-based infrastructure
Ensure adherence to financial-grade governance standards, including audit traceability, access control, change management, and data security compliance
Maintain comprehensive system documentation, audit trails, and operational procedures
Requirements
Minimum of 8 years of hands-on experience in enterprise data centre environments or regulated industries (e.g. banking, financial services, aviation, or other compliance-driven sectors)
Proven experience supporting 24/7 mission-critical production environments with high availability requirements
Demonstrated experience in managing incidents and conducting root cause analysis (RCA) in high-availability systems
Experience in AI platform operations, including GPU-based infrastructure and model deployment in live production environments
Experience implementing AIOps or intelligent monitoring frameworks within enterprise production systems
Strong understanding of change management processes, audit compliance, access control, and information security governance
Ability to work independently in a growing organisation and take ownership of technical operations
Strong analytical, troubleshooting, and problem-solving capabilities