Description and Requirements
Key Responsibilities
Multi-Cloud Infrastructure Leadership & Architecture
- Lead the design, deployment, and management of cloud-native architectures across AWS, Microsoft Azure, and Google Cloud Platform in production environments
- Architect and implement scalable, highly available, and secure multi-cloud solutions aligned with business requirements and government compliance standards
- Provide technical leadership for cloud services across AWS (EC2, S3, Lambda, ECS/EKS, RDS, CloudWatch, Systems Manager), Azure (Virtual Machines, Azure Kubernetes Service (AKS), Azure Monitor), and Google Cloud (Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, Cloud Monitoring)
- Design and implement infrastructure architecture for new application deployments, ensuring best practices in scalability, performance, and cost optimization
- Evaluate and recommend cloud technologies, services, and architectural patterns to support business objectives and digital transformation initiatives
- Lead migration initiatives from on-premises to cloud and cloud-to-cloud migrations across AWS, Azure, and GCP
- Monitor and optimize cloud resource utilization, implementing cost management strategies and right-sizing recommendations
Technical Team Leadership & Mentorship
- Provide technical leadership, guidance, and mentorship to L2 Linux Engineers, L2 Wintel Engineers, and L3 Cloud Engineers
- Conduct technical design reviews, code reviews for Infrastructure as Code (IaC), and architectural assessments
- Act as the technical escalation point for complex infrastructure issues requiring advanced troubleshooting and resolution
- Drive knowledge transfer initiatives, facilitate technical training sessions, and develop engineering team capabilities
- Lead incident response for critical production issues, coordinating cross-functional teams and ensuring rapid resolution
- Foster a culture of operational excellence, automation, continuous improvement, and technical innovation
- Participate in 24/7 shift rotation and on-call escalation support to provide leadership during critical incidents
Operating System Lifecycle & Patch Management
- Oversee and coordinate enterprise-wide OS patching operations across RHEL (v7 to v10) and Windows Server (2016 to 2025) environments using native tools, e.g., AWS Systems Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
- Demonstrate advanced proficiency in both Linux and Windows system administration with the ability to troubleshoot complex issues across both platforms
- Develop and enforce patching strategies, policies, and schedules aligned with security compliance requirements and business continuity objectives
- Lead monthly and quarterly patch cycles, ensuring comprehensive testing, validation, and rollback procedures
- Coordinate patch approvals with Change Advisory Board (CAB) and manage stakeholder communications throughout patching activities
- Execute post-patch validation, remediation activities, and compliance reporting for audit requirements
- Identify and manage End-of-Life (EOL) operating systems and applications, planning upgrade and migration strategies
Security Hardening & Compliance Management
- Lead CIS (Center for Internet Security) security hardening initiatives and remediation activities across all cloud platforms and operating systems
- Implement and maintain security baselines based on CIS Benchmarks, government security standards (IM8 Policy), and industry best practices
- Oversee vulnerability management programs using tools such as Trend Micro, Qualys, Tenable, and AWS Config
- Prioritize, coordinate, and track security remediation efforts across infrastructure teams to ensure timely resolution of vulnerabilities
- Manage SSL/TLS certificate lifecycle, including renewals, implementation, and monitoring across multi-cloud environments
- Ensure compliance with government-level security, audit, and regulatory requirements including SOC 2, ISO 27001, and Singapore government frameworks
- Collaborate with InfoSec teams on security assessments, penetration testing, and audit preparations
- Implement and maintain security monitoring, logging, and alerting mechanisms using native cloud tools and third-party solutions
Infrastructure as Code (IaC) & Automation
- Lead Infrastructure as Code initiatives using Terraform, Ansible, AWS CloudFormation, and Azure Resource Manager (ARM) templates
- Design and implement automated infrastructure deployment pipelines with CI/CD integration
- Troubleshoot complex environment drift, pipeline failures, and infrastructure provisioning issues across multi-cloud environments
- Implement and maintain GitOps practices for infrastructure deployment and version control
- Drive automation initiatives to reduce manual operational overhead and improve infrastructure reliability
ITIL Process Management & Service Delivery
- Oversee ITIL processes including Incident Management, Problem Management, Change Management, and Request Management
- Manage and optimize ITSM workflows using ServiceNow, Jira, or similar enterprise ITSM platforms
- Lead Change Advisory Board (CAB) reviews for infrastructure changes, providing technical assessment and risk analysis
- Drive incident escalation processes, root cause analysis (RCA), and Post-Incident Review (PIR) activities
- Ensure compliance with Service Level Agreements (SLAs) and Operational Level Agreements (OLAs)
- Implement continuous service improvement initiatives based on operational metrics, KPIs, and stakeholder feedback
- Maintain comprehensive documentation including runbooks, standard operating procedures (SOPs), and architectural diagrams
Stakeholder Management & Communication
- Act as the primary technical liaison between infrastructure teams and business stakeholders, application owners, and senior management
- Manage expectations and communicate technical concepts effectively to both technical and non-technical audiences
- Coordinate with cross-functional teams including Development, Security, Networking, and Database teams on infrastructure initiatives
- Lead technical discussions, architecture reviews, and solution design sessions with stakeholders
- Provide regular status updates, operational reports, and capacity planning recommendations to management
- Manage vendor relationships for cloud services, security tools, and infrastructure platforms
- Facilitate communication during critical incidents, ensuring timely updates to all stakeholders and maintaining service transparency
Container Orchestration & DevSecOps
- Provide technical leadership for containerization initiatives using Docker, Kubernetes, Amazon ECS, Amazon EKS, Azure AKS, and Google GKE
- Implement and maintain DevSecOps practices with SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) within the Singapore Government technology stack
- Oversee CI/CD pipeline operations, integrating security scanning tools including SAST, DAST, and container vulnerability scanning
- Drive containerization strategy and microservices architecture adoption across application portfolios
Monitoring, Observability & Performance Optimization
- Design and implement comprehensive monitoring, logging, and alerting strategies using CloudWatch, Azure Monitor, GCP Cloud Monitoring, and third-party observability platforms
- Configure and maintain observability stacks for metrics, logs, traces, and alerts across multi-cloud environments
- Implement log aggregation and analysis using centralized logging solutions
- Lead performance optimization initiatives, conducting capacity planning and resource right-sizing activities
- Establish operational dashboards, reporting mechanisms, and proactive alerting for infrastructure health and performance
Documentation & Knowledge Management
- Create and maintain comprehensive infrastructure documentation, including system architecture diagrams, network topology, and data flow diagrams
- Develop and maintain technical runbooks, troubleshooting guides, and disaster recovery procedures
- Ensure audit-readiness through meticulous documentation discipline and change tracking
- Maintain Configuration Management Database (CMDB) accuracy and asset inventories
- Build and maintain knowledge base articles, FAQs, and best practice documentation for team reference
Required Qualifications
Education & Experience
- Bachelor's degree in Computer Science, Information Systems, Information Technology, or related technical field
- Minimum 5 years of experience in infrastructure and cloud engineering roles with progressive leadership responsibilities
- At least 5 years of hands-on experience managing multi-cloud environments across AWS, Microsoft Azure, and Google Cloud Platform
- Minimum 1 year of experience in regulated environments such as public sector, government, financial services, or healthcare
- Proven experience in 24/7 operational support environments with incident management and on-call responsibilities
- Demonstrated experience leading technical teams, mentoring engineers, and driving operational excellence initiatives
Technical Skills & Expertise
- Multi-Cloud Platforms: Expert-level proficiency in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) with hands-on experience across compute, storage, networking, security, and managed services
- Operating Systems: Advanced expertise in both Linux (RHEL, CentOS, Ubuntu, Amazon Linux) and Windows Server administration (2016, 2019, 2022, 2025) with deep troubleshooting capabilities
- Patch Management: Extensive experience with enterprise patch management using AWS Systems Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
- Security Hardening: Strong background in CIS Benchmark implementation, security remediation, and compliance frameworks (IM8, SOC 2, ISO 27001)
- Infrastructure as Code: Proficiency in Terraform, Ansible, AWS CloudFormation, ARM templates, and GitOps practices
- Container Technologies: Experience with Docker, Kubernetes, Amazon ECS/EKS, Azure AKS, Google GKE, and container orchestration
- ITIL & ITSM: Deep understanding of ITIL v3/v4 processes with hands-on experience using ServiceNow, Jira, or similar ITSM platforms
- DevSecOps: Experience with CI/CD pipelines, security scanning integration, and familiarity with the SHIP-HATS platform
- Scripting & Automation: Proficiency in PowerShell, Bash/shell scripting, and Python for automation and infrastructure operations
- Monitoring & Observability: Experience with CloudWatch, Azure Monitor, GCP Cloud Monitoring, Prometheus, Grafana, ELK Stack, or similar platforms
Preferred Certifications
- AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional
- Microsoft Certified: Azure Solutions Architect Expert
- Google Cloud Professional Cloud Architect
- Red Hat Certified Engineer (RHCE) or Red Hat Certified Architect (RHCA)
- Microsoft Certified: Windows Server Hybrid Administrator Associate
- ITIL v4 Foundation or ITIL Expert
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
- HashiCorp Certified: Terraform Associate or Professional
Soft Skills & Competencies
- Technical Leadership: Demonstrated ability to lead technical initiatives, provide architectural guidance, and mentor engineering teams
- Stakeholder Management: Ability to manage relationships with diverse stakeholders, from technical teams to executive leadership
- Communication: Outstanding verbal and written communication skills with the ability to articulate complex technical concepts to non-technical audiences
- Problem Solving: Advanced analytical and troubleshooting capabilities with a systematic approach to complex multi-cloud infrastructure challenges
- Strategic Thinking: Ability to balance immediate operational needs with long-term infrastructure strategy and roadmap planning
- Collaboration: Strong teamwork and cross-functional collaboration skills with experience working across development, security, and operations teams
- Adaptability: Agile and responsive to rapidly changing technology landscapes, business requirements, and operational demands
- Accountability: Takes ownership of outcomes, demonstrates attention to detail, and ensures accurate and secure infrastructure implementations
- Customer Focus: Service-oriented mindset with a commitment to delivering high-quality solutions that meet business and user needs
- Continuous Learning: Commitment to staying current with evolving cloud technologies, security practices, and industry best practices
- Mentorship: Proven ability to develop and support junior and mid-level engineers, fostering technical growth and career development
Technical Manager Role Expectations
The Technical Manager position requires:
- L3+ level technical proficiency with hands-on expertise across multi-cloud platforms, Linux, and Windows environments
- Proven experience architecting and deploying new infrastructure solutions in AWS, Azure, and GCP
- Strong technical leadership with the ability to lead L2 and L3 engineers through complex technical challenges
- Deep understanding of security hardening, CIS remediation, and compliance frameworks
- Exceptional stakeholder management capabilities with experience interfacing with senior leadership
- Proactive approach to incident prevention, operational excellence, and continuous improvement
- Calm, structured, and methodical incident handling with strict adherence to ITIL processes
- Audit-readiness mindset with comprehensive documentation practices
- Experience working within Singapore Government technology frameworks and compliance requirements
- Ability to drive escalations effectively and manage critical stakeholder communications during incidents
Work Arrangements
- This role requires participation in 24/7 shift rotation and on-call escalation support for critical infrastructure operations
- Extended work hours may be required during major incidents, maintenance windows, and change implementations
- On-call support responsibilities form part of the senior leadership rotation schedule
- Flexibility to work outside normal office hours for patching activities, architecture deployments, and emergency response
- May require occasional travel for stakeholder meetings, vendor engagements, or cross-site coordination




