
Key Responsibilities
Multi-Cloud Infrastructure Leadership & Architecture
. Lead the design, deployment, and management of cloud-native architectures across AWS, Microsoft Azure, and Google Cloud Platform in production environments
. Architect and implement scalable, highly available, and secure multi-cloud solutions aligned with business requirements and government compliance standards
. Provide technical leadership for cloud services including: EC2, S3, Lambda, ECS/EKS, RDS, CloudWatch, Systems Manager, Azure Virtual Machines, Azure Kubernetes Service (AKS), Azure Monitor, Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, and Cloud Monitoring
. Design and implement infrastructure architecture for new application deployments, ensuring best practices in scalability, performance, and cost optimization
. Evaluate and recommend cloud technologies, services, and architectural patterns to support business objectives and digital transformation initiatives
. Lead migration initiatives from on-premises to cloud and cloud-to-cloud migrations across AWS, Azure, and GCP
. Monitor and optimize cloud resource utilization, implementing cost management strategies and right-sizing recommendations
Technical Team Leadership & Mentorship
. Provide technical leadership, guidance, and mentorship to L2 Linux Engineers, L2 Wintel Engineers, and L3 Cloud Engineers
. Conduct technical design reviews, code reviews for Infrastructure as Code (IaC), and architectural assessments
. Act as the technical escalation point for complex infrastructure issues requiring advanced troubleshooting and resolution
. Drive knowledge transfer initiatives, facilitate technical training sessions, and develop engineering team capabilities
. Lead incident response for critical production issues, coordinating cross-functional teams and ensuring rapid resolution
. Foster a culture of operational excellence, automation, continuous improvement, and technical innovation
. Participate in 24/7 shift rotation and on-call escalation support to provide leadership during critical incidents
Operating System Lifecycle & Patch Management
. Oversee and coordinate enterprise-wide OS patching operations across RHEL (v7 to v10) and Windows Server (2016 to 2025) environments using native tools such as AWS Systems Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
. Demonstrate advanced proficiency in both Linux and Windows system administration with the ability to troubleshoot complex issues across both platforms
. Develop and enforce patching strategies, policies, and schedules aligned with security compliance requirements and business continuity objectives
. Lead monthly and quarterly patch cycles, ensuring comprehensive testing, validation, and rollback procedures
. Coordinate patch approvals with Change Advisory Board (CAB) and manage stakeholder communications throughout patching activities
. Execute post-patch validation, remediation activities, and compliance reporting for audit requirements
. Identify and manage End-of-Life (EOL) operating systems and applications, planning upgrade and migration strategies
Security Hardening & Compliance Management
. Lead CIS (Center for Internet Security) security hardening initiatives and remediation activities across all cloud platforms and operating systems
. Implement and maintain security baselines based on CIS Benchmarks, government security standards (IM8 Policy), and industry best practices
. Oversee vulnerability management programs using tools such as Trend Micro, Qualys, Tenable, and AWS Config
. Prioritize, coordinate, and track security remediation efforts across infrastructure teams to ensure timely resolution of vulnerabilities
. Manage SSL/TLS certificate lifecycle, including renewals, implementation, and monitoring across multi-cloud environments
. Ensure compliance with government-level security, audit, and regulatory requirements, including SOC 2, ISO 27001, and Singapore Government frameworks
. Collaborate with InfoSec teams on security assessments, penetration testing, and audit preparations
. Implement and maintain security monitoring, logging, and alerting mechanisms using native cloud tools and third-party solutions
Infrastructure as Code (IaC) & Automation
. Lead Infrastructure as Code initiatives using Terraform, Ansible, AWS CloudFormation, and Azure Resource Manager (ARM) templates
. Design and implement automated infrastructure deployment pipelines with CI/CD integration
. Troubleshoot complex environment drift, pipeline failures, and infrastructure provisioning issues across multi-cloud environments
. Implement and maintain GitOps practices for infrastructure deployment and version control
. Drive automation initiatives to reduce manual operational overhead and improve infrastructure reliability
ITIL Process Management & Service Delivery
. Oversee ITIL processes including Incident Management, Problem Management, Change Management, and Request Management
. Manage and optimize ITSM workflows using ServiceNow, Jira, or similar enterprise ITSM platforms
. Lead Change Advisory Board (CAB) reviews for infrastructure changes, providing technical assessment and risk analysis
. Drive incident escalation processes, root cause analysis (RCA), and Post-Incident Review (PIR) activities
. Ensure compliance with Service Level Agreements (SLAs) and Operational Level Agreements (OLAs)
. Implement continuous service improvement initiatives based on operational metrics, KPIs, and stakeholder feedback
. Maintain comprehensive documentation including runbooks, standard operating procedures (SOPs), and architectural diagrams
Stakeholder Management & Communication
. Act as the primary technical liaison between infrastructure teams and business stakeholders, application owners, and senior management
. Manage expectations and communicate technical concepts effectively to both technical and non-technical audiences
. Coordinate with cross-functional teams including Development, Security, Networking, and Database teams on infrastructure initiatives
. Lead technical discussions, architecture reviews, and solution design sessions with stakeholders
. Provide regular status updates, operational reports, and capacity planning recommendations to management
. Manage vendor relationships for cloud services, security tools, and infrastructure platforms
. Facilitate communication during critical incidents, ensuring timely updates to all stakeholders and maintaining service transparency
Container Orchestration & DevSecOps
. Provide technical leadership for containerization initiatives using Docker, Kubernetes, Amazon ECS, Amazon EKS, Azure AKS, and Google GKE
. Implement and maintain DevSecOps practices with SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) within the Singapore Government technology stack
. Oversee CI/CD pipeline operations, integrating security scanning tools including SAST, DAST, and container vulnerability scanning
. Drive containerization strategy and microservices architecture adoption across application portfolios
Monitoring, Observability & Performance Optimization
. Design and implement comprehensive monitoring, logging, and alerting strategies using CloudWatch, Azure Monitor, GCP Cloud Monitoring, and third-party observability platforms
. Configure and maintain observability stacks for metrics, logs, traces, and alerts across multi-cloud environments
. Implement log aggregation and analysis using centralized logging solutions
. Lead performance optimization initiatives, conducting capacity planning and resource right-sizing activities
. Establish operational dashboards, reporting mechanisms, and proactive alerting for infrastructure health and performance
Documentation & Knowledge Management
. Create and maintain comprehensive infrastructure documentation, including system architecture diagrams, network topology, and data flow diagrams
. Develop and maintain technical runbooks, troubleshooting guides, and disaster recovery procedures
. Ensure audit-readiness through meticulous documentation discipline and change tracking
. Maintain Configuration Management Database (CMDB) accuracy and asset inventories
. Build and maintain knowledge base articles, FAQs, and best practice documentation for team reference
Required Qualifications
Education & Experience
. Bachelor's degree in Computer Science, Information Systems, Information Technology, or a related technical field
. Minimum 5 years of experience in infrastructure and cloud engineering roles with progressive leadership responsibilities
. At least 5 years of hands-on experience managing multi-cloud environments across AWS, Microsoft Azure, and Google Cloud Platform
. Minimum 1 year of experience in regulated environments such as public sector, government, financial services, or healthcare
. Proven experience in 24/7 operational support environments with incident management and on-call responsibilities
. Demonstrated experience leading technical teams, mentoring engineers, and driving operational excellence initiatives
Technical Skills & Expertise
. Multi-Cloud Platforms: Expert-level proficiency in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) with hands-on experience across compute, storage, networking, security, and managed services
. Operating Systems: Advanced expertise in both Linux (RHEL, CentOS, Ubuntu, Amazon Linux) and Windows Server (2016, 2019, 2022, 2025) administration with deep troubleshooting capabilities
. Patch Management: Extensive experience with enterprise patch management using AWS Systems Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
. Security Hardening: Strong background in CIS Benchmark implementation, security remediation, and compliance frameworks (IM8, SOC 2, ISO 27001)
. Infrastructure as Code: Proficiency in Terraform, Ansible, AWS CloudFormation, ARM templates, and GitOps practices
. Container Technologies: Experience with Docker, Kubernetes, Amazon ECS/EKS, Azure AKS, Google GKE, and container orchestration
. ITIL & ITSM: Deep understanding of ITIL v3/v4 processes with hands-on experience using ServiceNow, Jira, or similar ITSM platforms
. DevSecOps: Experience with CI/CD pipelines, security scanning integration, and familiarity with SHIP-HATS platform
. Scripting & Automation: Proficiency in PowerShell, Bash/Shell scripting, and Python for automation and infrastructure operations
. Monitoring & Observability: Experience with CloudWatch, Azure Monitor, GCP Cloud Monitoring, Prometheus, Grafana, ELK Stack, or similar platforms
Preferred Certifications
. AWS Certified Solutions Architect - Professional or AWS Certified DevOps Engineer - Professional
. Microsoft Certified: Azure Solutions Architect Expert
. Google Cloud Professional Cloud Architect
. Red Hat Certified Engineer (RHCE) or Red Hat Certified Architect (RHCA)
. Microsoft Certified: Windows Server Hybrid Administrator Associate
. ITIL v4 Foundation or ITIL Expert
. Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
. HashiCorp Certified: Terraform Associate or Professional
Soft Skills & Competencies
. Technical Leadership: Demonstrated ability to lead technical initiatives, provide architectural guidance, and mentor engineering teams
. Stakeholder Management: Ability to manage relationships with diverse stakeholders, from technical teams to executive leadership
. Communication: Outstanding verbal and written communication skills with the ability to articulate complex technical concepts to non-technical audiences
. Problem Solving: Advanced analytical and troubleshooting capabilities with a systematic approach to complex multi-cloud infrastructure challenges
. Strategic Thinking: Ability to balance immediate operational needs with long-term infrastructure strategy and roadmap planning
. Collaboration: Strong teamwork and cross-functional collaboration skills with experience working across development, security, and operations teams
. Adaptability: Agile and responsive to rapidly changing technology landscapes, business requirements, and operational demands
. Accountability: Takes ownership of outcomes, demonstrates attention to detail, and ensures accurate and secure infrastructure implementations
. Customer Focus: Service-oriented mindset with commitment to delivering high-quality solutions that meet business and user needs
. Continuous Learning: Commitment to staying current with evolving cloud technologies, security practices, and industry best practices
. Mentorship: Proven ability to develop and support junior and mid-level engineers, fostering technical growth and career development
Technical Manager Role Expectations
The Technical Manager position requires:
. L3+ level technical proficiency with hands-on expertise across multi-cloud platforms, Linux, and Windows environments
. Proven experience architecting and deploying new infrastructure solutions in AWS, Azure, and GCP
. Strong technical leadership with the ability to lead L2 and L3 engineers through complex technical challenges
. Deep understanding of security hardening, CIS remediation, and compliance frameworks
. Exceptional stakeholder management capabilities with experience interfacing with senior leadership
. Proactive approach to incident prevention, operational excellence, and continuous improvement
. Calm, structured, and methodical incident handling with strict adherence to ITIL processes
. Audit-readiness mindset with comprehensive documentation practices
. Experience working within Singapore Government technology frameworks and compliance requirements
. Ability to drive escalations effectively and manage critical stakeholder communications during incidents
Work Arrangements
. This role requires participation in 24/7 shift rotation and on-call escalation support for critical infrastructure operations
. Extended work hours may be required during major incidents, maintenance windows, and change implementations
. On-call support responsibilities as part of the senior leadership rotation schedule
. Flexibility to work outside normal office hours for patching activities, architecture deployments, and emergency response
. May require occasional travel for stakeholder meetings, vendor engagements, or cross-site coordination
Job ID: 132820791