
Search by job, company or skills
Role Purpose
To manage, operate, and maintain the Client's HPC compute, login, HCI, and virtualisation platforms in alignmentwith Authority-approved baselines, ITSM processes, cybersecurity requirements, and operational governance policies.
Key Responsibilities
1. OS & Core Platform Management
Commission and decommission compute and login nodes.
Maintain golden images and ensure kernel version alignment and consistency.
Apply OS hardening in accordance with authority-approved baselines.
Execute staging-to-production patch cycles with proper validation and rollback plans.
Detect and remediate configuration drift.
Perform daily platform health checks and provide validation reports.
2. Virtualization & HCI Operations
Manage VM lifecycle and maintain template hygiene.
Execute scheduled monthly platform patching activities.
Perform backup verification and representative restore testing.
Participate in quarterly Disaster Recovery (DR) drills.
Maintain inventory records, tagging standards, and configuration documentation.
3. Incident & Change Management
Respond to incidents in accordance with SLA targets:
P1: ≤ 30 minutes response time
P2: ≤ 1 hour response time
Implement approved changes with documented impact assessments and rollback plans.
Perform post-incident Root Cause Analysis (RCA) and track corrective actions.
Maintain ITSM evidence for all operational activities.
Perform vulnerability remediation within defined timelines.
Conduct key and certificate rotation activities.
Execute account hygiene checks and periodic access reviews.
Maintain audit-ready evidence packs and compliance documentation.
Environment / Platforms
Required Experience
Hands-on experience with:
Required Certifications
Job ID: 145585045