
Key Responsibilities
System Monitoring and Reliability Management
Proactively monitor and refine system alerts and thresholds to reduce false
positives and enhance detection accuracy
Implement and maintain comprehensive monitoring solutions to ensure early
identification of potential issues
Optimise system performance and reliability through continuous monitoring and
preventive maintenance
Develop and maintain monitoring dashboards to provide real-time visibility into
system health
Incident Response and Resolution
Lead rapid response efforts for security incidents, system outages, and
performance issues
Collaborate effectively with cross-functional teams, vendors, and external partners
to resolve complex technical problems
Coordinate incident response activities to minimise downtime and service
disruption
Escalate critical issues appropriately whilst maintaining clear communication with
stakeholders
Security Operations and Compliance
Partner closely with Security Operations Centre (SOC) teams during threat
investigations and security incidents
Support compliance audits and ensure adherence to regulatory requirements and
security standards
Implement and maintain security controls across infrastructure and
applications
Monitor security events and assist in threat detection and analysis activities
Root Cause Analysis and Continuous Improvement
Conduct thorough root cause analysis of critical incidents using performance
analytics and forensic techniques
Identify systemic issues and implement preventive measures to enhance system
resilience
Analyse historical data and trends to predict potential issues and recommend
proactive solutions
Drive continuous improvement initiatives based on incident patterns and system
performance metrics
Configuration and Data Management
Support implementation and maintenance of Configuration Management Database
(CMDB) systems
Ensure data accuracy and integrity across all configuration items and system
documentation
Support IT service management platforms to improve asset visibility, automate
workflows, and strengthen governance and compliance
Maintain up-to-date asset inventories and system configurations
Streamline operational processes through effective configuration management
practices
Knowledge Management and Documentation
Develop and maintain comprehensive incident response procedures and technical
documentation
Create and update knowledge base articles for common issues and resolution
procedures
Facilitate knowledge transfer sessions and training for team members
Establish standardised processes and best practices for incident handling
Analytics and Reporting
Generate regular reports on system performance, security metrics, and operational
efficiency
Identify trends and patterns in system behaviour to support proactive management
Present findings and recommendations to management and technical teams
Required Qualifications
Tertiary qualification in Computer Science, Information Technology, Cybersecurity,
or a related field, with additional professional certifications
Minimum 5 years of experience in system administration, cybersecurity, or IT
operations
Experience with incident response and operations centre environments
Preferred Qualifications
Professional certifications such as ITIL, ServiceNow, CompTIA, Cisco, Microsoft, or
equivalent
Strong knowledge of the ServiceNow platform and its modules (such as CMDB,
ITOM, ITSM, GRC), or similar tools
Familiarity with AI/ML applications in IT operations (e.g. anomaly detection,
predictive maintenance, intelligent automation)
Experience with automation and orchestration tools
Job ID: 134944279