Manage application and security incidents, conduct problem determination, and work with internal teams and vendors to resolve issues in a timely manner and meet SLAs; provide reporting and escalation to higher management or the incident committee when necessary
Develop an operations and process guide to ensure every aspect of operations is documented and complies with audit requirements
Manage day-to-day operational activities, analyse statistics, write status and progress reports, and present findings to stakeholders and higher management
Manage an operations team of staff and vendors, ensuring support is available 24/7
Proven experience as an Operations Engineer or similar role in an IT setting
Implement change management and incident management workflows; experience using ITSM tools (e.g. Remedy, Zendesk, ServiceDesk) to automate workflows is advantageous
Implement security and access control measures to manage privileged access to test and production environments
Implement full-stack monitoring (i.e. application and infrastructure) using Application Performance Management (APM) tools. Familiarity with cloud-native monitoring options (e.g. CloudWatch, Stackdriver) and the OpenAPM stack is preferred
Identify and implement process automation to minimise downtime and human error. Familiarity with automation and infrastructure-as-code tools (e.g. Terraform, Ansible) is preferred
Experienced in agile methodologies, DevOps pipelines, test-driven development, and information security practices
Able to work collaboratively within a high-performance team and influence others with positive energy
Resourceful, and able to devise solutions through innovative thinking and new technologies
Experience managing cloud infrastructure and services, or certification with GPC, GCC (i.e. AWS, Azure, Google Cloud) or equivalent cloud platforms, is preferred
Excellent problem-solving skills
Strong communication skills, with the ability to communicate complex technical issues to non-technical teams