Helpdesk Support Engineer (Application Support)
We are seeking a capable and motivated Helpdesk Engineer, focused on application support, to join our technical support team and work across a range of client projects. In this role, you will ensure the stability, reliability, and performance of government digital systems by investigating incidents, resolving technical issues, and collaborating with engineering and infrastructure teams.
Key Responsibilities
- Provide second-line (L2) support for production and staging systems, handling escalations from L1 support.
- Investigate application errors, system alerts, performance degradation, and integration issues.
- Restore services within agreed SLA/OLA timelines and ensure proper incident closure.
- Perform in-depth troubleshooting using logs, metrics, and monitoring tools; conduct root cause analysis (RCA) for recurring or high-impact incidents.
- Propose and implement corrective and preventive actions to reduce incident recurrence.
- Collaborate with L3 engineers, DevOps, and vendors to resolve complex technical issues.
- Participate in incident bridges, post-incident reviews, and operational discussions.
- Monitor system health, alerts, dashboards, and logs to proactively identify issues.
- Execute approved configuration changes, patches, and operational fixes.
- Support deployment, release, and maintenance activities as required.
- Contribute to the automation of operational tasks, monitoring, and alerting where applicable.
- Identify and drive improvements in runbooks, SOPs, and operational processes.
- Maintain and update runbooks, troubleshooting guides, and knowledge base articles.
- Document incident resolutions and operational procedures clearly and accurately.
- Adhere to security, access control, and compliance requirements; support audits and compliance checks when required.
Mandatory Skills & Experience
- Diploma or higher in Computer Science, Information Technology, or a related field.
- 3-5+ years of relevant experience in application support, systems support, or operations roles.
- Experience supporting production systems in high-availability or mission-critical environments.
- Strong hands-on experience with application log analysis and monitoring tools (e.g., AWS CloudWatch, Grafana, ELK).
- Proficiency in Linux/Unix environments.
- Working knowledge of cloud platforms (e.g., AWS ECS, Lambda, S3, RDS).
- Basic database knowledge (MySQL, PostgreSQL) for health checks and simple queries.
- Familiarity with REST APIs, system integrations, and authentication mechanisms.
- Understanding of incident, problem, and change management processes.
- Experience with ticketing and incident management tools (e.g., Jira, PagerDuty).
- Experience working with runbooks, SOPs, and on-call support rotations.
- Strong analytical and troubleshooting skills.
- Clear verbal and written communication skills, including incident reporting.
- Ability to work collaboratively and methodically under pressure.
Technical Stack / Domain Knowledge
- Application Support: Production/staging systems, incident & problem management, SLA/OLA adherence
- Monitoring & Troubleshooting: AWS CloudWatch, Grafana, ELK, log analysis, root cause analysis
- Operating Systems: Linux/Unix
- Cloud Platforms: AWS (ECS, Lambda, S3, RDS)
- Databases: MySQL, PostgreSQL (basic queries, health checks)
- APIs & Integrations: REST, authentication, system integrations
- Automation & Scripting: Bash, Python (bonus)
- Documentation: Runbooks, SOPs, knowledge base, and incident reports
- Security & Compliance: Access control, audit support, compliance checks
- Operational Tools: Jira, PagerDuty
Bonus Skills
- Experience supporting cloud-native or microservices-based systems.
- Knowledge of disaster recovery and business continuity planning.
- Experience in government, regulated, or large-scale enterprise environments.