We are seeking a capable and motivated Level 2 (L2) Application Support Engineer to join our operations and support team. You will play a critical role in ensuring the stability, reliability, and performance of our production systems by investigating incidents, resolving complex technical issues, and working closely with engineering and infrastructure teams. You will be part of a cross-functional team that values service reliability, operational excellence, automation, and continuous improvement in the systems we support.
Job Description:
- Provide second-line (L2) support for production and staging systems, handling escalations from L1 Support.
- Investigate application errors, system alerts, performance degradation, and integration issues.
- Restore services within agreed SLA/OLA timelines and ensure proper incident closure.
- Perform in-depth troubleshooting using logs, metrics, and monitoring tools.
- Conduct root cause analysis (RCA) for recurring or high-impact incidents.
- Propose and implement corrective and preventive actions to reduce incident recurrence.
- Work closely with L3 engineers, DevOps, and vendors to resolve complex technical issues.
- Provide clear technical findings, logs, and evidence when escalating issues.
- Participate in incident bridges, post-incident reviews, and operational discussions.
- Contribute to automation of operational tasks, monitoring, and alerting where applicable.
- Identify gaps in runbooks, SOPs, and operational processes and drive improvements.
Technical Expertise:
- 3-5+ years of relevant experience in application support, systems support, or operations roles.
- Experience supporting production systems in a high-availability or mission-critical environment.
- Strong hands-on experience with application log analysis and monitoring tools (e.g. AWS CloudWatch, Grafana, ELK, Google Analytics) and with Linux/Unix environments.
- Working knowledge of cloud platforms (e.g. AWS services such as ECS, Lambda, S3, RDS).
- Basic database knowledge (MySQL, PostgreSQL) for health checks and simple queries.
- Basic knowledge of REST APIs, system integrations, and authentication design.
- Understanding of incident, problem, and change management processes.
- Familiarity with ticketing and incident management tools (e.g. Jira, PagerDuty).
- Experience working with runbooks, SOPs, and on-call support rotations (if applicable).
Additional Skills (Bonus Points):
- Experience supporting cloud-native or microservices-based systems.
- Basic scripting skills (e.g. Bash, Python) for automation.
- Experience working in government, regulated, or large-scale enterprise environments.
- Knowledge of disaster recovery and business continuity planning.