Job Description:-
- Design, build, and maintain AWS-based infrastructure using Infrastructure as Code (IaC) across 20 AWS accounts spanning production and non‑production environments.
- Ensure high availability, fault tolerance, scalability, and disaster recovery across environments.
- Implement Cloud Governance best practices (account structure, guardrails, tagging, IAM boundaries) in line with established standards and policies.
- Partner with security teams to support regulatory and compliance requirements (e.g. MAS TRM, SOC, ISO).
- Manage vulnerability remediation, patching, and secure configuration baselines.
- Build, enhance, and maintain CI/CD pipelines for product applications and IaC.
- Reduce manual operations by developing internal tooling and automation that improve deployment reliability and operational efficiency.
- Implement and maintain monitoring, logging, and alerting using tools such as CloudWatch or third-party APM solutions.
- Support gradual adoption of SRE practices such as reliability-focused metrics, actionable alerts, post-incident learning, and continuous improvement.
- Participate in business-hours operations and 24/7 standby for critical incidents, and actively respond to production incidents affecting availability, performance, or security.
- Perform formal post-incident reviews and root cause analyses (RCAs) in compliance with MAS TRM requirements.
- Perform cloud architecture and deployment design reviews for third‑party applications.
- Enable vendors to deploy applications within company AWS accounts while retaining full control of AWS infrastructure changes and ensuring compliance with security and operational standards.
- Support audits by providing evidence, explanations, and remediation actions.
Educational Qualifications:-
Bachelor's Degree in IT, Computer Science, Computer Engineering, or a relevant discipline.
Professional Skillsets:-
1. Strong hands-on experience operating AWS in production.
2. Mandatory experience with Terraform for infrastructure management.
3. Experience with EC2, EKS, AWS networking (VPC, ALB/NLB), and IAM.
4. Strong Linux and troubleshooting fundamentals.
5. Proficiency in at least one scripting/programming language (e.g. Python, Bash) for automation.