At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone.
If you believe in developing a better tomorrow, read on.
About the Role
The incumbent will be a key driver of our engineering-driven culture, responsible for designing and implementing strategic initiatives that leverage AI and Machine Learning to create self-healing, automated, cloud-native operational systems, and for providing oversight of the enterprise architecture to ensure alignment with AIA architecture governance. This is a hands-on leadership role for a technically proficient individual with the vision and negotiation skills to lead a team of solution analysts and DevOps engineers, champion change, eliminate manual processes, and foster a culture of collaboration across our engineering, DevOps, and Site Reliability Engineering (SRE) teams.
Key Responsibilities
- Strategy & Leadership: Design and implement a multi-year strategy to automate and optimize IT operations using AI/ML-driven solutions, predictive analytics, and self-healing systems. Drive the cultural change towards a proactive, autonomous operations model and continuous delivery mindset. Evangelize and implement modern SRE practices across engineering teams.
- DevOps Implementation: Lead the organization's cultural transformation towards engineering-driven practices and DevOps excellence.
- Process Transformation: Spearhead initiatives to eliminate traditional operations bottlenecks, automate manual processes, and establish new standards for operational efficiency and system reliability.
- Team Collaboration: Foster a collaborative and integrated environment across product engineering, DevOps, and Site Reliability Engineering (SRE) teams to ensure shared ownership and accountability for the full application lifecycle.
- Platform Modernization: Guide the evolution of our CI/CD pipelines, container orchestration on Kubernetes, and cloud-native infrastructure to support autonomous and proactive operations.
- Cultural Change: Act as a change agent within the organization, articulating the vision for an engineering-driven culture and using excellent communication and negotiation skills to build consensus and drive adoption of new methodologies.
- Hands-on Contribution: Remain deeply hands-on with the technology stack, actively contributing to architectural design, code reviews, and key technical decisions to ensure a seamless bridge between innovation and execution.
Required Technical Expertise
- A minimum of 10-15 years of deep Enterprise Architecture, DevOps & SRE Experience: Extensive, hands-on experience in enterprise architecture, DevOps, and SRE principles, including CI/CD pipeline automation, infrastructure-as-code, and observability.
- AI/ML for Operations: Proven experience in designing or implementing AI/ML-driven solutions for IT operations, covering both infrastructure and application observability, such as log analysis, anomaly detection, and predictive maintenance.
- Modern Technology Stack: Strong practical experience with the following technologies:
  - Languages: Java, Node.js, Python
  - Front-end: React
  - Containerization & Orchestration: Docker and Kubernetes
  - CI/CD: GitHub Actions, Bamboo, or similar tools
  - Infrastructure as Code & Observability Tools: Terraform, CloudFormation, ELK, Dynatrace, Prometheus, Grafana, Datadog, etc.
- Proficiency in scripting (Python, PowerShell)
- Cloud-Native Architecture: Expertise in designing and managing cloud-native systems, microservices architectures, and distributed systems.
- Experience with microservices architecture and APM/API management.
- Knowledge of security best practices and DevSecOps implementations.
- Ensure automated systems comply with security, governance, and regulatory standards.
- Stability of system and services.
- Timely and quality deliverables.
- Good quality solution design by implementing different architecture pillars, such as security, scalability, maintainability, performance, etc.
- Architecture strategy, standards, patterns, governance, and audit reporting.
- Improvements to build automation, leveraging CI/CD processes, automated testing, unit testing, code coverage, and other software development best practices.
- Degree from a recognized university in Information Technology, Computer Science, or Computer Engineering.
- Certifications in AIOps, DevOps, AWS/Azure/GCP, ITIL, or related fields are a plus. GenAI certifications (e.g., NVIDIA, Google, Databricks) are highly desirable.
Special skills
- Requires in-depth experience, knowledge and skills in own discipline
- Uses best practices and knowledge of internal/external business issues to improve products or services
- Ability to work in a high-pressure environment, quickly troubleshoot complex issues across on-prem and cloud, and successfully handle multiple priorities.
- Has a systematic problem-solving approach, effective communication skills, and a sense of ownership and drive.
- Works independently with minimal guidance
- Ability to manage resources and perform capacity planning
- Applies best practices and knowledge of internal/external business issues to improve products or services in own discipline
- Has expertise in own discipline
- Solves moderately complex problems and takes a new perspective on existing solutions
- Interprets customer needs, assesses requirements and identifies solutions to non-standard requests
- Explains information and persuades others in straightforward situations
- Makes decisions for own work priorities and allocation of time to meet deadlines
- Is accountable for technical contribution to project team or sub-team
- Builds awareness of costs related to own work
The incumbent will report to the CTO and manage 10-15 team members.