Job Description
- The Cloud Infrastructure Analyst is responsible for the day-to-day operations, maintenance, and support of the organization's enterprise AWS cloud environment.
- This role ensures high availability, reliability, performance, security, and compliance of cloud infrastructure while driving automation and continuous improvement.
Key Responsibilities:
- Cloud Operations & Infrastructure Management
  - Manage, configure, and maintain AWS services including EC2, S3, RDS, VPC, and related components.
  - Automate infrastructure provisioning and changes using Infrastructure as Code (CloudFormation/Terraform).
  - Monitor and optimize cloud resources for performance, reliability, and cost efficiency.
- Monitoring, Performance & Incident Management
  - Implement and manage monitoring and alerting solutions (e.g., CloudWatch, Prometheus).
  - Proactively identify, troubleshoot, and resolve infrastructure incidents and service requests.
  - Perform root cause analysis and support after-hours activities when required.
- Security & Compliance
  - Enforce AWS security best practices including IAM, Security Groups, NACLs, and Security Hub.
  - Conduct regular log reviews, account reviews, and vulnerability assessments.
  - Ensure compliance with regulatory and industry standards (e.g., GDPR, HIPAA, PCI DSS).
- Automation & Continuous Improvement
  - Automate operational tasks using scripts, tools, and CI/CD pipelines.
  - Drive improvements in cloud operations, monitoring, and deployment processes.
  - Stay current with AWS services and cloud best practices.
- Documentation & Knowledge Sharing
  - Maintain clear documentation of cloud architecture, configurations, and operational procedures.
  - Build and manage a knowledge base for common issues and resolutions.
  - Provide guidance and support to internal teams on AWS operations.
Required Skills & Experience:
- Strong hands-on experience with AWS cloud services and operations
- Experience with Infrastructure as Code (IaC), monitoring tools, and automation scripting
- Solid understanding of cloud security, compliance, and incident management
- Ability to work in a 24/7 operational support environment when needed