
Responsibilities:
Design, build, and operate cloud infrastructure across development, staging, and production environments covering compute, storage, networking, containerisation, virtualisation, DNS, and monitoring. Manage AWS services including EC2, ECS, S3, RDS (PostgreSQL, MS SQL), Lambda, CloudFormation, CloudWatch, IAM, and VPC, as well as Docker/Kubernetes container platforms and VMware vSphere and Hyper-V virtualisation platforms.
Implement monitoring and observability using AWS CloudWatch (logs, alarms, Canaries), StackOps, Prometheus, Grafana, and ELK to enable proactive performance and reliability management.
Ensure compliance with Whole-of-Government standards and security controls through access management, hardening, and monitoring, using tools such as CyberArk.
Develop Infrastructure as Code (IaC) using Terraform, Ansible, and AWS CloudFormation to support automated, version-controlled deployments.
Apply Site Reliability Engineering (SRE) practices including toil reduction, SLO, SLI, and error budget tracking.
Manage networking (TCP/IP, DNS, DHCP, VPN, routing), platform patching automation, backup and disaster recovery, multi-Availability Zone (multi-AZ) setups, AWS Fault Injection Simulator (FIS) testing, and container orchestration.
Collaborate with application teams and maintain documentation, runbooks, and operational procedures.
Technical Expertise:
• Advanced experience with enterprise virtualisation platforms (VMware vSphere, Hyper-V)
• Proficiency in Linux and Windows Server administration
• Expertise in server monitoring tool installation and regular patching of virtual and physical servers
• Comprehensive health check capabilities for servers, storage, and virtualisation platforms
• Strong experience with infrastructure automation tools (Ansible, Puppet, Chef)
• Proficiency with container technologies (ECS, Docker, Kubernetes)
• Experience with monitoring and observability platforms
• Infrastructure as Code expertise (Terraform, AWS CloudFormation, Ansible)
• Solid understanding of networking concepts and technologies
• Scripting capabilities in Python, PowerShell, Bash, and Node.js
• Experience with high-availability and disaster recovery solutions including AWS FIS
• Proficiency with GitHub tools and CI/CD pipeline setup and workflow management
Job ID: 143009529