We are seeking a Cloud Linux Engineer (Level 2) with strong hands-on Linux/Unix administration experience and multi-cloud operational expertise to support the cloud infrastructure. This role requires at least 2 years of Linux/Unix experience and a minimum of 1 year of cloud experience, predominantly on Amazon Web Services (AWS), with exposure to Microsoft Azure and Google Cloud Platform (GCP). The position focuses on maintaining and supporting Linux workloads across multi-cloud environments, OS patching, ITIL processes, application deployment and troubleshooting, security hardening, and operational excellence in regulated, network-separated environments.
Key Responsibilities:
Multi-Cloud Linux Operations
- Provide L2 operational support for Linux distributions (RHEL 7/8/9, CentOS, Ubuntu, Amazon Linux 2/2023) in on-premises and multi-cloud environments
- Support cloud operations predominantly on Amazon Web Services (AWS), with additional coverage of Microsoft Azure and Google Cloud Platform (GCP)
- Hands-on experience with cloud services including: AWS (EC2, S3, IAM, EBS, CloudWatch, Systems Manager (SSM), AWS Backup, Security Groups, VPC, Lambda, ECS, EKS); Azure (Virtual Machines, Azure Kubernetes Service (AKS), Azure Storage, Azure Monitor, Azure Automation, Azure Backup); GCP (Compute Engine, Google Kubernetes Engine (GKE), Cloud Storage, Cloud Monitoring, Cloud IAM)
- Manage and support Linux system services, systemd, networking, firewalls (iptables/firewalld), SELinux/AppArmor
- Monitor and maintain Linux workload performance and availability, and ensure compliance with the cloud security baseline across cloud platforms
- Participate in 24/7 shift rotation to provide round-the-clock operational support
Operating System Patch Management
- Perform comprehensive OS patching for Linux environments using YUM/DNF, APT, AWS Systems Manager, and Azure Update Management
- Execute monthly and quarterly patch cycles with coordination and approval workflows
- Apply kernel patches and updates while preserving system stability and uptime
- Deep expertise in Linux Operating System patching, including pre-patch validation, deployment, and post-patch verification
- Basic knowledge of Windows Server patching using WSUS and cloud-native tools
- Track patch compliance and generate reports for audit and compliance purposes
- Coordinate patch windows and communicate with stakeholders
Application Deployment & Troubleshooting
- Deploy and configure applications on Linux operating systems
- Troubleshoot application issues at the OS level, including permissions, services, dependencies, and performance
- Support application teams with OS-level diagnostics and resolution
- Perform application log analysis using tools like grep, awk, sed, and centralized logging platforms
- Configure and manage web servers (Apache, Nginx) and application servers (Tomcat, JBoss)
- Collaborate with development teams to resolve infrastructure-related application problems
ITIL & Service Management
- Resolve incidents and service requests related to Linux systems via ITSM platforms
- Follow ITIL processes for Incident, Problem, Change, and Request Management
- Create and update tickets with detailed documentation and resolution steps
- Escalate complex issues to Level 3 engineers and track resolution progress
- Participate in Change Advisory Board (CAB) reviews and change implementations
- Maintain SLAs and ensure timely ticket resolution
Security & Compliance
- Execute CIS (Center for Internet Security) security remediations and hardening baselines for Linux systems
- Implement and review IAM permissions using AWS IAM Access Analyzer, Azure RBAC, and GCP IAM, following the least-privilege model
- Perform Vulnerability Management System (VMS) remediation based on scan findings
- Execute Cloudscape recommendations in collaboration with InfoSec teams
- Work with security threat detection tools and remediate their findings
- Configure and maintain SELinux/AppArmor security policies
- Implement SSH hardening, sudo policies, and access controls
- Support security compliance scanning and remediation activities
- Maintain security configurations and monitor for security alerts
- Implement and maintain SSL/TLS certificate management and renewal processes
Container & DevSecOps
- Demonstrate working knowledge of container technologies (Docker, Kubernetes, ECS, EKS, AKS, GKE)
- Support containerized Linux applications and microservices architectures
- Familiarity with DevSecOps practices and tools used in the Singapore Government technology stack (SHIP-HATS)
- Understand CI/CD pipeline concepts and security integration
- Basic knowledge of container orchestration and pod management
Automation & Scripting
- Develop and maintain Bash/Shell scripts for routine tasks, automation, and remediation
- Proficiency in Python scripting for automation and system administration tasks
- Utilize AWS CLI, Azure CLI, and gcloud CLI for cloud operations
- Create and execute SSM Documents and Azure Automation Runbooks for automated remediation
- Experience with configuration management tools (Ansible preferred)
- Automate repetitive operational tasks to improve efficiency
Backup & Disaster Recovery
- Implement and maintain backup and recovery strategies for Linux servers in cloud environments
- Perform backup validations and participate in disaster recovery testing
- Experience with backup tools (rsync, tar, AWS Backup, Azure Backup, snapshots)
- Support business continuity planning activities
- Document and test recovery procedures
Documentation & Knowledge Management
- Create and maintain technical documentation, knowledge articles, and standard operating procedures (SOPs)
- Document troubleshooting steps, configurations, and remediation procedures
- Maintain runbooks for common operational tasks
- Contribute to team knowledge base and continuous improvement initiatives
Monitoring & Observability
- Configure and maintain monitoring using cloud-native tools (CloudWatch, Azure Monitor, and GCP Cloud Monitoring)
- Set up alerts, alarms, and notifications for critical systems
- Experience with monitoring tools (Prometheus, Grafana, ELK Stack, Splunk)
- Analyze logs and metrics to identify and resolve issues proactively
- Support integration with centralized monitoring and observability platforms
- Configure log aggregation and analysis pipelines
Audit & Compliance Support
- Participate in internal and external audits
- Provide evidence and documentation for compliance requirements
- Support audit remediation activities
- Maintain compliance with government security frameworks and standards