Job Scope
- Own day-to-day operations of the Azure environment and mixed Windows/Linux workloads.
- Keep services available, secure, patched, monitored, cost-efficient, and well-documented, automating wherever possible.
Key Responsibilities
- Azure operations: Manage IaaS/PaaS including VMs/VMSS, Storage, VNets/NSGs, Load Balancer/App Gateway (WAF), Private Endpoints, Key Vault, Azure Backup, Azure Update Manager/Automation, Defender for Cloud, and Azure Monitor/Log Analytics.
- OS administration: Maintain Windows Server (AD basics, DNS, IIS, SMB, Task Scheduler) and Linux (Ubuntu/RHEL, systemd, SSH, file systems, package management), including hardening and performance tuning.
- Incident/SRE: Respond to alerts, restore service, lead RCAs/post-mortems, improve MTTR through better runbooks, automation, and observability (dashboards, alerting, KQL queries).
- Change & release: Plan and execute changes with rollback plans; maintain runbooks; follow ITIL change control and configuration management.
- Patching & vulnerability management: Drive monthly patch cycles (WSUS/Azure Update Manager/Ansible), remediate Defender/CIS findings, track compliance KPIs.
- Certificates & secrets: Maintain the certificate inventory and manage renewals (Key Vault/ACME/App Gateway/IIS/nginx), rotate secrets and keys, and enforce RBAC/least privilege.
- Backup/DR: Ensure backups succeed, test restores regularly, and meet RPO/RTO targets.
- Automation & IaC: Create/maintain PowerShell/Bash scripts, Terraform modules, and Ansible playbooks to standardize builds, patching, and operations tasks.
- FinOps & capacity: Tagging/chargeback, budgets/alerts, rightsizing, storage lifecycle policies, and cost dashboards.
- Documentation & enablement: Keep diagrams, SOPs, and runbooks current; mentor L1/L2; partner with Security, Networking, and App teams.
Job Qualifications
Required Skills & Experience
- 3-5 years in Cloud/Systems Operations or Site Reliability Engineering, with 2+ years of hands-on Azure experience in production.
- Strong Windows Server and Linux administration skills.
- Solid networking fundamentals (TCP/IP, DNS, TLS, routing, load balancing, firewall rules).
- Proficient in PowerShell and Bash; working knowledge of Terraform and Ansible.
- Experience with Azure Monitor/Log Analytics (KQL), alerting, and dashboarding.
- Familiar with identity and access management (Entra ID/Azure AD, RBAC, PIM) and secrets management (Key Vault).
- ITIL-aware (incident/problem/change) and comfortable with on-call rotations.
- Experience maintaining and enhancing CI/CD pipelines in Azure DevOps/GitHub.
Nice to Have Experience
- AKS/Kubernetes basics; containers (Docker).
- Web platforms (IIS, nginx, Apache) and App Gateway/WAF.
- Microsoft Sentinel; advanced use of Defender for Cloud.
- Configuration management at scale (PowerShell Desired State Configuration, Ansible).
- Exposure to compliance frameworks (CIS, ISO 27001) and Zero Trust principles.
- Scripting for APIs and JSON/YAML templating.
Preferred Certifications
- Microsoft AZ-104 (Azure Administrator), AZ-900 (Azure Fundamentals); bonus: AZ-140 (Azure Virtual Desktop), AZ-305 (Azure Solutions Architect).
- RHCSA/LFCS (Linux) or equivalent.
- ITIL Foundation.
Additional Tools/Technologies
- Azure Portal/CLI, PowerShell, Bash, Terraform, Ansible, Git/GitHub or Azure DevOps, Azure Monitor/Log Analytics (KQL), Defender for Cloud, Sentinel (nice to have), WSUS/Azure Update Manager, Key Vault, App Gateway/WAF, Azure Site Recovery (ASR)/Azure Backup, ServiceNow or Jira.
Working Model
- Hybrid, with a scheduled on-call rotation (e.g., 1 week in every 4).
- Occasional after-hours maintenance windows.
- Working on-site.