Summary
We are seeking a highly skilled Network Reliability Engineer (NRE) with strong hands-on experience across hybrid enterprise and cloud network environments, including on-premises data centers and AWS/Azure cloud platforms. The ideal candidate will be responsible for ensuring the reliability, scalability, availability, automation, and observability of network and security infrastructure using Network Reliability Engineering (NRE) and Site Reliability Engineering (SRE) principles.
Responsibilities
Network Reliability and Operations
- Own end-to-end reliability, performance, and availability of hybrid and cloud-connected network services.
- Apply NRE/SRE principles to improve operational efficiency, resilience, scalability, and service stability.
- Perform advanced troubleshooting, root cause analysis (RCA), and incident resolution for complex network and security issues.
- Participate in on-call rotations, major incident response, and operational escalations.
- Define and maintain SLIs, SLOs, and error budgets for critical network services.
- Develop operational standards, documentation, SOPs, and runbooks.
Network Automation and Infrastructure as Code
- Design, develop, and maintain Ansible playbooks, roles, templates, and automation workflows for multi-vendor network environments.
- Automate Day-1, Day-2, and Day-N operational activities including provisioning, configuration management, backups, compliance validation, and remediation.
- Translate operational procedures and BAU activities into reusable Infrastructure-as-Code (IaC) automation frameworks.
- Implement event-driven and self-healing automation integrated with monitoring tools, APIs, and ITSM platforms.
- Build automated validation and configuration drift detection mechanisms.
Hybrid Cloud and Data Center Networking
- Support and manage Cisco ACI-based enterprise data center environments.
- Administer Cisco IOS-XE, NX-OS, and Wireless LAN Controller (WLC) platforms.
- Design and support AWS networking solutions, including VPCs, Transit Gateway, VPN, Direct Connect, Security Groups, and NACLs.
- Design and support Azure networking solutions, including VNets, NSGs, UDRs, VPN Gateway, ExpressRoute, and Azure Firewall.
- Manage hybrid connectivity solutions including MPLS, internet, site-to-site VPN, remote access VPN, and secure cloud connectivity.
Network Security and Proxy Platforms
- Manage and support enterprise firewall platforms, including Palo Alto, Check Point, and Cisco Firepower (FTD).
- Support firewall policy governance and automation using AlgoSec.
- Administer proxy and secure web gateway platforms, including Blue Coat ProxySG.
- Support cloud security and zero trust network access (ZTNA) platforms, including Zscaler ZIA and ZPA.
Application Delivery and Load Balancing
- Manage and support F5 VELOS chassis environments.
- Configure and administer F5 LTM, GTM, and APM modules.
- Design resilient application delivery solutions for on-premises and cloud workloads.
Monitoring, Observability, and CI/CD
- Build and maintain observability dashboards, monitoring, and alerting using Datadog, Prometheus, Grafana, Cisco DNA Center, and Forescout.
- Integrate infrastructure changes into CI/CD pipelines with automated testing, validation, and rollback capabilities.
- Utilize GitHub for version control, peer reviews, and change governance.
- Develop automation tools and integrations using REST APIs, Python, and Django.
Requirements
- Bachelor's degree in Computer Science, Information Technology, Engineering, or equivalent experience.
- At least 7 years of experience in Network Engineering, Cloud Networking, Network Security, NRE, or SRE roles.
- Strong expertise in enterprise networking, cloud networking, and network security.
- Strong hands-on experience with network automation using Ansible.
- Experience with CI/CD pipelines and Infrastructure-as-Code methodologies.
- Strong analytical, troubleshooting, and problem-solving skills.
- Ability to work effectively in large-scale production environments.
- Strong communication, collaboration, and documentation skills.
- Ability to translate business requirements into measurable reliability objectives.
- Relevant certifications are highly preferred: CCNP/CCIE, AWS or Azure networking certifications, PCNSE, Check Point CCSA/CCSE, and F5 certifications.