Key Responsibilities:
- Lead the design, deployment, administration, and lifecycle management of the OpenShift Container Platform (OCP), ensuring high availability, scalability, security, and performance.
- Plan and manage infrastructure deliverables related to OCP clusters, including upgrades, patches, platform enhancements, and capacity planning.
- Collaborate with application teams, DevOps engineers, and security teams to onboard workloads, manage platform dependencies, and ensure smooth application deployment on OpenShift.
- Oversee day-to-day operations of OpenShift infrastructure, including cluster health checks, performance tuning, logging, monitoring, and incident resolution.
- Drive automation initiatives for infrastructure provisioning, CI/CD pipeline integration, configuration management, and container orchestration best practices.
- Partner with Enterprise Architects, Security Architects, Platform Engineers, and Cloud Teams to ensure compliance with organizational architecture principles, governance, and security standards.
- Develop and maintain technical documentation related to platform architecture, operational procedures, runbooks, and disaster recovery plans.
- Support and manage platform-related testing activities (e.g., performance testing, failover testing, security testing) to ensure environment readiness.
- Provide timely escalation and resolution of infrastructure risks and issues that may impact system uptime, project timelines, or production stability.
- Lead platform-related technical incident investigations and root cause analysis, and implement preventive measures.
- Mentor and guide junior engineers in container platform operations, automation, and DevOps practices.
- Coordinate infrastructure implementation activities, ensuring smooth production cutover and post-implementation support for platform changes.
Requirements:
Technical Skills & Experience
- Strong hands-on expertise with Red Hat OpenShift Container Platform (OCP), Kubernetes, and container orchestration technologies.
- Solid understanding of Linux systems administration (preferably Red Hat Enterprise Linux).
- Experience in managing OpenShift clusters across on-premises, hybrid cloud, or public cloud environments (e.g., AWS, Azure, GCP).
- Strong knowledge of CI/CD pipelines, DevOps tooling, and automation frameworks (e.g., Ansible, Jenkins, GitLab CI, ArgoCD).
- Experience with infrastructure-as-code technologies such as Terraform or Ansible.
- Proficiency in network concepts related to container platforms (service mesh, ingress/egress, routing, load balancing).
- Experience in implementing platform security controls (RBAC, secrets management, vulnerability scanning, audit logging).
- Familiarity with logging and monitoring tools (e.g., Prometheus, Grafana, ELK/EFK stacks, Splunk).
- Experience in performing OpenShift cluster upgrades, capacity planning, and performance optimization.
- Background in infrastructure operations with exposure to virtualization, storage, and enterprise networking.
- Proven ability to troubleshoot complex platform issues across compute, network, security, and application layers.
- Strong analytical, problem-solving, and systems-thinking skills.
Leadership & Soft Skills
- 10-15 years of IT infrastructure experience, including at least 5 years focused on container platforms or cloud infrastructure.
- Experience leading infrastructure teams or platform engineering teams in complex enterprise environments.
- Strong stakeholder management skills, especially when engaging application teams, security, cloud teams, and senior management.
- Strong communication skills with both technical and non-technical stakeholders.
- Results-driven, proactive, and able to manage concurrent initiatives in fast-paced environments.
Education & Certifications
- Bachelor's degree in Computer Science, Information Systems, Engineering, or related discipline.
- Red Hat certifications (e.g., RHCSA, RHCE, OpenShift Administrator/Architect) are highly advantageous.
- Certifications in cloud platforms (AWS/Azure/GCP) are preferred.
- Training or certifications in DevOps, Kubernetes, or SRE practices are beneficial.