How you will make an impact:
- Design Kubernetes cluster architecture
- Define deployment models for:
  - Red Hat OpenShift (on-prem)
  - Amazon EKS, Azure AKS, Google Kubernetes Engine (cloud)
- Deploy and configure Kubernetes/OpenShift clusters
- Implement control plane and worker node setup
- Develop Infrastructure as Code (IaC) for cluster provisioning
- Build reusable templates / golden images for clusters
- Implement GitOps pipelines for deployment
- Standardize Dev/UAT/Prod environments
- Deploy monitoring and logging stack (Prometheus, Grafana, ELK)
- Define baseline alerts, thresholds, and dashboards
- Integrate with enterprise monitoring / ticketing systems
- Provide 24×7 standby support as required
Skills for Success:
- Diploma or bachelor's degree in Computer Science, Information Technology, Engineering, or a related discipline.
- Minimum of 3-5 years of working experience with Kubernetes (K8s) architecture and operations.
- Strong hands-on experience with virtualization, container platforms, and private/public cloud technologies.
- Practical experience with monitoring, performance management, and capacity planning for infrastructure.
- Strong background in incident management, including troubleshooting complex issues, performing root cause analysis, and implementing permanent corrective actions.
Platform Build & Engineering (Day-1)
- Design, deploy, and configure Kubernetes clusters (on-prem and cloud-based)
- Implement container platforms using OpenShift or managed Kubernetes (EKS, AKS, GKE)
- Define cluster architecture, networking (CNI), ingress/egress, and storage integration
- Automate infrastructure provisioning using IaC tools (e.g., Terraform, Ansible)
Operations & Support (Day-2)
- Perform cluster administration, monitoring, and performance tuning
- Manage upgrades, patching, and lifecycle of Kubernetes/OpenShift environments
- Troubleshoot cluster, pod, networking, and application issues
- Ensure platform availability, resiliency, and backup/recovery readiness
Observability & Automation
- Implement monitoring, logging, and alerting (e.g., Prometheus, Grafana, ELK)
- Build CI/CD pipelines integrating container platforms
- Automate operational tasks and standardize runbooks/SOPs