
Cluster Lifecycle Engineering: Design and automate the full lifecycle of Red Hat OpenShift clusters across multi-cloud environments (AWS, Azure, GCP), ensuring they are treated as immutable infrastructure.
Automation Development: Build robust, reusable automation for cluster provisioning, version upgrades, and auto-scaling using Ansible, Terraform, and Python.
Advanced Component Management: Implement and fine-tune critical OpenShift components, specifically OVN-Kubernetes for networking and OpenShift Data Foundation (ODF)/Ceph for software-defined storage.
GitOps & Orchestration: Establish and maintain GitOps workflows (e.g., using Argo CD or Flux) so that Git remains the single source of truth for all cluster configurations.
Reliability & Performance: Conduct deep-dive performance tuning and engineer high-availability (HA) solutions to ensure the container platform meets strict government Service Level Agreements (SLAs).
Security & Compliance: Implement Policy as Code to ensure all clusters automatically adhere to IM8 security standards and CIS benchmarks.
Education: Bachelor's Degree in Computer Science, Computer Engineering, or a related technical field.
Experience: Minimum of 3 years of hands-on experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
Technical Expertise:
Mastery of Red Hat OpenShift (OCP 4.x) and Kubernetes internals (API server, etcd, scheduler).
Proficiency in Infrastructure as Code (IaC) tools, specifically Terraform and Ansible.
Strong scripting skills in Python or Bash for building custom automation tools.
Knowledge of container networking (SDN, OVN) and storage orchestration (CSI, Ceph/ODF).
Red Hat Certified Specialist in OpenShift Administration/Automation.
Experience with Cloud Native monitoring stacks (Prometheus, Grafana, ELK).
Prior experience working within the GCC 2.0 ecosystem or GovTech stacks.
Job ID: 144599069