Unison Group New Zealand

Lead Infrastructure Engineer (OpenShift, AWS, Docker, Terraform, GenAI)

10-12 Years
  • Posted 8 hours ago

Job Description

Overview

We are seeking an experienced Senior GenAI Platform Engineer / OpenShift SME to lead and manage enterprise-scale infrastructure supporting GenAI applications. This role focuses on OpenShift platform engineering, hybrid cloud environments, disaster recovery (DR), and security for highly scalable and resilient AI platforms.

Requirements

  • 10+ years of experience in infrastructure engineering / platform engineering.
  • Strong expertise in managing OpenShift (OCP) in enterprise production environments.
  • Hands-on experience in infrastructure sizing, capacity planning, and performance tuning for AI workloads.
  • Experience supporting Oracle Database from an infrastructure/application standpoint.
  • Strong knowledge of certificate management, secrets handling, and key management.
  • Experience with CI/CD pipelines and infrastructure automation.
  • Solid background in security, vulnerability management, and compliance.
  • Proven experience in designing and implementing Disaster Recovery (DR) solutions.
  • Experience with AWS cloud services and hybrid cloud environments.
  • Strong experience with Docker and Kubernetes.
  • Excellent coordination and stakeholder management skills across cross-functional teams.

Key Responsibilities

  • Lead and manage end-to-end infrastructure for enterprise GenAI applications hosted on OpenShift (OCP).
  • Own capacity planning, sizing, and performance optimization of OpenShift clusters and related infrastructure components.
  • Manage and optimize infrastructure including Oracle DB, Redis, Elastic DB, PostgreSQL, Dell ECS storage, and Linux environments (Red Hat/Ubuntu).
  • Design and implement Disaster Recovery (DR) strategies ensuring high availability, resilience, and business continuity.
  • Lead end-to-end DR setup, including replication, failover, testing, and documentation, in collaboration with infrastructure and network teams.
  • Manage certificate lifecycle (TLS/SSL), secrets, and key management across platforms.
  • Implement vulnerability management, patching, and remediation across Kubernetes, containers, and infrastructure.
  • Support and coordinate penetration testing and address security findings.
  • Work with AWS services (EC2, VPC, CloudWatch, Lambda, Bedrock) in hybrid cloud environments.
  • Build and maintain infrastructure automation using Terraform and CloudFormation.
  • Manage observability using monitoring, logging, and alerting tools, and manage job scheduling with Control-M.
  • Collaborate with DevOps, Security, and Development teams for platform reliability and performance.
  • (Bonus) Work with or support open-weight LLMs for AI/ML use cases.

Job ID: 146635821
