
Job Description
AI Infrastructure Specialist
About the Role
We are building a next-generation Enterprise AI Platform to power large-scale AI workloads across hybrid environments - and we're looking for an AI Infrastructure Specialist to lead and scale our AI computing ecosystem.
This role will oversee AI workloads across:
. On-prem GPU clusters
. Public cloud platforms
. Private AI cloud environments
. Edge / distributed sites
Key Responsibilities
. Design and manage hybrid AI infrastructure (GPU clusters, Kubernetes, private & public cloud).
. Optimize high-performance compute environments (NVIDIA GPUs, CUDA, NVLink, AI accelerators).
. Enable scalable ML training and inference platforms.
. Implement containerized AI environments and orchestration.
. Support MLOps pipelines and model lifecycle management.
. Enable CI/CD for ML deployments.
. Integrate model registry, artifact storage, and observability tools.
. Manage vector databases and inference endpoints.
. Ensure secure, resilient, and compliant AI infrastructure.
. Drive cost optimization, performance tuning, and capacity planning.
. Support AI use cases such as Generative AI, Vision AI, RAG pipelines, and predictive analytics.
Requirements
. 3+ years of experience in infrastructure or cloud engineering.
. Hands-on experience with GPU clusters and Kubernetes.
. Strong knowledge of containerization and orchestration platforms.
. Experience operating in hybrid or multi-cloud environments.
. Automation & Infrastructure-as-Code mindset.
. Strong troubleshooting and systems thinking skills.
. Passion for enabling enterprise AI at scale.
Job ID: 144533027