Nava

AI Infrastructure Architect

Fresher
  • Posted 16 hours ago

Job Description

Role & Responsibilities

  • Architect and deploy scalable AI infrastructure for LLMs and multi-modal models—covering data pipelines, model serving, and inference optimization on Kubernetes/AKS/EKS.
  • Design GPU/NPU-aware cluster topologies with auto-scaling, model checkpointing, and low-latency inference SLAs for production workloads.
  • Integrate MLOps toolchains (MLflow, Weights & Biases, KServe, Triton) with CI/CD pipelines to automate model deployment, rollback, and drift detection.
  • Establish infrastructure-as-code (IaC) standards using Terraform or Pulumi for reproducible, secure, and auditable cloud environments.
  • Collaborate with ML Engineers and Data Scientists to optimize model quantization, batching strategies, and memory footprint for production efficiency.
  • Define SRE practices (observability with Prometheus/Grafana, alerting, disaster recovery) and enforce infrastructure security (IAM, network policies, pod security standards).

Skills & Qualifications

Must-Have

  • Kubernetes
  • Terraform
  • GPU orchestration
  • Model serving (Triton, KServe, Seldon)
  • MLflow
  • Prometheus
  • Grafana
  • AKS/EKS

Preferred

  • Ray Serve
  • ONNX Runtime
  • OpenTelemetry

Benefits & Culture Highlights

  • Work on bleeding-edge AI infrastructure used by Fortune 500 clients and scaling AI startups.
  • On-site collaborative environment in Singapore's innovation hub with cross-functional AI & cloud teams.
  • Unlimited PTO, performance bonuses, and an annual learning stipend for certifications and AI conferences.

Skills: networking, infrastructure, teams, architecture, platforms, kubernetes, design, storage, data center, orchestration, cloud

Job ID: 150599165