Infrastructure Engineer (GPU / Kubernetes / Distributed Systems)

4-7 Years
SGD 12,000 - 24,000 per month
  • Posted 19 days ago

Job Description

We're working with a high-growth AI infrastructure company building foundational systems powering next-generation AI products and intelligent search infrastructure.

The team is building a search engine designed for AI agents. It operates large-scale distributed systems that crawl the web, train state-of-the-art embedding models, and power high-performance vector search infrastructure. On the compute side, the team runs a rapidly growing multi-million-dollar H200 GPU cluster alongside large-scale distributed batch processing systems spanning tens of thousands of machines.

This is a deeply technical infrastructure role focused on building the internal platform and tooling that enables the entire engineering organization to move fast at scale.

What You'll Work On

  • Build and scale Kubernetes orchestration for large GPU clusters
  • Design distributed infrastructure powering large-scale AI workloads
  • Scale cloud batch job systems handling map-reduce workloads across tens of thousands of machines
  • Improve GPU scheduling and cluster utilization efficiency
  • Build observability, reliability, and internal platform tooling for production systems
  • Work on infrastructure supporting AI training, inference, crawling, and data processing at massive scale

What We're Looking For

  • Experience designing and operating large-scale infrastructure systems
  • Strong hands-on experience with Kubernetes in production environments
  • Familiarity with GPU clusters, distributed compute, or cloud batch processing systems
  • Strong understanding of observability, reliability engineering, and system optimization
  • Experience with distributed systems and performance-oriented infrastructure
  • Background in high-performance engineering environments is highly valued

Nice to Have

  • Experience with Ray, distributed batch systems, or large-scale orchestration platforms
  • Experience optimizing GPU utilization and scheduling
  • Familiarity with AWS infrastructure at scale
  • Exposure to AI/ML infrastructure environments

Why This Role

  • Work on infrastructure problems typically seen only at hyperscale AI companies
  • Join a highly technical, low-ego engineering culture
  • Opportunity to shape foundational systems from an early stage
  • High ownership and the chance to work on deeply challenging engineering problems
  • Competitive compensation with meaningful equity upside

More Info

Job ID: 147916997