ML Engineer / AI Platform Lead

5-7 Years
SGD 13,000 - 18,000 per month
  • Posted 11 hours ago

Job Description

You own the AI core: model serving, the retrieval-augmented generation (RAG) pipeline, prompt engineering, and the feedback-to-training pipeline. In Phase 1, you make the base model perform as well as possible through context engineering - system prompts, few-shot exemplars, and retrieval optimisation - without modifying model weights. You also design the custom model training workflow so that enterprise clients can train their own fine-tuned models in Phase 2. This is the highest-leverage individual contributor role on the founding team.

Responsibilities

  • Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
  • Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
  • Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
  • Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
  • Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
  • Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
  • Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, latency - all tracked per tenant.
  • Iterate on prompts daily with the domain expert during the pilot phase.

Requirements

  • 5+ years of ML engineering experience; 2+ years working with large language models in production.
  • Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
  • Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
  • Strong prompt engineering skills for production applications - you know how to make a base model produce consistent, structured, high-quality output.
  • Python: PyTorch, Transformers, FastAPI.
  • Familiarity with LoRA/QLoRA fine-tuning workflows.

Nice to have

  • Experience building multi-tenant ML serving infrastructure.
  • Experience with financial or crypto AI applications.
  • Experience with cross-encoder reranking models (DeBERTa or similar).
  • Understanding of data isolation requirements for ML training pipelines.

More Info

Job ID: 145222135