ML Engineer / AI Platform Lead

dgtl technologies pte. ltd.

Singapore, Marina

5-7 Years

SGD 13,000 - 18,000 per month

Posted 11 hours ago

Job Description

You own the AI core: model serving, the retrieval-augmented generation (RAG) pipeline, prompt engineering, and the feedback-to-training pipeline. In Phase 1, you make the base model perform as well as possible through context engineering - system prompts, few-shot exemplars, and retrieval optimisation - without modifying model weights. You also design the custom model training workflow so that enterprise clients can train their own fine-tuned models in Phase 2. This is the highest-leverage individual contributor role on the founding team.

Responsibilities

Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, latency - all tracked per tenant.
Iterate on prompts daily with the domain expert during the pilot phase.
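
For candidates who want a concrete picture of the "defensive output parsing" responsibility above, the following is a minimal sketch only, not the team's implementation. The function name `parse_model_json` and the shape of the fallback object are hypothetical; the idea is to try progressively looser extraction strategies on the raw model output and fall back gracefully when none produces a JSON object.

```python
import json
import re
from typing import Optional

# Build the Markdown fence marker programmatically (three backticks).
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(raw: str, fallback: Optional[dict] = None) -> dict:
    """Extract a JSON object from raw model output (hypothetical helper).

    Strategies, tried in order:
      1. the whole output, stripped of surrounding whitespace
      2. the contents of the first fenced block, if any
      3. the span from the first '{' to the last '}'
    If none yields a JSON object, return the fallback instead of raising.
    """
    candidates = [raw.strip()]

    fenced = FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())

    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):  # accept JSON objects only
            return parsed

    # Graceful fallback: keep the raw text so the failure can be logged.
    if fallback is not None:
        return fallback
    return {"ok": False, "raw": raw}
```

In a setup like this, counting how often the fallback branch is reached would give a parsing failure rate of the kind the monitoring responsibility mentions.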

Requirements

5+ years of ML engineering; 2+ years working with large language models in production.
Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
Strong prompt engineering skills for production applications - you know how to make a base model produce consistent, structured, high-quality output.
Python: PyTorch, Transformers, FastAPI.
Familiar with LoRA/QLoRA fine-tuning workflows.