

STAFF / SENIOR ENGINEER, AI SYSTEMS & LLM INFERENCE OPTIMIZATION | AI INFRA PLATFORM
Our client is a leading AI technology company building next-generation infrastructure for large-scale foundation models, with a focus on model efficiency, high-performance inference, and scalable systems for LLM deployment.
Reporting to senior technical leadership, this role focuses on LLM inference optimization, model efficiency, and system performance. You will work closely with research and engineering teams to improve latency, throughput, and cost efficiency across large-scale AI systems.
This position supports two profiles:
(1) Computer Architecture (hardware–software co-optimization), or
(2) Software / AI Systems (LLM inference, model efficiency, AI infra).
Key areas include inference optimization (quantization, sparsity, KV cache, batching), high-performance kernels (e.g., GEMM, attention), and system-level optimization across runtime, compiler, and distributed environments.
The ideal candidate holds a Master's or Ph.D. in a relevant field, with strong programming skills (C++/CUDA/Python) and hands-on experience in deep learning systems or large model inference. Experience with frameworks such as PyTorch, TensorRT, or vLLM is advantageous.
For expressions of interest, please forward your resume to Kina at [Confidential Information].
Pan & Company Pte Ltd | Licence R1875384 | EA Licence No. 18S9074
Job ID: 146342059