LLM Optimization Engineer

Fresher

SGD 6,000 - 9,000 per month

Job Description

Design and implement efficient parallel computing strategies and memory management mechanisms to increase end-to-end throughput and reduce latency
Develop and optimize high-performance training and inference frameworks, maximizing hardware compute and memory bandwidth utilization

Proficiency in Python and C++, with strong foundations in data structures, algorithms, and systems programming
Solid experience with PyTorch, including a deep understanding of model execution workflows, operator invocation, and computation graph mechanisms
Familiarity with high-performance computing (HPC) concepts such as parallel computing, memory hierarchy, and operator fusion
Basic understanding of accelerator architectures (e.g., GPU, NPU), including compute units, memory systems, and communication mechanisms

Hands-on performance optimization experience with mainstream LLM inference acceleration frameworks such as vLLM and SGLang
Familiarity with techniques such as KV cache optimization, attention optimization, operator fusion, and low-precision computation (e.g., FP8, FP4)
Experience productionizing large-model training or inference systems, including end-to-end performance optimization