LLM Optimization Engineer

Fresher
SGD 6,000 - 9,000 per month
  • Posted 19 hours ago

Job Description

Responsibilities

  • Design and implement efficient parallel computing strategies and memory management mechanisms to improve end-to-end throughput and latency
  • Develop and optimize high-performance training and inference frameworks, maximizing hardware compute and memory bandwidth utilization

Qualifications

  • Proficiency in Python and C++, with strong foundations in data structures, algorithms, and systems programming
  • Solid experience with PyTorch, including a deep understanding of model execution workflows, operator invocation, and computation graph mechanisms
  • Familiarity with high-performance computing (HPC) concepts such as parallel computing, memory hierarchy, and operator fusion
  • Basic understanding of accelerator architectures (e.g., GPU, NPU), including compute units, memory systems, and communication mechanisms

Preferred Qualifications

  • Hands-on performance optimization experience with mainstream LLM inference acceleration frameworks such as vLLM and SGLang
  • Familiarity with techniques such as KV cache optimization, attention optimization, operator fusion, and low-precision computation (e.g., FP8, FP4)
  • Experience productionizing large-model training or inference systems, including end-to-end performance optimization

More Info

Job ID: 147011793