Senior AI Engineer - Large-Scale Foundation Models (LLM / VLM)

Desay SV Automotive

Singapore, Jurong East

Fresher

Posted 22 hours ago

Job Description

Desay SV Automotive Singapore Pte. Ltd. is an innovative organization committed to exploring frontier technologies. While the company has a strong background in automotive electronics, this role is exclusively focused on advancing applications in large language models and on-device AI inference.

Duties / Responsibilities

On-Device Inference Engine Development. Design, develop, and optimize LLM inference engines for embedded, mobile, and edge devices, covering operator development, graph optimization, memory management, and multi-backend adaptation.
Model Compression & Lightweight Deployment. Research and apply quantization (INT4/INT8/FP16), pruning, distillation, and KV cache compression techniques to achieve efficient inference on resource-constrained hardware.
Heterogeneous Hardware Optimization. Conduct operator-level performance tuning for ARM CPU, NPU, GPU, and DSP; use profiling tools to identify bottlenecks and continuously improve inference throughput and latency.
LLM Inference Acceleration. Participate in building LLM inference acceleration solutions (including speculative decoding, continuous batching, and KV cache optimization) to improve model response efficiency on edge devices.
Cloud–Edge Collaboration. Collaborate on cloud AI infrastructure and on-device deployment pipelines: model export (ONNX/TorchScript), training–inference consistency validation, and joint cloud–edge inference architecture design.
Track Frontier LLM Developments. Stay current with cutting-edge LLM research; explore feasible paths for applying the latest model capabilities (e.g., reasoning models, multimodal models) to real-world embedded product scenarios.

Requirements

Master's degree or above in Computer Vision, Machine Learning, Automation, or a related field.
C++ Proficiency (Core Requirement). Expert-level C++ with deep understanding of memory models, concurrency, and low-level optimization. Proficient in Python for model conversion, evaluation scripts, and training tooling.
Cloud AI Infra or Embedded Inference Framework Experience. Hands-on experience with either: (a) large-scale GPU training cluster operations and optimization, or (b) core module development in on-device inference frameworks such as MNN, TNN, NCNN, or ExecuTorch.
Large Model Algorithm Fundamentals. Solid understanding of Transformer attention mechanisms, KV cache, continuous batching, and speculative decoding. Familiar with mainstream open-source model architectures including LLaMA, Qwen, Gemma, and Mistral.
Embedded Systems & Heterogeneous Hardware. Understanding of embedded system principles and heterogeneous hardware architectures (ARM, Snapdragon, MTK, Apple Silicon). Experience with driver adaptation or BSP is a plus.
Engineering Discipline. Proficient in Linux development environments; experienced with performance profiling (perf, Instruments, Snapdragon Profiler), unit testing, and CI/CD workflows.