Desay SV Automotive

Senior AI Engineer - Large-Scale Foundation Models (LLM / VLM)

  • Posted 22 hours ago

Job Description

Desay SV Automotive Singapore Pte. Ltd. is an innovative organization committed to exploring frontier technologies. While the company has a strong background in automotive electronics, this role is exclusively focused on advancing applications in large language models and on-device AI inference.

Duties/Responsibilities

  • On-Device Inference Engine Development. Design, develop, and optimize LLM inference engines for embedded, mobile, and edge devices — covering operator development, graph optimization, memory management, and multi-backend adaptation
  • Model Compression & Lightweight Deployment. Research and apply quantization (INT4/INT8/FP16), pruning, distillation, and KV Cache compression techniques to achieve efficient inference on resource-constrained hardware
  • Heterogeneous Hardware Optimization. Conduct operator-level performance tuning for ARM CPU, NPU, GPU, and DSP; use profiling tools to identify bottlenecks and continuously improve inference throughput and latency
  • LLM Inference Acceleration. Participate in building LLM inference acceleration solutions — including speculative decoding, continuous batching, and KV Cache optimization — to improve model response efficiency on edge devices
  • Cloud–Edge Collaboration. Collaborate on cloud AI Infra and on-device deployment pipelines: model export (ONNX/TorchScript), training–inference consistency validation, and joint cloud–edge inference architecture design
  • Track Frontier LLM Developments. Stay current with cutting-edge LLM research; explore feasible paths for applying the latest model capabilities (e.g., reasoning models, multimodal) to real-world embedded product scenarios

Requirements

  • Master's degree or above in Computer Vision, Machine Learning, Automation, or a related field
  • C++ Proficiency (Core Requirement). Expert-level C++ with deep understanding of memory models, concurrency, and low-level optimization. Proficient in Python for model conversion, evaluation scripts, and training tooling
  • Cloud AI Infra or Embedded Inference Framework Experience. Hands-on experience with either: (a) large-scale GPU training cluster operations and optimization, or (b) core module development in on-device inference frameworks such as MNN, TNN, NCNN, or ExecuTorch
  • Large Model Algorithm Fundamentals. Solid understanding of Transformer attention mechanisms, KV Cache, continuous batching, and speculative decoding. Familiar with mainstream open-source model architectures including LLaMA, Qwen, Gemma, and Mistral
  • Embedded Systems & Heterogeneous Hardware. Understanding of embedded system principles and heterogeneous hardware architectures (ARM, Snapdragon, MTK, Apple Silicon). Experience with driver adaptation or BSP is a plus
  • Engineering Discipline. Proficient in Linux development environments; experienced with performance profiling (perf, Instruments, Snapdragon Profiler), unit testing, and CI/CD workflows

Job ID: 149776113