AI Inference & Compression Engineer

PERSOL APAC

Singapore

Fresher

Posted 15 hours ago

Job Description

About the company:

We have partnered with a renowned global leader in information and communications technology (ICT) infrastructure and smart devices. They provide full-stack, all-scenario products, solutions, and services to carriers, enterprises, governments, and individual consumers worldwide.

Our client is looking for an AI Inference & Compression Engineer to join the team.

Job Overview:

This role focuses on developing high-performance compression and inference techniques across both classical video/media codecs and modern Large Language Model (LLM) inference systems. You will design intelligent pipelines that deliver higher visual quality at lower bitrates, while simultaneously developing algorithms to reduce memory footprint and computational bottlenecks in generative AI serving.

Key Responsibilities

LLM Inference Acceleration. Research and develop advanced compression algorithms to accelerate LLM serving. Focus on KV cache optimization, model quantization, and resolving memory bandwidth bottlenecks during autoregressive decoding.
Classical Codec Development. Design and implement advanced video compression algorithms, focusing on improving Rate–Distortion (RD) performance, optimizing entropy coding, and enhancing quantization design for real-world applications.
AI-Based Media Coding. Develop and optimize AI-based video coding components, including AI-based loop filters, optical flow, and intelligent rate control.
Model Deployment & Fusion. Bridge the gap between AI research and production. Optimize deep learning models for efficient inference and ensure seamless integration of compression algorithms into deployment frameworks (e.g., vLLM).
Performance & Quality Evaluation. Conduct rigorous visual quality assessments for video systems, both objective (e.g., PSNR, VMAF) and subjective, as well as perplexity, zero-shot benchmark, latency, and throughput analysis for LLM systems.

Required Qualifications

Master's or PhD in Computer Science, Electronic Engineering, Mathematics, or related fields (PhD preferred).
Solid understanding of video coding fundamentals, including prediction, transform coding, quantization, and entropy coding, with hands-on experience in standards such as H.265/HEVC, AV1, or H.266/VVC.
Strong understanding of Transformer architectures and attention mechanisms, as well as key performance bottlenecks in generative AI inference, particularly memory bandwidth constraints (the "memory wall").
Strong proficiency in Python and C/C++. Hands-on experience building, training, and modifying models using frameworks such as PyTorch or TensorFlow.

Preferred Qualifications

ISP Knowledge. Familiarity with the Image Signal Processing (ISP) pipeline, including stages such as demosaicing, denoising, and tone mapping.
Image Processing. Experience in computer vision-based image enhancement (e.g., de-blurring, artifact removal, or HDR).
Hardware Optimization. Knowledge of SIMD, CUDA, or other hardware acceleration techniques for video and tensor processing.

Interested candidates who wish to apply for the advertised position should click Apply Now. We regret that only shortlisted candidates will be notified.

EA License No.: 01C4394

By sending us your personal data and curriculum vitae (CV), you are deemed to consent to PERSOL Singapore Pte Ltd and its affiliates collecting, using, and disclosing your personal data for the purposes set out in the Privacy Policy available at https://www.persolsingapore.com/policies. You acknowledge that you have read, understood, and agree with the Privacy Policy.