Senior Backend Engineer - Machine Learning Platform (R&D, CTR/VTR Predictor) - Ego team

Shopee

Singapore

2-4 Years

Posted 15 hours ago

Job Description

About The Team

The EGO Team is dedicated to building an industry-leading Machine Learning (ML) platform that powers the efficient deployment of algorithms across core business sectors, including Recommendation, Search, and Advertising. Our platform focuses on CTR/CVR prediction within large-scale sparse feature scenarios and explores Generative Recommendation (GR) integrated with Large Language Models (LLMs). By deeply optimizing the entire pipeline, from model training to online inference, we deliver low-latency, high-throughput, and high-precision inference services for e-commerce, general content, and social media scenarios, serving as a core algorithmic engine for business growth.

The platform covers the full lifecycle of Deep Learning, including sample generation, feature engineering, model training, deployment, online inference, and closed-loop monitoring. We have developed a robust training/inference acceleration framework, complemented by a Web UI and RESTful APIs, aiming to achieve a truly end-to-end, automated, and intelligent machine learning ecosystem.

Job Description

Responsible for the R&D and optimization of online inference services for deep learning models in large-scale sparse feature scenarios, supporting high-efficiency inference needs across Shopee's various business lines.
Conduct in-depth research into various inference acceleration algorithms to reduce the computational cost of model deployment.
Collaborate across the business pipeline to tune the end-to-end online service system, ensuring high availability and stability.
Research and implement efficient inference solutions that combine Large Language Models (LLMs) with Search, Ads, and Recommendation, including Generative Recommendation (GR).

Requirements

Bachelor's degree or above in Computer Science, Electronics, Automation, Software Engineering, or related fields, with at least 2 years of relevant work experience.
Expertise in C++ programming with a solid foundation in low-level systems; proficient in multi-threading, lock optimization, memory pools, thread pools, template programming, GDB debugging, performance profiling, and RPC frameworks.
Experience in online inference/serving; has developed proprietary inference engines or is highly familiar with engines such as TensorFlow + XLA, TensorRT, Triton, vLLM, or TensorRT-LLM.
Deep practical experience in GPU optimization, including operator fusion, graph optimization, CUDA programming, kernel scheduling, the warp execution model, memory access optimization, and VRAM scheduling.
Preferred: Candidates who have researched or implemented GR (Generative Recommendation) solutions such as HSTU, HLLM, or OneRec.
Strong passion for computer technology, a proactive learning mindset, and a drive for deep technical dives; maintains high standards for code quality and demonstrates a rigorous, detail-oriented work style.
Strong team player with a demonstrated capacity for continuous learning.