About The Team
The EGO Team is dedicated to building an industry-leading Machine Learning (ML) platform that powers the efficient deployment of algorithms across core business sectors, including Recommendation, Search, and Advertising. Our platform focuses on CTR/CVR prediction within large-scale sparse feature scenarios and explores Generative Recommendation (GR) integrated with Large Language Models (LLMs). By deeply optimizing the entire pipeline, from model training to online inference, we deliver low-latency, high-throughput, and high-precision inference services for e-commerce, general content, and social media scenarios, serving as a core algorithmic engine for business growth.
The platform covers the full lifecycle of Deep Learning, including sample generation, feature engineering, model training, deployment, online inference, and closed-loop monitoring. We have developed a robust training/inference acceleration framework, complemented by a Web UI and RESTful APIs, aiming to achieve a truly end-to-end, automated, and intelligent machine learning ecosystem.
Job Description
- Responsible for the R&D and optimization of online inference services for deep learning models in large-scale sparse feature scenarios, supporting high-efficiency inference needs across Shopee's various business lines.
- Conduct in-depth research into various inference acceleration algorithms to reduce the computational cost of model deployment.
- Collaborate across the business pipeline to tune the end-to-end online service system, ensuring high availability and stability.
- Research and implement efficient inference solutions that combine Large Language Models (LLMs) with Search, Ads, and Recommendation, including Generative Recommendation (GR).
Requirements
- Bachelor's degree or above in Computer Science, Electronics, Automation, Software Engineering, or related fields, with at least 2 years of relevant work experience.
- Expertise in C++ programming with a solid foundation in low-level systems; proficient in multi-threading, lock optimization, memory pools, thread pools, template programming, GDB debugging, performance profiling, and RPC frameworks.
- Experience in online inference/serving, either developing proprietary inference engines or working in depth with engines such as TensorFlow + XLA, TensorRT, Triton, vLLM, or TensorRT-LLM.
- Deep practical experience in GPU optimization, including operator fusion, graph optimization, CUDA programming, kernel scheduling, warp execution models, memory access optimization, and VRAM scheduling.
- Preferred: Candidates who have researched or implemented GR (Generative Recommendation) solutions such as HSTU, HLLM, or OneRec.
- Strong passion for computer technology, a proactive learning mindset, and a willingness to dive deep into technical problems; maintains high standards for code quality and demonstrates a rigorous, detail-oriented work style.
- Strong team player with excellent continuous learning capabilities.