About The Team
The EGO team is dedicated to building an industry-leading machine learning platform to effectively support the implementation of algorithms across various business domains such as recommendation, search, and advertising. It focuses on extreme optimization for CTR/CVR prediction in large-scale sparse parameter scenarios, ensuring maximum performance in e-commerce applications and delivering greater value to the company.
The EGO platform covers the entire deep machine learning workflow, from sample organization and training to model building and publishing, and on to online model loading and inference services. With a user-friendly Web UI and RESTful API, it serves as an end-to-end, one-stop machine learning platform.
Job Description
- Develop distributed Parameter Server (PS) systems for large-scale sparse model training and inference platforms in the search, advertising, and recommendation domains. The system should support high-throughput parameter read/write and update operations, handle hundreds of billions of features and TB-level sparse models, enable online real-time learning, and meet algorithmic needs such as feature admission and expiration.
- Participate in the development of the one-stop machine learning platform, integrating the PS system into the platform to provide a user-friendly, stable, high-performance, and platform-level distributed parameter service system. Enhance the platform's efficiency and usability, accelerating the model iteration process for algorithm teams.
Requirements
- Bachelor's degree or above in Computer Science, Electronics, Automation, Software Engineering, or related fields
- At least 3 years of relevant hands-on experience
- Proficient in C++ programming with strong low-level technical skills; adept at multi-threaded programming, lock optimization, memory pools, thread pools, template programming, GDB debugging, performance tuning, and RPC frameworks.
- Familiarity with distributed PS systems, distributed system backend optimization, high-performance in-memory KV systems, KV storage systems based on NVMe-SSD, and high-performance client-server architecture systems is a plus.
- Highly passionate about computer technology, proactive in learning, with a strong spirit of in-depth research and hands-on practice. Maintains high standards and strict requirements for delivered code; works with rigor and attention to detail.
- Strong team player with excellent continuous learning ability.