Job Description:
As an Algorithm Data Engineer, you will turn data into algorithmic productivity across the following key areas:
- Feature Platform & Feature Store Construction:
  - Lead or participate in the design, development, and maintenance of enterprise-level feature platforms and feature stores for both traditional models and LLMs. Address challenges such as online-offline feature consistency, real-time performance, and availability. Standardize and automate feature engineering pipelines to improve the efficiency of algorithm teams.
- High-Quality Dataset Construction and Maintenance:
  - Design and build offline and real-time datasets for model training, evaluation, and online inference, optimized for high performance and low latency. This includes pre-training dataset construction, data filtering, data quality evaluation, data augmentation, and automated evaluation pipelines.
- Algorithm Experimentation and Monitoring Pipelines:
  - Participate in building and maintaining the core data pipelines for algorithm experiments, providing end-to-end support from data preparation, configuration, and execution monitoring to metric analysis and result interpretation.
- High-Value Label and Knowledge Graph Mining:
  - Apply a deep understanding of the e-commerce business and its algorithms to mine high-value user profiles, item labels, and relationship graphs from massive behavioral data, and feed the results back into model optimization and business strategy.
Requirements:
- Currently pursuing a Bachelor's degree in Computer Science, Artificial Intelligence, or related fields.
- Familiar with one or more big data technologies, such as Spark, Flink, Hadoop, HBase, Kafka, Druid, or ClickHouse.
- Excellent logical thinking, communication, project management, and cross-team coordination skills.
- Highly self-motivated, resilient under pressure, and eager to continuously explore and drive business breakthroughs.
Nice to have:
- Experience with LLM pre-training data pipelines, data lakes, data flywheels, or vLLM.
- Background in model evaluation (benchmarks) and model training (pre-training) is a strong plus.