Roles & Responsibilities:
- Lead core technology development for the post-training phase of large language models, building and optimizing a high-quality reward system. Continuously improve the model's complex instruction following, logical reasoning, and value alignment through Reward Modeling (RM) and Reinforcement Learning (RL) algorithms.
- Conduct in-depth research and optimization of post-training algorithms such as RLHF to improve model training stability and final outcomes.
- Manage and synthesize post-training data: design an efficient data feedback loop, use techniques such as SFT and Self-Instruct to generate high-quality training data, and build a closed-loop signal modeling system that carries user feedback into model iteration.
- Comprehensively evaluate and analyze post-trained models, develop rigorous evaluation metrics, track cutting-edge research, and quickly translate the latest results into business value.
Knowledge & Competencies:
- Master's degree or higher in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
- Deep understanding of the Transformer architecture and the principles of large language model training, with substantial research and practical experience in at least one post-training area such as LLM Alignment, RLHF, or Reward Modeling.
- Solid foundation in algorithms and engineering implementation capabilities, proficient in Python, and familiar with deep learning frameworks such as PyTorch or TensorFlow.
- Practical experience in distributed training, familiar with large-scale training and inference frameworks like Megatron-LM, DeepSpeed, and vLLM. Experience in training or tuning models with billions or hundreds of billions of parameters is preferred.
- Excellent research skills, with a record of high-quality publications (NeurIPS, ICLR, ICML, ACL, EMNLP, etc.) or contributions to high-impact projects in the open-source community (e.g., HuggingFace) preferred.
- Strong technical enthusiasm and self-motivation, adept at analyzing and solving complex problems, with good teamwork and communication skills.