LLM Post-Training Researcher


2-5 Years

SGD 18,000 - 32,000 per month

Job Description

Implement state-of-the-art RLHF (Reinforcement Learning from Human Feedback) or RLAIF (Reinforcement Learning from AI Feedback) algorithms, such as DPO and PPO, to enhance game and role-play characters.
Conduct data analysis and data cleaning to improve post-training data quality.
Research and apply advanced reasoning techniques, such as chain-of-thought reasoning, to enhance AI agent capabilities.
Develop reward models for RLHF.

Requirements

Master's or PhD in Computer Science, AI, Machine Learning, Linguistics, Statistics, or a related technical field.
Proven experience in NLP, LLM research, or machine learning projects.
Strong creativity and problem-solving skills.
Proficiency in Python and deep learning frameworks and libraries such as PyTorch, TensorFlow, or Hugging Face Transformers.
Excellent programming skills, including familiarity with data structures and algorithms. Experience in competitions such as ACM ICPC, USACO/NOI/IOI, Topcoder, or Kaggle is a plus.
Effective communication and collaboration skills, with a passion for exploring new technologies and driving technological innovation.