About The Role
We are building a multilingual large language model (LLM) tailored to Bahasa Indonesia and regional languages. We are looking for a passionate Senior Data Scientist to help shape the future of open and inclusive AI for Indonesia and to play a pivotal role in identifying impactful AI use cases. As a Senior Data Scientist working on LLMs, you will design and build high-quality datasets, develop advanced model pretraining, fine-tuning, and alignment techniques, and collaborate closely with product and engineering teams to ship safe, reliable LLM-powered features to millions of users. This role offers the opportunity to drive innovation, solve critical business challenges, and shape the future of AI-driven solutions at GoTo Group.
What You Will Do
- Perform data annotation and labeling based on provided guidelines
- Validate language accuracy, grammar, and contextual relevance
- Review annotated datasets to identify and correct errors
- Ensure consistency and quality across large volumes of data
- Collaborate with internal teams to refine annotation processes
- Provide feedback to improve annotation guidelines and workflows
What You Will Need
- 4+ years of experience with LLMs, deep learning, NLP, computer vision, or voice technologies.
- Proficient in data preprocessing, model training, evaluation, and optimisation.
- Practical experience in applying deep learning to solve real business problems, with models successfully deployed and used in production environments.
- Proficient with Python and deep learning frameworks such as PyTorch or TensorFlow.
- Experience with cloud platforms such as Alibaba Cloud (Alicloud) or Tencent Cloud.
- Strong communication skills to understand business needs and effectively convey analytical solutions.
- Ability to write clear and concise technical documentation.
- A Master's or PhD in Computer Science, Data Science, AI, or a related field.
- An understanding of Bahasa Indonesia is an advantage.
About The Team
The LLM team is on a mission to build the most capable and culturally aligned multilingual LLMs for Indonesia. At GoTo Group, the team is at the forefront of developing state-of-the-art language models. We are building foundational AI models that understand and generate Bahasa Indonesia and regional languages, empowering more inclusive technology. We work on everything from continual pretraining of large-scale LLMs to alignment and safety fine-tuning, using both structured and unstructured data from diverse sources. Our projects span core model development, dataset curation, safety systems, and real-world deployment in consumer and enterprise applications. Our team brings together members from diverse technical and cultural backgrounds, with expertise in machine learning and local languages.