AI Engineer (Evaluation)

Nanyang Technological University

Singapore

Fresher

Posted 15 hours ago

Job Description

AI Singapore (AISG) is Singapore's national programme in artificial intelligence, launched by the National Research Foundation (NRF) to anchor deep national capabilities in AI. Hosted at Nanyang Technological University (NTU), AI Singapore brings together Singapore-based research institutions and the vibrant ecosystem of AI start-ups and companies to perform use-inspired research, grow knowledge, create tools, and develop the talent to power Singapore's AI efforts. Since our inception in 2017, we have established a culture of respect, continuous learning, experimentation and curiosity, centred around innovation.

The candidate will join a team of AI scientists, apprentices, and data and software engineers. Together with the team, the candidate will be responsible for building evaluations that test the limits of AI models, especially their multilingual, multicultural and multimodal capabilities.

Duties and Responsibilities:

Develop and maintain evaluation frameworks and pipelines to measure the capabilities of Large Language Models (LLMs).
Keep up to date with, and experiment with, the latest research in multilingual, multicultural and multimodal LLM evaluation, such as the LLM-as-a-Judge paradigm.
Work with partners to collect, translate and verify evaluation datasets.
Perform the necessary data preparation and analysis, AI modelling, coding, testing, validation and deployment to ensure reliable and scalable AI solutions.
Collaborate with cross-functional teams within AI Products to design solutions and resolve issues.
Maintain code repository and documentation standards.
Contribute to community engagement activities, such as presenting at technical meet-ups, writing articles, and participating in discussion forums.

Requirements:

Degree in computer science, AI, data science or a related field (or equivalent practical experience).
Deep understanding of LLM evaluation experimental design and the advantages/limitations of various LLM evaluation methods.
Experience with LLM inference frameworks such as vLLM and AI/deep learning frameworks such as PyTorch.
Experience writing production-level code in Python and using version control systems such as Git.
Strong written and verbal communication skills.
Independent learner who is capable of reading and understanding research papers.
Fluent in English and one Southeast Asian language, in order to build quality evaluations from a multilingual and multicultural perspective.

We regret that only shortlisted candidates will be notified.

Hiring Institution: NTU