AI Infrastructure Intern

Tencent

Singapore

Fresher

Posted 17 hours ago

Job Description

What the Role Entails

- Design and implement high-performance orchestration systems to automate the deployment, scaling, and lifecycle management of large-scale model training and real-time inference services

- Develop scheduling algorithms to optimize the utilization of global GPU/CPU clusters. Focus on improving resource allocation efficiency and managing the complexities of heterogeneous hardware

- Participate in developing training-related features (incremental training and APIs for online learning) and improve framework efficiency

- Participate in developing pipelines for offline-to-online synchronization. Maintain strict data consistency and minimize model-update latency so that the most recent model is always the one being served

- Collaborate with cross-functional teams to ensure platform features are available, secure, and compliant for a global user base

Who We Look For

- Currently enrolled in an undergraduate or graduate degree program in Computer Science, Information Systems, or related fields

- Hands-on experience with machine learning systems and open-source ML orchestration frameworks (e.g., Ray, TFX, Kubeflow)

- Proficiency in backend software design, development, and deployment practices with at least one of the following programming languages: Golang, Python, C++, or Java

- In-depth knowledge of distributed system principles (e.g., consistency protocols like Paxos or Raft, distributed locking, and caching strategies)

- Familiarity with the Linux operating system and common system tools

- Demonstrated strong ownership, customer-oriented values, and integrity

- Good programming discipline, fast-learning ability, and teamwork skills

- Prior internet industry work or internship experience is a plus

- Fluency in both English and Mandarin Chinese for effective communication with international stakeholders