Machine Learning Systems Engineer (MLSys)

Fresher
SGD 6,000 - 9,000 per month
  • Posted 19 hours ago

Job Description

Responsibilities

  • System Development & Maintenance
    Contribute to the development, optimization, and maintenance of core components of the machine learning platform, including feature stores, experiment tracking systems, model registries, workflow orchestration, and serving frameworks
  • Training Efficiency Optimization
    Assist in optimizing the performance of distributed training frameworks (e.g., PyTorch DDP, DeepSpeed, FSDP) on large-scale clusters, addressing challenges such as resource scheduling and communication bottlenecks
  • Inference Performance Optimization
    Participate in model deployment and serving, including performance profiling and acceleration through model compilation (e.g., TVM, TensorRT), operator optimization, computation graph optimization, and batching strategies
  • Infrastructure Support
    Leverage technologies such as containerization (Docker), orchestration (Kubernetes), and monitoring (Prometheus/Grafana) to improve observability, reliability, and resource utilization of ML systems
  • Tooling & Developer Productivity
    Build and maintain internal tools to improve engineering efficiency, such as automated evaluation systems, stress testing tools, and debugging utilities

Qualifications

Education

  • Bachelor's degree or above in Computer Science, Software Engineering, Electronic Engineering, or related fields

Fundamental Knowledge

  • Solid foundation in computer science fundamentals: operating systems, computer networks, data structures, and algorithms
  • Strong programming skills, with proficiency in Python; experience with Go or C++ is a strong plus
  • Basic understanding of software engineering principles, including design patterns and clean coding practices

Technical Skills

  • Familiarity with Linux development environments, including common commands and shell scripting
  • Experience with at least one mainstream deep learning framework (preferably PyTorch), with curiosity about its underlying mechanisms
  • Basic hands-on experience with containerization (Docker), CI/CD pipelines, and version control (Git)

Soft Skills

  • Strong passion for engineering and building high-performance, highly available systems
  • Excellent problem-solving and debugging skills, with a mindset for optimization
  • Good communication and teamwork skills, able to collaborate effectively with cross-functional teams
  • Strong curiosity and willingness to deeply understand machine learning algorithms and their integration with system engineering

Preferred Qualifications (Nice to Have)

  • Familiarity with Kubernetes and cloud-native technologies
  • Experience with model serving frameworks such as Triton, TensorFlow Serving, or TorchServe
  • Understanding of compiler fundamentals (e.g., LLVM), high-performance computing (HPC), or hardware acceleration (GPU/ASIC)
  • Contributions to open-source projects or relevant system/infrastructure projects on GitHub
  • Experience with large-scale data processing (e.g., Spark, Flink) or storage systems

Job ID: 147011723