
We are seeking an AI Infrastructure Engineer to build and scale the foundation for large-scale AI model training at a leading AI platform provider. In this role, you will develop and standardize high-performance training workflows, enabling customers to efficiently train advanced AI models in production environments. You will work closely with internal engineering teams and external customers to optimize training performance, establish benchmarks, and deliver best-in-class practices for distributed AI training at scale.
Responsibilities:
As an AI Infrastructure Engineer, your primary responsibilities will include designing and implementing scalable training frameworks for large AI models. You will develop reusable training recipes, benchmarks, and performance baselines to guide customers in achieving optimal results. You will collaborate with engineering, product, and customer-facing teams to troubleshoot and optimize training workloads across distributed systems. Additionally, you will document best practices, support customer deployments, and contribute to continuous improvements in training efficiency, cost optimization, and system performance.
Requirements:
To be successful in this role, you should have at least 5 years of experience in machine learning engineering, AI infrastructure, or distributed systems. Strong hands-on experience with deep learning frameworks such as PyTorch or TensorFlow, as well as familiarity with distributed training techniques, is essential. Experience working with large-scale training environments, GPUs, and performance optimization is highly desirable. A solid understanding of MLOps, model training pipelines, and benchmarking methodologies will be advantageous. Strong problem-solving skills, along with the ability to work cross-functionally and engage with both technical and non-technical stakeholders, are critical. A Bachelor's or Master's degree in Computer Science, Engineering, or a related field is required.
To Apply:
Interested candidates, please send your CV to [Confidential Information]. Due to the high volume of applications, only short-listed candidates will be notified.
Registration No: R1983436
License No: 16S8060
Job ID: 146409687