Our client is an innovative technology company developing next-generation AI products across video intelligence, multimodal AI, recommendation systems, and generative AI. As part of their continued growth, they are seeking a Senior AI Data Engineer to build and scale the data infrastructure that powers large-scale AI model development.
This role will focus on designing and optimizing high-quality multimodal datasets, video data pipelines, and AI training data systems that support cutting-edge applications in video understanding, content generation, recommendation, moderation, and foundation models.
Job Responsibilities
- Design, develop, and maintain scalable data pipelines for processing large-scale video and multimodal datasets.
- Build automated workflows for video ingestion, preprocessing, annotation, quality validation, and metadata enrichment.
- Partner closely with AI researchers and machine learning teams to support model training, evaluation, and deployment across recommendation, computer vision, and generative AI applications.
- Optimize storage, indexing, retrieval, and management systems for large-scale video data assets.
- Develop tools and frameworks that support data annotation, active learning, and human-in-the-loop data operations.
- Drive continuous improvements in dataset quality through monitoring, deduplication, filtering, validation, and bias mitigation strategies.
- Establish and maintain data standards, governance frameworks, and best practices to ensure scalability and consistency.
- Support both batch and real-time data processing requirements for AI-driven products and services.
- Contribute to technical architecture discussions and mentor junior engineers within the team.
Job Requirements
- Bachelor's or Master's degree in Computer Science, Data Engineering, Artificial Intelligence, or a related field.
- At least 3–5 years of experience in data engineering, big data platforms, AI data infrastructure, or related disciplines.
- Hands-on experience with distributed data processing frameworks such as Spark, Flink, Ray, or similar technologies.
- Experience working with cloud platforms such as AWS, Google Cloud Platform (GCP), or Microsoft Azure.
- Familiarity with video processing technologies and multimedia systems, such as FFmpeg, OpenCV, streaming platforms, or related frameworks.
- Proven experience building and managing datasets and data pipelines for machine learning or AI applications, particularly multimodal or video-related use cases.
- Strong understanding of data warehousing, streaming, orchestration, and workflow management technologies such as Airflow, Kafka, Hive, Snowflake, or equivalent tools.
- Knowledge of machine learning workflows and the data requirements needed to support model development and evaluation.
- Strong analytical, problem-solving, and communication skills, with the ability to thrive in a fast-paced environment.
www.dadaconsultants.com
Licence Number: 18S9037EA
Registration Number: R23112003
Business Registration Number: 201735941W