Purpose:
In this role, you will be responsible for the end-to-end design and development of autonomous driving frameworks. You will integrate mainstream perception, prediction, and planning technologies into a unified modeling system, leveraging both vision-only and vision-language modeling paradigms to support autonomous driving tasks across urban and highway scenarios.
You will play a key role in advancing end-to-end and hybrid architectures, including the exploration of Vision-Language Models (VLMs) to enhance scene understanding, reasoning, and decision-making robustness in complex driving environments.
Responsibilities:
- Lead the design and implementation of end-to-end autonomous driving models, including one-stage (sensor-to-control) and two-stage (e.g., perception-planning decoupled) architectures. Define model structures, training pipelines, and optimization strategies for stable and explainable planning outputs.
- Drive the development of pure vision-based end-to-end systems, integrating multi-task capabilities such as BEV perception, static and dynamic occupancy inference, trajectory prediction, and planning.
- Explore and apply Vision-Language Models (VLMs) to improve high-level scene understanding, semantic reasoning, and cross-modal representation learning for autonomous driving tasks.
- Optimize and deploy models on embedded platforms, including inference acceleration, post-processing, system-level integration, performance tuning, stability validation, and on-road testing.
- Deliver production-ready solutions for elevated-highway and urban driving scenarios, enabling scalable deployment and continuous progression toward higher levels of autonomy.
Qualifications/Requirements:
- Ph.D. degree in Computer Science, Artificial Intelligence, Robotics, or a related field.
- Strong foundation in autonomous driving systems, with hands-on experience in end-to-end deep learning-based modeling.
- Practical experience in planning, control, or decision-making modules using deep learning approaches.
- Experience or strong interest in Vision-Language Models (VLMs), multimodal learning, or cross-modal representation learning, particularly in applications involving visual scene understanding and reasoning.
- Proficiency in C/C++ and Python, with experience in real-time inference deployment and performance optimization.
- Familiarity with BEV-based representations, occupancy prediction, and multi-task learning frameworks.
- Experience with system integration and real-vehicle testing is a strong plus.
- Strong problem-solving skills, adaptability to complex real-world scenarios, and a results-driven mindset.
- Strong mathematical foundation in optimization techniques relevant to computer vision and deep learning.