Conduct research and algorithm/model development in the multimodal large model domain;
Develop image and video analysis algorithms for surveillance scenarios, covering detection of venues, equipment, and other targets, as well as feature extraction, tracking, and recognition, to advance algorithmic research;
Lead R&D, optimization, and testing of speech/audio noise reduction algorithms.
Qualifications:
Master's degree or higher, with 8+ years of relevant experience in Computer Science, Applied Mathematics, Statistics, Pattern Recognition, Artificial Intelligence, Automatic Control, Operations Research, Biology, Physics/Quantum Computing, Neuroscience, or related fields;
Proficiency in mainstream machine learning and deep-learning-based image processing algorithms; mastery of the latest SOTA implementations; hands-on experience with one or more deep learning frameworks (e.g., Caffe, PyTorch, TensorFlow, MXNet); ability to independently design, develop, and optimize algorithms;
Familiarity with Python and C/C++, with exceptional coding skills;
Outstanding scientific research capabilities, logical reasoning, and data sensitivity.
Preferred Skills: Multimodal modeling, visual large models, natural language large models, VLM, LLM.