Fundamental Vision Lab

Generic Model for Multi-modal Perception

We are dedicated to advancing multi-modal generalist models through innovative approaches. We investigate unified task representations, network architectures, and training methods for vision and image-text multi-modal tasks, and we aim to build a generalist model that covers a broad range of multi-modal applications. In addition, we design novel universal perception paradigms based on large-scale models to achieve comprehensive capabilities for open-world scenarios and open-ended tasks.

Representative Work:

Unified Pre-training Algorithm for Large-scale Vision-Language Models

Unified Modeling and Architecture for General Multi-modal Perception Tasks

Large Vision-Language Model for Open-Ended Vision-Centric Tasks