Foundation Model for Visual Perception
Object detection and recognition are fundamental problems in visual perception. We design vision backbones and object detection models to address them, and extend these models into generic, large-scale foundation models for general perception tasks.
Representative Work:
Object Detection Models with High Precision and Efficiency
- R-FCN: Object Detection via Region-based Fully Convolutional Networks [NIPS 2016 3rd most influential paper] [PyTorch standard operator]
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [ICLR 2021 2nd most influential paper]
Vision Backbones with Deformable Convolutions & Large-scale Vision Backbones
- [ICCV 2017 6th most influential paper] [PyTorch standard operator]
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [CVPR 2023 highlight paper]