Fundamental Vision Lab

Publications

We try our best to do research with long-term impact.

Highlighted

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, ..., Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
CVPR 2024 (Oral)   ·   18 Jan 2024   ·   arxiv:2312.14238
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, ..., Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai
Arxiv Tech Report 2023   ·   02 Jun 2023   ·   arxiv:2305.17144
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, ..., Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai
NeurIPS 2023   ·   26 May 2023   ·   arxiv:2305.11175
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, ..., Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
CVPR 2023 (Highlight)   ·   18 Apr 2023   ·   arxiv:2211.05778
Planning-oriented Autonomous Driving
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, ..., Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li
CVPR 2023 (Best Paper Award)   ·   24 Mar 2023   ·   arxiv:2212.10156
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, ..., Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2211.10439
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, ..., Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2211.09808
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
NeurIPS 2022 (Spotlight)   ·   06 Jul 2022   ·   arxiv:2206.04674
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai
CVPR 2022   ·   19 Jun 2022   ·   arxiv:2112.01522
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
ICLR 2021 (Oral)   ·   04 May 2021   ·   arxiv:2010.04159
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
ICLR 2020   ·   19 Feb 2020   ·   arxiv:1908.08530
Deformable ConvNets v2: More Deformable, Better Results
Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
CVPR 2019   ·   16 Jun 2019   ·   arxiv:1811.11168
Relation Networks for Object Detection
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
CVPR 2018 (Oral)   ·   15 Jun 2018   ·   arxiv:1711.11575
Flow-Guided Feature Aggregation for Video Object Detection
Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
ICCV 2017   ·   21 Aug 2017   ·   arxiv:1703.10025
Deformable Convolutional Networks
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
ICCV 2017 (Oral)   ·   06 Jun 2017   ·   arxiv:1703.06211
Deep Feature Flow for Video Recognition
Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
CVPR 2017   ·   06 Jun 2017   ·   arxiv:1611.07715
Fully Convolutional Instance-aware Semantic Segmentation
Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei
CVPR 2017 (Spotlight)   ·   11 Apr 2017   ·   arxiv:1611.07709
Convolutional Feature Masking for Joint Object and Stuff Segmentation
Jifeng Dai, Kaiming He, Jian Sun
CVPR 2015   ·   08 Jun 2015   ·   arxiv:1412.1283
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
CVPR 2016 (Oral)   ·   19 Apr 2016   ·   arxiv:1604.05144
Instance-aware Semantic Segmentation via Multi-task Network Cascades
Jifeng Dai, Kaiming He, Jian Sun
CVPR 2016 (Oral)   ·   26 Jun 2016   ·   arxiv:1512.04412
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
Jifeng Dai, Kaiming He, Jian Sun
ICCV 2015   ·   19 May 2015   ·   arxiv:1503.01640

All

2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, ..., Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai
Arxiv Tech Report 2024   ·   26 Aug 2024   ·   arxiv:2402.19474
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, ..., Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao
ICLR 2024   ·   07 May 2024   ·   arxiv:2308.01907
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, ..., Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
Arxiv Tech Report 2024   ·   01 May 2024   ·   arxiv:2404.16821
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, ..., Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai
Arxiv Tech Report 2024   ·   03 Apr 2024   ·   arxiv:2401.10208
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai
Arxiv Tech Report 2023   ·   02 Apr 2024   ·   arxiv:2312.09238
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang
Arxiv Tech Report 2024   ·   08 Mar 2024   ·   arxiv:2403.02308
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, ..., Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
CVPR 2024 (Oral)   ·   18 Jan 2024   ·   arxiv:2312.14238
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, ..., Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
CVPR 2024 (Highlight)   ·   15 Jan 2024   ·   arxiv:2401.06197

2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, ..., Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
Arxiv Tech Report 2023   ·   27 Dec 2023   ·   arxiv:2312.09245
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, ..., Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang
Arxiv Tech Report 2023   ·   19 Dec 2023   ·   arxiv:2310.17796
Demystify Transformers & Convolutions in Modern Image Deep Networks
Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, ..., Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai
Arxiv Tech Report 2023   ·   04 Dec 2023   ·   arxiv:2211.05781
Siamese Image Modeling for Self-Supervised Vision Representation Learning
Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2206.01204
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2211.09807
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, ..., Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2211.10439
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, ..., Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai
CVPR 2023 (Highlight)   ·   18 Jun 2023   ·   arxiv:2211.09808
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, ..., Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai
Arxiv Tech Report 2023   ·   02 Jun 2023   ·   arxiv:2305.17144
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, ..., Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai
NeurIPS 2023   ·   26 May 2023   ·   arxiv:2305.11175
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, ..., Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
CVPR 2023 (Highlight)   ·   18 Apr 2023   ·   arxiv:2211.05778
Planning-oriented Autonomous Driving
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, ..., Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li
CVPR 2023 (Best Paper Award)   ·   24 Mar 2023   ·   arxiv:2212.10156

2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao
ECCV 2022   ·   20 Jul 2022   ·   arxiv:2111.13579
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai
ECCV 2022   ·   14 Jul 2022   ·   arxiv:2203.17270
Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao Huang, Jifeng Dai
CVPR 2022   ·   06 Jul 2022   ·   arxiv:2112.05141
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
NeurIPS 2022 (Spotlight)   ·   06 Jul 2022   ·   arxiv:2206.04674
AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks
Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu
CVPR 2022   ·   19 Jun 2022   ·   arxiv:2103.14026
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai
CVPR 2022   ·   19 Jun 2022   ·   arxiv:2112.01522

2021

Searching Parameterized AP Loss for Object Detection
Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai
NeurIPS 2021   ·   09 Nov 2021   ·   https://openreview.net/forum?id=hLTZCN7f3M-
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation
Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai
ICLR 2021   ·   04 May 2021   ·   arxiv:2010.07930
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
ICLR 2021 (Oral)   ·   04 May 2021   ·   arxiv:2010.04159
Unsupervised Object Detection with LiDAR Clues
Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu
CVPR 2021   ·   20 Apr 2021   ·   arxiv:2011.12953

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
ICLR 2020   ·   19 Feb 2020   ·   arxiv:1908.08530
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation
Hang Gao, Xizhou Zhu, Steve Lin, Jifeng Dai
ICLR 2020   ·   13 Feb 2020   ·   arxiv:1910.02940

2019

MMDetection: Open MMLab Detection Toolbox and Benchmark
Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, ..., Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
CVPR 2019   ·   18 Jun 2019   ·   arxiv:1906.07155
Deformable ConvNets v2: More Deformable, Better Results
Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
CVPR 2019   ·   16 Jun 2019   ·   arxiv:1811.11168
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai
ICCV 2019   ·   15 Apr 2019   ·   arxiv:1904.05873

2018

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection
Zheng Zhang, Dazhi Cheng, Xizhou Zhu, Stephen Lin, Jifeng Dai
Arxiv Tech Report 2018   ·   28 Nov 2018   ·   arxiv:1811.11167
Relation Networks for Object Detection
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
CVPR 2018 (Oral)   ·   15 Jun 2018   ·   arxiv:1711.11575
Towards High Performance Video Object Detection for Mobiles
Xizhou Zhu, Jifeng Dai, Xingchi Zhu, Yichen Wei, Lu Yuan
Arxiv Tech Report 2018   ·   17 Apr 2018   ·   arxiv:1804.05830
Learning Region Features for Object Detection
Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei, Jifeng Dai
ECCV 2018   ·   20 Mar 2018   ·   arxiv:1803.07066

2017

Flow-Guided Feature Aggregation for Video Object Detection
Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
ICCV 2017   ·   21 Aug 2017   ·   arxiv:1703.10025
Deep Feature Flow for Video Recognition
Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
CVPR 2017   ·   06 Jun 2017   ·   arxiv:1611.07715
Deformable Convolutional Networks
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
ICCV 2017 (Oral)   ·   06 Jun 2017   ·   arxiv:1703.06211
Fully Convolutional Instance-aware Semantic Segmentation
Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei
CVPR 2017 (Spotlight)   ·   11 Apr 2017   ·   arxiv:1611.07709

2016

R-FCN: Object Detection via Region-based Fully Convolutional Networks
Jifeng Dai, Yi Li, Kaiming He, Jian Sun
NeurIPS 2016   ·   05 Dec 2016   ·   arxiv:1605.06409
Instance-aware Semantic Segmentation via Multi-task Network Cascades
Jifeng Dai, Kaiming He, Jian Sun
CVPR 2016 (Oral)   ·   26 Jun 2016   ·   arxiv:1512.04412
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
CVPR 2016 (Oral)   ·   19 Apr 2016   ·   arxiv:1604.05144
Instance-sensitive Fully Convolutional Networks
Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun
ECCV 2016   ·   30 Mar 2016   ·   arxiv:1603.08678

2015

Convolutional Feature Masking for Joint Object and Stuff Segmentation
Jifeng Dai, Kaiming He, Jian Sun
CVPR 2015   ·   08 Jun 2015   ·   arxiv:1412.1283
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
Jifeng Dai, Kaiming He, Jian Sun
ICCV 2015   ·   19 May 2015   ·   arxiv:1503.01640
Generative Modeling of Convolutional Neural Networks
Jifeng Dai, Yang Lu, Ying-Nian Wu
ICLR 2015   ·   10 Apr 2015   ·   arxiv:1412.6296