Publications
We try our best to do research with long-term impact.
Highlighted
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024 (Oral) · 18 Jan 2024 · arxiv:2312.14238
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Arxiv Tech Report 2023 · 02 Jun 2023 · arxiv:2305.17144
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023 · 26 May 2023 · arxiv:2305.11175
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
CVPR 2023 (Highlight) · 18 Apr 2023 · arxiv:2211.05778
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2211.10439
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2211.09808
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022 · 14 Jul 2022 · arxiv:2203.17270
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
NeurIPS 2022 (Spotlight) · 06 Jul 2022 · arxiv:2206.04674
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
CVPR 2022 · 19 Jun 2022 · arxiv:2112.01522
Deformable DETR: Deformable Transformers for End-to-End Object Detection
ICLR 2021 (Oral) · 04 May 2021 · arxiv:2010.04159
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020 · 19 Feb 2020 · arxiv:1908.08530
Flow-Guided Feature Aggregation for Video Object Detection
ICCV 2017 · 21 Aug 2017 · arxiv:1703.10025
Fully Convolutional Instance-aware Semantic Segmentation
CVPR 2017 (Spotlight) · 11 Apr 2017 · arxiv:1611.07709
Convolutional Feature Masking for Joint Object and Stuff Segmentation
CVPR 2015 · 08 Jun 2015 · arxiv:1412.1283
R-FCN: Object Detection via Region-based Fully Convolutional Networks
NeurIPS 2016 · 05 Dec 2016 · arxiv:1605.06409
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
CVPR 2016 (Oral) · 19 Apr 2016 · arxiv:1604.05144
Instance-aware Semantic Segmentation via Multi-task Network Cascades
CVPR 2016 (Oral) · 26 Jun 2016 · arxiv:1512.04412
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
ICCV 2015 · 19 May 2015 · arxiv:1503.01640
All
2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Arxiv Tech Report 2024 · 26 Aug 2024 · arxiv:2402.19474
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024 · 07 May 2024 · arxiv:2308.01907
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Arxiv Tech Report 2024 · 01 May 2024 · arxiv:2404.16821
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Arxiv Tech Report 2024 · 03 Apr 2024 · arxiv:2401.10208
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Arxiv Tech Report 2023 · 02 Apr 2024 · arxiv:2312.09238
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Arxiv Tech Report 2024 · 08 Mar 2024 · arxiv:2403.02308
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024 (Oral) · 18 Jan 2024 · arxiv:2312.14238
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024 (Highlight) · 15 Jan 2024 · arxiv:2401.06197
2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Arxiv Tech Report 2023 · 27 Dec 2023 · arxiv:2312.09245
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Arxiv Tech Report 2023 · 19 Dec 2023 · arxiv:2310.17796
Demystify Transformers & Convolutions in Modern Image Deep Networks
Arxiv Tech Report 2023 · 04 Dec 2023 · arxiv:2211.05781
Siamese Image Modeling for Self-Supervised Vision Representation Learning
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2206.01204
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2211.09807
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2211.10439
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023 (Highlight) · 18 Jun 2023 · arxiv:2211.09808
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
Arxiv Tech Report 2023 · 02 Jun 2023 · arxiv:2305.17144
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023 · 26 May 2023 · arxiv:2305.11175
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
CVPR 2023 (Highlight) · 18 Apr 2023 · arxiv:2211.05778
Planning-oriented Autonomous Driving
CVPR 2023 (Best Paper Award) · 24 Mar 2023 · arxiv:2212.10156
2022
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
ECCV 2022 · 20 Jul 2022 · arxiv:2111.13579
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022 · 14 Jul 2022 · arxiv:2203.17270
Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
CVPR 2022 · 06 Jul 2022 · arxiv:2112.05141
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
NeurIPS 2022 (Spotlight) · 06 Jul 2022 · arxiv:2206.04674
AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks
CVPR 2022 · 19 Jun 2022 · arxiv:2103.14026
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
CVPR 2022 · 19 Jun 2022 · arxiv:2112.01522
2021
Searching Parameterized AP Loss for Object Detection
NeurIPS 2021 · 09 Nov 2021 · https://openreview.net/forum?id=hLTZCN7f3M-
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation
ICLR 2021 · 04 May 2021 · arxiv:2010.07930
Deformable DETR: Deformable Transformers for End-to-End Object Detection
ICLR 2021 (Oral) · 04 May 2021 · arxiv:2010.04159
Unsupervised Object Detection with LiDAR Clues
CVPR 2021 · 20 Apr 2021 · arxiv:2011.12953
2020
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020 · 19 Feb 2020 · arxiv:1908.08530
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation
ICLR 2020 · 13 Feb 2020 · arxiv:1910.02940
2019
MMDetection: Open MMLab Detection Toolbox and Benchmark
CVPR 2019 · 18 Jun 2019 · arxiv:1906.07155
Deformable ConvNets v2: More Deformable, Better Results
CVPR 2019 · 16 Jun 2019 · arxiv:1811.11168
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
ICCV 2019 · 15 Apr 2019 · arxiv:1904.05873
2018
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection
Arxiv Tech Report 2018 · 28 Nov 2018 · arxiv:1811.11167
Relation Networks for Object Detection
CVPR 2018 (Oral) · 15 Jun 2018 · arxiv:1711.11575
Towards High Performance Video Object Detection for Mobiles
Arxiv Tech Report 2018 · 17 Apr 2018 · arxiv:1804.05830
Learning Region Features for Object Detection
ECCV 2018 · 20 Mar 2018 · arxiv:1803.07066
2017
Flow-Guided Feature Aggregation for Video Object Detection
ICCV 2017 · 21 Aug 2017 · arxiv:1703.10025
Deep Feature Flow for Video Recognition
CVPR 2017 · 06 Jun 2017 · arxiv:1611.07715
Deformable Convolutional Networks
ICCV 2017 (Oral) · 06 Jun 2017 · arxiv:1703.06211
Fully Convolutional Instance-aware Semantic Segmentation
CVPR 2017 (Spotlight) · 11 Apr 2017 · arxiv:1611.07709
2016
R-FCN: Object Detection via Region-based Fully Convolutional Networks
NeurIPS 2016 · 05 Dec 2016 · arxiv:1605.06409
Instance-aware Semantic Segmentation via Multi-task Network Cascades
CVPR 2016 (Oral) · 26 Jun 2016 · arxiv:1512.04412
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
CVPR 2016 (Oral) · 19 Apr 2016 · arxiv:1604.05144
Instance-sensitive Fully Convolutional Networks
ECCV 2016 · 30 Mar 2016 · arxiv:1603.08678
2015
Convolutional Feature Masking for Joint Object and Stuff Segmentation
CVPR 2015 · 08 Jun 2015 · arxiv:1412.1283
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
ICCV 2015 · 19 May 2015 · arxiv:1503.01640
Generative Modeling of Convolutional Neural Networks
ICLR 2015 · 10 Apr 2015 · arxiv:1412.6296