This repository is the official implementation of the CVPR 2023 paper "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information".
By Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai.
Code will be available.
Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training), initially described in the arXiv preprint, is a simple yet effective one-stage pre-training paradigm. It integrates existing pre-training methods (supervised, weakly-supervised, and self-supervised pre-training) under a unified mutual information perspective and maintains all of their desired properties within a single pre-training stage. Notably, we successfully pre-train a 1B-parameter model (InternImage-H) with M3I Pre-training and achieve a new record of 65.4 mAP on COCO detection test-dev, 62.5 mAP on LVIS detection minival, and 62.9 mIoU on ADE20K.
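
For intuition, below is a minimal sketch of the mutual-information view that M3I Pre-training builds on: an InfoNCE-style lower bound on I(input; target), computed between batch representations of paired views (e.g., an image and its caption, or two augmented crops). The function name `info_nce_lower_bound` and its signature are illustrative assumptions for this sketch, not the paper's released API, and the paper's actual objective is more general than this single bound.

```python
import math

import torch
import torch.nn.functional as F


def info_nce_lower_bound(z_input: torch.Tensor, z_target: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE lower bound on the mutual information I(input; target).

    z_input, z_target: (batch, dim) representations of paired samples;
    row i of each tensor comes from the same underlying sample.
    """
    z_input = F.normalize(z_input, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    # Pairwise similarities; the diagonal holds the positive pairs,
    # off-diagonal entries act as in-batch negatives.
    logits = z_input @ z_target.t() / temperature
    labels = torch.arange(z_input.size(0), device=z_input.device)
    # I(input; target) >= log(batch) - cross-entropy over in-batch negatives,
    # so minimizing the cross-entropy maximizes this lower bound.
    return math.log(z_input.size(0)) - F.cross_entropy(logits, labels)
```

Under this view, a training step would maximize the bound (i.e., minimize `-info_nce_lower_bound(z_input, z_target)`) over batches of paired inputs and targets; supervised, weakly-supervised, and self-supervised setups differ mainly in what the target view is (a label embedding, paired text, or another augmented view of the input).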
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@InProceedings{Su_2023_CVPR,
    author    = {Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
    title     = {Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {15888-15899}
}
```