Introduction

This repository is the official implementation of Contextual Transformer Networks for Visual Recognition.

CoT is a unified self-attention building block that acts as an alternative to standard convolutions in ConvNets. It is therefore feasible to replace convolutions with their CoT counterparts to strengthen vision backbones with contextualized self-attention.

2021/3/25-2021/6/5: CVPR 2021 Open World Image Classification Challenge

Ranked 1st in the Open World Image Classification Challenge @ CVPR 2021 (team name: VARMS).

Usage

The code is mainly based on timm.

Requirements:

PyTorch 1.8.0+
Python 3.7
CUDA 10.1+
CuPy
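
A quick way to compare a machine against the list above is a short script like the one below. It is a minimal sketch and not part of this repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the listed Python version can be treated as a minimum.

```python
import importlib.util
import re
import sys


def version_tuple(version: str) -> tuple:
    """Turn '1.8.0+cu111' into (1, 8, 0); local/build suffixes are dropped."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict:
    """Report which of the listed requirements look satisfied (hypothetical helper)."""
    report = {"python>=3.7": sys.version_info >= (3, 7)}  # assumes 3.7 is a minimum
    for name in ("torch", "cupy"):
        report[name + " installed"] = importlib.util.find_spec(name) is not None
    if report["torch installed"]:
        import torch

        report["torch>=1.8.0"] = meets_minimum(torch.__version__, "1.8.0")
        cuda = torch.version.cuda  # None for CPU-only builds
        report["cuda>=10.1"] = cuda is not None and meets_minimum(cuda, "10.1")
    return report


for item, ok in check_environment().items():
    print(("ok      " if ok else "MISSING ") + item)
```

The script only reports what it finds; it does not install anything, and a CuPy build that does not match the local CUDA version would still show up as installed.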

Clone the repository:

git clone https://github.com/JDAI-CV/CoTNet.git

Train

First, download the ImageNet dataset. To train CoTNet-50 on ImageNet on a single node with 8 GPUs for 350 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 train.py --folder ./experiments/cot_experiments/CoTNet-50-350epoch

The training scripts for CoTNet (e.g., CoTNet-50) can be found in the cot_experiments folder.
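
When launching several experiments, it can help to build the training command programmatically. The sketch below only reassembles the command shown above from its parts; `build_launch_command` is a hypothetical helper that is not part of this repository, and the default `root` simply mirrors the path used in that command.

```python
import shlex


def build_launch_command(experiment: str, gpus: int = 8,
                         root: str = "./experiments/cot_experiments",
                         python: str = "python") -> list:
    """Return the argument list for a single-node distributed training run."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return [
        python, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",          # one process per GPU
        "train.py",
        "--folder", f"{root}/{experiment}",  # experiment folder, as in the command above
    ]


cmd = build_launch_command("CoTNet-50-350epoch")
print(shlex.join(cmd))
# python -m torch.distributed.launch --nproc_per_node=8 train.py --folder ./experiments/cot_experiments/CoTNet-50-350epoch
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would start the run. Adjust `root` if the experiment folders live elsewhere in your checkout.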

Inference Time vs. Accuracy

CoTNet models consistently obtain better top-1 accuracy with less inference time than other vision backbones under both the default and the advanced training setups. In short, CoTNet models offer a better inference time-accuracy trade-off than existing vision backbones.

Results on ImageNet

name	resolution	#params	FLOPs	Top-1 Acc.	Top-5 Acc.	model
CoTNet-50	224	22.2M	3.3	81.3	95.6	GoogleDrive / Baidu
CoTNeXt-50	224	30.1M	4.3	82.1	95.9	GoogleDrive / Baidu
SE-CoTNetD-50	224	23.1M	4.1	81.6	95.8	GoogleDrive / Baidu
CoTNet-101	224	38.3M	6.1	82.8	96.2	GoogleDrive / Baidu
CoTNeXt-101	224	53.4M	8.2	83.2	96.4	GoogleDrive / Baidu
SE-CoTNetD-101	224	40.9M	8.5	83.2	96.5	GoogleDrive / Baidu
SE-CoTNetD-152	224	55.8M	17.0	84.0	97.0	GoogleDrive / Baidu
SE-CoTNetD-152	320	55.8M	26.5	84.6	97.1	GoogleDrive / Baidu

Access code for Baidu is cotn
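
To pick a model from the table under a compute budget, the rows can be loaded into a small structure and filtered. This is an illustrative sketch only: the `Result` class and `best_under_budget` function are hypothetical names, the numbers are copied from the table above, and the FLOPs values are kept in whatever unit the table uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    name: str
    resolution: int
    params_m: float  # "#params" column, in millions
    flops: float     # "FLOPs" column, unit as given in the table
    top1: float      # Top-1 accuracy (%)
    top5: float      # Top-5 accuracy (%)


# Rows copied from the results table above.
RESULTS = [
    Result("CoTNet-50", 224, 22.2, 3.3, 81.3, 95.6),
    Result("CoTNeXt-50", 224, 30.1, 4.3, 82.1, 95.9),
    Result("SE-CoTNetD-50", 224, 23.1, 4.1, 81.6, 95.8),
    Result("CoTNet-101", 224, 38.3, 6.1, 82.8, 96.2),
    Result("CoTNeXt-101", 224, 53.4, 8.2, 83.2, 96.4),
    Result("SE-CoTNetD-101", 224, 40.9, 8.5, 83.2, 96.5),
    Result("SE-CoTNetD-152", 224, 55.8, 17.0, 84.0, 97.0),
    Result("SE-CoTNetD-152", 320, 55.8, 26.5, 84.6, 97.1),
]


def best_under_budget(results: List[Result], max_flops: float) -> Optional[Result]:
    """Return the highest top-1 entry whose FLOPs value fits the budget, or None."""
    candidates = [r for r in results if r.flops <= max_flops]
    return max(candidates, key=lambda r: r.top1, default=None)


choice = best_under_budget(RESULTS, max_flops=5.0)
print(choice.name, choice.top1)  # CoTNeXt-50 82.1
```

FLOPs is only a proxy for inference time, so measured latency on the target hardware may rank the models differently.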

CoTNet on downstream tasks

For Object Detection and Instance Segmentation, please see CoTNet for Object Detection and Instance Segmentation.

Citing Contextual Transformer Networks

@article{cotnet,
  title={Contextual Transformer Networks for Visual Recognition},
  author={Li, Yehao and Yao, Ting and Pan, Yingwei and Mei, Tao},
  journal={arXiv preprint arXiv:2107.12292},
  year={2021}
}

Acknowledgements

Thanks to the contributions of timm and the awesome PyTorch team.

