This is a PyTorch/GPU implementation of the paper SCL and its extended version, GLSCL.
Our environment: CUDA 11.3, torch 1.11.0, torchvision 0.12.0.
pip install -r requirements.txt
For the further experiments in our journal extension, we use a newer environment: CUDA 11.7, torch 2.0, PyTorch Lightning 2.0.
The compatibility between PyTorch and PyTorch Lightning versions can be found here.
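Because two environments are listed above, a quick way to see which one is installed is to query package metadata. The sketch below uses only the standard library (`importlib.metadata`) and compares major.minor versions against the values stated in this README; treating a major.minor match as "compatible" is our assumption, as are the helper names. Lightning 2.0 can be installed either as `pytorch-lightning` or as `lightning`; the sketch only checks the former.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions stated in this README. "original" is the SCL setup, "newer" is the
# journal-extension setup. Only major.minor is compared (an assumption).
EXPECTED: Dict[str, Dict[str, str]] = {
    "original": {"torch": "1.11", "torchvision": "0.12"},
    "newer": {"torch": "2.0", "pytorch-lightning": "2.0"},
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def cuda_version() -> Optional[str]:
    """CUDA version torch was built with, or None if torch is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def check_environment(setup: str = "original") -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed, matches) for the chosen setup."""
    report = {}
    for package, wanted in EXPECTED[setup].items():
        found = installed_version(package)
        # "1.11.0+cu113" matches "1.11"; a missing package never matches.
        ok = found is not None and (found == wanted or found.startswith(wanted + "."))
        report[package] = (wanted, found, ok)
    return report


if __name__ == "__main__":
    for setup in EXPECTED:
        print(f"[{setup}] CUDA (torch build): {cuda_version()}")
        for package, (wanted, found, ok) in check_environment(setup).items():
            print(f"  {package}: expected {wanted}.x, found {found} -> {'OK' if ok else 'MISMATCH'}")
```

Running it as a script prints one report per environment, so a mismatch (for example torch 2.0 with the original torchvision pin) is visible before training starts.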
Our pre-trained and fine-tuned model weights can be downloaded from huggingface/SCL.
The cross-modal alignment benchmark we developed can be obtained from huggingface/ALIGN-BENCH.
Following ViLT, we use pyarrow to serialize the datasets. See this link for details.
python run.py --task pretrain
The detailed settings, such as the pretraining datasets, optimization arguments, and input size, can be found in './scl/config.py'.
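The general pattern of a task flag selecting settings on top of shared defaults can be sketched as below. This is not the content of './scl/config.py': every key and value here (`pretrain_corpus`, the learning rate, the input size) is a hypothetical placeholder, and only the task names come from this README.

```python
import argparse
import copy
from typing import Any, Dict, List, Optional

# Hypothetical placeholder values for illustration only; the real keys and
# values (datasets, optimization arguments, input size) live in ./scl/config.py.
BASE_CONFIG: Dict[str, Any] = {
    "datasets": ["pretrain_corpus"],  # hypothetical dataset name
    "learning_rate": 1e-4,
    "weight_decay": 0.01,
    "image_size": 224,
}

# Per-task overrides merged on top of BASE_CONFIG (also hypothetical).
TASK_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "pretrain": {},
    **{task: {"datasets": [task]} for task in ["vqa", "nlvr2", "f30k", "coco", "msrvtt", "lsmdc"]},
}


def build_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse --task and return BASE_CONFIG merged with that task's overrides."""
    parser = argparse.ArgumentParser(description="config-selection sketch")
    parser.add_argument("--task", required=True, choices=sorted(TASK_OVERRIDES))
    args = parser.parse_args(argv)
    # Deep copies keep the module-level defaults from being mutated by callers.
    config = copy.deepcopy(BASE_CONFIG)
    config.update(copy.deepcopy(TASK_OVERRIDES[args.task]))
    config["task"] = args.task
    return config
```

With this layout, `build_config(["--task", "vqa"])` keeps the shared optimizer defaults and swaps only the dataset list, while an unknown task name is rejected by argparse before any training code runs.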
Note that the argument 'plugins=[MyCluster(), MyDDPPlugin()]' passed to pl.Trainer in run.py is used for multi-node DDP training.
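A custom cluster plugin such as MyCluster essentially tells Lightning where the master process lives and which rank each process has. The sketch below is not MyCluster; it is a hypothetical stand-alone illustration of that bookkeeping, reading the torch.distributed-style variables `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `NODE_RANK`, and `LOCAL_RANK`. It assumes every node runs the same number of processes (one per GPU), and the fallback address and port are arbitrary defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClusterInfo:
    master_addr: str
    master_port: int
    world_size: int
    node_rank: int
    local_rank: int
    global_rank: int


def resolve_cluster(gpus_per_node: int, env: Optional[Mapping[str, str]] = None) -> ClusterInfo:
    """Derive DDP rank information from torch.distributed-style environment variables.

    Assumes every node runs exactly `gpus_per_node` processes.
    """
    env = os.environ if env is None else env
    node_rank = int(env.get("NODE_RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", str(gpus_per_node)))  # single-node default
    global_rank = node_rank * gpus_per_node + local_rank
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError(f"LOCAL_RANK={local_rank} outside [0, {gpus_per_node})")
    if not 0 <= global_rank < world_size:
        raise ValueError(f"global rank {global_rank} outside [0, {world_size})")
    return ClusterInfo(
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", "29500")),
        world_size=world_size,
        node_rank=node_rank,
        local_rank=local_rank,
        global_rank=global_rank,
    )
```

For example, with 2 nodes of 4 GPUs each (`WORLD_SIZE=8`), the process with `NODE_RANK=1` and `LOCAL_RANK=2` gets global rank 1 * 4 + 2 = 6. A Lightning ClusterEnvironment subclass answers these same questions through its own methods.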
python run.py --task vqa/nlvr2/f30k/coco/msrvtt/lsmdc
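Reading the slash-separated list above as alternatives (one task per run), a small launcher can fine-tune several tasks back to back. This is a hypothetical convenience wrapper, not part of the repository; it only shells out to the documented `run.py --task <name>` command and defaults to a dry run that prints the commands.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence

# Downstream tasks accepted by run.py according to this README.
DOWNSTREAM_TASKS = ["vqa", "nlvr2", "f30k", "coco", "msrvtt", "lsmdc"]


def build_command(task: str) -> List[str]:
    """Return the argv for fine-tuning on a single downstream task."""
    if task not in DOWNSTREAM_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {DOWNSTREAM_TASKS}")
    return [sys.executable, "run.py", "--task", task]


def run_tasks(tasks: Sequence[str] = DOWNSTREAM_TASKS, dry_run: bool = True) -> List[List[str]]:
    """Print (and, unless dry_run, execute) one run.py invocation per task."""
    commands = [build_command(task) for task in tasks]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing task
    return commands
```

Calling `run_tasks(["f30k", "coco"], dry_run=False)` would run the two retrieval tasks sequentially; validating names up front avoids discovering a typo only after an earlier task has finished training.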
python visualize_global.py # global-local cross-modal visualization
python visualize_local.py # local-local cross-modal visualization
python align_global.py # quantify global-local cross-modal alignment on ALIGN-BENCH
python align_local.py # quantify local-local cross-modal alignment on ALIGN-BENCH
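The scripts above are not reproduced here. As a generic illustration of the two notions they refer to, the sketch below computes cosine-similarity maps between a global text vector and image patch features (global-local) and between word tokens and patches (local-local), then scores a map with the pointing-game hit rate (does the peak fall inside a ground-truth region?). The pointing game is a swapped-in, commonly used metric chosen for illustration; ALIGN-BENCH's actual protocol may differ, and the feature shapes and function names are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def global_local_map(text_global: np.ndarray, patches: np.ndarray, grid: Tuple[int, int]) -> np.ndarray:
    """Cosine similarity of one global text vector (D,) with N patches (N, D), as an (H, W) map."""
    return (l2_normalize(patches) @ l2_normalize(text_global)).reshape(grid)


def local_local_map(words: np.ndarray, patches: np.ndarray) -> np.ndarray:
    """(T, N) cosine similarities between T word tokens and N patch tokens."""
    return l2_normalize(words) @ l2_normalize(patches).T


def to_heatmap(sim: np.ndarray) -> np.ndarray:
    """Min-max rescale a similarity map to [0, 1] for display."""
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)


def pointing_hit(sim: np.ndarray, gt_mask: np.ndarray) -> bool:
    """True if the most similar location falls inside the ground-truth mask."""
    if sim.shape != gt_mask.shape:
        raise ValueError("similarity map and mask must have the same shape")
    return bool(gt_mask[np.unravel_index(np.argmax(sim), sim.shape)])


def pointing_accuracy(sims: Sequence[np.ndarray], masks: Sequence[np.ndarray]) -> float:
    """Fraction of samples whose peak similarity lands inside the mask."""
    hits = [pointing_hit(s, m) for s, m in zip(sims, masks)]
    return sum(hits) / len(hits) if hits else 0.0
```

Each row of `local_local_map` can be reshaped to the patch grid to obtain a per-word map, which is then visualized or scored exactly like the global-local map.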