GL-RG exploits extensive vision representations from different video ranges to improve linguistic expression. We devise a novel global-local encoder to produce a rich semantic vocabulary. With our incremental training strategy, GL-RG successfully leverages the global-local vision representation to achieve fine-grained captioning of video contents.
[Note] This branch includes data (>900MB) and links; for a smaller version, please go to min-branch/for-review (52.5MB).
## Dependencies

- Python 2.7
- PyTorch 0.2 or 1.0
- Microsoft COCO Caption Evaluation
- CIDEr
- numpy, scikit-image, h5py, requests
This repo was tested with Python 2.7, PyTorch 0.2.0 (or 1.0.1), cuDNN 6.0 (or 10.0), and CUDA 8.0. It should also run with more recent PyTorch versions (>=1.0), as well as any version from 0.2 up to 1.0.
You can use Anaconda or Miniconda to install the dependencies:

```bash
conda create -n GL-RG-pytorch python=2.7 pytorch=0.2 scikit-image h5py requests
conda activate GL-RG-pytorch
```
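After creating and activating the environment, a quick sanity check can confirm that the dependencies listed above import cleanly (a minimal check, not specific to GL-RG):

```bash
# Each core dependency should import without errors.
python -c "import torch; print('PyTorch:', torch.__version__)"
python -c "import numpy, skimage, h5py, requests; print('numpy / scikit-image / h5py / requests: OK')"
# Optional: confirm that PyTorch can see the GPU.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```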
## Installation

First, clone this repository to any location using the `--recursive` flag:

```bash
git clone --recursive https://github.com/goodproj13/GL-RG.git
```
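If you happened to clone without `--recursive`, the nested projects (assuming they are tracked as git submodules, as the flag suggests) can be fetched afterwards with standard git commands:

```bash
# Fetch any submodules that a non-recursive clone skipped.
cd GL-RG
git submodule update --init --recursive
```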
Check that the `coco-caption/`, `cider/`, `data/`, and `model/` projects are present in your working directory (a quick check is shown below). If they are not, please follow the detailed steps in INSTALL.md for installation and dataset preparation.
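A one-line directory listing is enough for the check (the names are exactly those listed above):

```bash
# Each of these directories should exist before testing; ls errors out on any that are missing.
ls -d coco-caption cider data model
```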
Please run the following script to download the Stanford CoreNLP 3.6.0 models into `coco-caption/`:

```bash
cd coco-caption
./get_stanford_models.sh
```
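To verify that the download succeeded, you can search for the CoreNLP jars; the exact subdirectory they land in depends on the script version, hence a search rather than a fixed path:

```bash
# Locate the downloaded Stanford CoreNLP jars, wherever the script placed them.
find . -name "stanford-corenlp*"
cd ..   # return to the repository root
```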
## Performance

| Model | Dataset | Exp. | B@4 | M | R | C | Download Link |
|---|---|---|---|---|---|---|---|
| GL-RG | MSR-VTT | XE | 45.5 | 30.1 | 62.6 | 51.2 | GL-RG_XE_msrvtt |
| GL-RG | MSR-VTT | DXE | 46.9 | 30.4 | 63.9 | 55.0 | GL-RG_DXE_msrvtt |
| GL-RG + IT | MSR-VTT | DR | 46.9 | 31.2 | 65.7 | 60.6 | GL-RG_DR_msrvtt |
| GL-RG | MSVD | XE | 52.3 | 33.8 | 70.4 | 58.7 | GL-RG_XE_msvd |
| GL-RG | MSVD | DXE | 57.7 | 38.6 | 74.9 | 95.9 | GL-RG_DXE_msvd |
| GL-RG + IT | MSVD | DR | 60.5 | 38.9 | 76.4 | 101.0 | GL-RG_DR_msvd |

(B@4, M, R, and C denote BLEU@4, METEOR, ROUGE-L, and CIDEr, respectively.)
## Testing

Check that the trained model weights are under the `model/` directory (following Installation) and run:

```bash
./test.sh
```
Note: please modify `MODEL_NAME`, `EXP_NAME`, and `DATASET` in `test.sh` if the experiment setting changes (a sketch of these variables follows below). For more details, please refer to TEST.md.
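For reference, the three variables look something like this inside `test.sh`; the values here are hypothetical examples, and the actual ones must match your downloaded checkpoint and dataset (see TEST.md):

```bash
# Hypothetical example values -- adjust to the experiment you want to reproduce.
MODEL_NAME=GL-RG_DXE_msrvtt   # checkpoint name under model/ (assumed)
EXP_NAME=DXE                  # one of XE | DXE | DR, per the table above
DATASET=msrvtt                # msrvtt or msvd
```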
## License

GL-RG is released under the MIT license.
## Acknowledgements

We are truly thankful for the following prior efforts, in terms of both their knowledge contributions and their open-source repos:
- SA-LSTM: Describing Videos by Exploiting Temporal Structure (ICCV'15) [paper] [implementation code]
- RecNet: Reconstruction Network for Video Captioning (CVPR'18) [paper] [official code]
- SAAT: Syntax-Aware Action Targeting for Video Captioning (CVPR'20) [paper] [official code]