DeepCaption

DeepCaption is a framework for image captioning research using deep learning. The code is based on the image captioning tutorial by yunjey but has been extensively expanded since then.

The goal of image captioning is to convert a given input image into a natural language description. The encoder-decoder framework is widely used for this task. The image encoder is a convolutional neural network (CNN). The baseline code uses a ResNet-152 model pretrained on the ILSVRC-2012-CLS image classification dataset. The decoder is a long short-term memory (LSTM) recurrent neural network.

Features

DeepCaption supports many features, including:

external pre-calculated features stored in NumPy, LMDB or PicSOM bin format
persistent features (features input at each RNN iteration)
soft attention
teacher forcing scheduling
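
To illustrate the soft attention item in the list above, here is a minimal pure-Python sketch of one attention step. It assumes a dot-product score between a query vector (for example the decoder state) and each image region feature. The function name and the scoring function are illustrative assumptions; DeepCaption's own attention module may score regions differently.

```python
import math


def soft_attention(query, features):
    """Return (weights, context) for one decoding step.

    query:    list of floats, e.g. the decoder hidden state (hypothetical)
    features: list of region feature vectors, each as long as query
    """
    # Dot-product relevance score between the query and each image region.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]

    # Numerically stable softmax turns the scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the weighted average of the region features.
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context
```

Because the weights are a smooth function of the scores, the whole step stays differentiable, which is what distinguishes soft attention from picking a single region.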

Some of the advanced features are documented on the separate features documentation page.
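
As a rough illustration of what teacher forcing scheduling means, the sketch below decays the probability of feeding the ground-truth token to the decoder over the epochs. The linear schedule, the default values and both function names are assumptions made for this example, not DeepCaption's actual settings.

```python
import random


def teacher_forcing_prob(epoch, start=1.0, end=0.5, decay_epochs=10):
    """Linearly decay the teacher forcing probability from start to end."""
    if epoch >= decay_epochs:
        return end
    return start + (end - start) * epoch / decay_epochs


def choose_next_input(ground_truth_token, predicted_token, prob, rng=random):
    """Feed the ground-truth token with probability prob,
    otherwise feed the model's own previous prediction."""
    return ground_truth_token if rng.random() < prob else predicted_token
```

With prob at 1.0 the decoder always sees the reference caption. Lowering it exposes the decoder to its own mistakes during training, which is closer to the situation at inference time.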

Usage

1. Clone the repository

To get the latest release:

git clone https://github.com/aalto-cbir/DeepCaption

or to get the internal development version:

git clone https://version.aalto.fi/gitlab/CBIR/DeepCaption.git

2. Set up the dataset for training

For example, if you have downloaded the COCO dataset, you might have the images under /path/to/coco/images and the annotations under /path/to/coco/annotations.

First we resize the images to 256x256. This is just to speed up the training process.

./resize.py --image_dir /path/to/coco/images/train2014 --output_dir /path/to/coco/images/train2014_256x256
./resize.py --image_dir /path/to/coco/images/val2014 --output_dir /path/to/coco/images/val2014_256x256

Next, we need to set up the dataset configuration. Create a file datasets/datasets.conf with the following contents:

[coco]
dataset_class = CocoDataset
root_dir = /path/to/coco

[coco:train2014]
image_dir = images/train2014_256x256
caption_path = annotations/captions_train2014.json

[coco:val2014]
image_dir = images/val2014_256x256
caption_path = annotations/captions_val2014.json
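
The layout of the configuration above suggests that a subset section such as coco:train2014 inherits the settings of its parent section coco, and that its paths are relative to root_dir. The sketch below shows one way to read the file under that assumption, using Python's standard configparser. The function resolve_dataset is hypothetical, and DeepCaption's own parser may behave differently.

```python
import configparser
import posixpath

EXAMPLE_CONF = """
[coco]
dataset_class = CocoDataset
root_dir = /path/to/coco

[coco:train2014]
image_dir = images/train2014_256x256
caption_path = annotations/captions_train2014.json
"""


def resolve_dataset(conf_text, name):
    """Merge a 'parent:subset' section with its parent and make paths absolute."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)

    parent = name.split(":")[0]
    settings = dict(parser[parent])  # shared settings, e.g. root_dir
    settings.update(parser[name])    # subset-specific settings override them

    # Assumption: relative paths are interpreted relative to root_dir.
    root = settings.get("root_dir", "")
    for key in ("image_dir", "caption_path"):
        if key in settings:
            settings[key] = posixpath.join(root, settings[key])
    return settings
```

Calling resolve_dataset(EXAMPLE_CONF, "coco:train2014") returns a dictionary that contains dataset_class from the parent section together with the two absolute paths of the subset.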

Now we can build the vocabulary:

./build_vocab.py --dataset coco:train2014 --vocab_output_path vocab.pkl
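
Conceptually, building a vocabulary means counting the words in the training captions and assigning an integer index to each word that is frequent enough. The sketch below illustrates the idea only. The special token names, the min_count threshold and the whitespace tokenization are assumptions; build_vocab.py may differ in all of these.

```python
from collections import Counter

# Hypothetical special tokens; the names DeepCaption uses may differ.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(captions, min_count=2):
    """Map each sufficiently frequent word to an integer index."""
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + words)}


def encode(caption, vocab):
    """Convert a caption to token indices, wrapped in start and end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(word, unk) for word in caption.lower().split()]
    return [vocab["<start>"]] + body + [vocab["<end>"]]
```

Rare words fall below the threshold and are mapped to the unknown token, which keeps the decoder's output layer small.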

3. Train a model

To train a single model with default parameters on the COCO dataset:

./train.py --dataset coco:train2014 --vocab vocab.pkl --model_name mymodel

or, if you wish to track validation set metrics:

./train.py --dataset coco:train2014 --vocab vocab.pkl --model_name mymodel --validate coco:val2014 --validation_scoring cider

You can plot the training and validation loss and other statistics using the following command:

./plot_stats.py models/mymodel/train_stats.json

By adding --watch you can have it update the plot automatically every time there are new numbers (typically after each epoch).

4. Infer from your model

Now you can use your model to generate a caption for an arbitrary image:

./infer.py --model models/mymodel/ep5.model --print_results random_image.jpg

or for a directory of images:

./infer.py --model models/mymodel/ep5.model --print_results --image_dir random_image_dir/

You can also do inference on any configured dataset:

./infer.py --model models/mymodel/ep5.model --dataset coco:val2014

You can add, e.g., --scoring cider to automatically calculate scoring metrics if ground truth captions have been defined for that dataset.

Inference also supports the following flags:

--max_seq_length - maximum length of decoded caption (in words)
--no_repeat_sentences - remove repeating sentences if they occur immediately after each other
--only_complete_senteces - remove the last sentence if it does not end with a period (and thus is likely to be truncated)
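
The behaviour described for --no_repeat_sentences and --only_complete_senteces can be pictured as a post-processing step on the decoded caption. The following is a minimal sketch under the assumption that sentences are delimited by periods. The function and parameter names are hypothetical and this is not the code used by infer.py.

```python
def split_sentences(caption):
    """Split on periods, keeping the period on each complete sentence."""
    parts = caption.split(".")
    sentences = [p.strip() + "." for p in parts[:-1] if p.strip()]
    tail = parts[-1].strip()  # text after the last period, if any
    return sentences, tail


def postprocess(caption, no_repeat_sentences=False, only_complete_sentences=False):
    """Apply the two clean-up rules to a decoded caption."""
    sentences, tail = split_sentences(caption)
    if tail and not only_complete_sentences:
        sentences.append(tail)  # keep the unfinished last sentence
    if no_repeat_sentences:
        deduped = []
        for sentence in sentences:
            # Drop a sentence only when it repeats the one right before it.
            if not deduped or sentence != deduped[-1]:
                deduped.append(sentence)
        sentences = deduped
    return " ".join(sentences)
```

A truncated tail typically appears when decoding stops at the --max_seq_length limit before the model has emitted a final period.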

Misc

Project structure

We try to maintain a standard project structure. Please refer to this template for future development.

Vocabulary precomputation for self-critical training

If the self-critical loss is going to be used, the n-gram statistics needed by CIDEr-D must be precomputed to speed up training. Please see scripts/preprocess_ngrams.py. It needs a dataset, a preprocessed vocabulary and the name of the output file for the precomputed statistics.

A usage example:

python scripts/preprocess_ngrams.py --dataset picsom:COCO:train2014+picsom:tgif:imageset --vocab ../vocab-coco.pkl --output ngrams_precomputed.pkl

This is then passed to train.py as:

train.py --... --self_critical_loss sc --validation_scoring ciderd --cached_words ngrams_precomputed.pkl
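
To give an idea of what such a precomputation involves, the sketch below counts, for every n-gram, the number of images whose reference captions contain it. CIDEr-style metrics use this document frequency to down-weight common n-grams. The function names, the whitespace tokenization and the in-memory Counter are assumptions for illustration; the actual file format written by scripts/preprocess_ngrams.py is not shown here.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def document_frequencies(references, max_n=4):
    """Count in how many images each n-gram occurs among the references.

    references: list with one entry per image, each a list of caption strings.
    """
    df = Counter()
    for captions in references:
        seen = set()
        for caption in captions:
            seen.update(ngrams(caption.lower().split(), max_n))
        df.update(seen)  # each image contributes at most 1 per n-gram
    return df
```

Computing these counts once and loading them during training avoids rescanning all reference captions every time a sampled caption is scored.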
