Teacher forcing in RNNs refers to feeding the ground truth token to the decoder at time step t. This paper details an approach where, at each time step t, the ground truth token is fed with probability P(teacher_forcing) and the token generated by the network itself with probability 1 - P(teacher_forcing). Currently an inverse sigmoid sampling schedule is implemented for these probabilities.
To train a model with sampled teacher forcing using k=2200 and beta=0.3, run the following:
python train.py --dataset coco:train2014 --model_name my_model --teacher_forcing sampled --teacher_forcing_k 2200 --teacher_forcing_beta 0.3
The inverse sigmoid scheduling implementation depends on the parameter k, which is usually on the order of thousands and can be interpreted as "how soon do we want to start decreasing the probability of teacher forcing?", and the parameter beta, which is between 0 and 1 and can be interpreted as "once we start to use the model's own outputs, how fast do we want the rate of model output usage to increase?". Intuitively, beta is the slope of the middle segment of the inverse sigmoid curve.
Teacher forcing is controlled by the parameter --teacher_forcing. By default this is set to always, meaning that we don't perform any sampling. The other options are sampled, which uses the sampling procedure outlined above, and additive, which deterministically sums the teacher token and the generated token with weights determined by the inverse sigmoid scheduler. --teacher_forcing_k sets the value of k and --teacher_forcing_beta sets the value of beta.
This repository contains an implementation of a hierarchical model inspired by Krause et al. You can train the hierarchical model on, for example, Visual Genome paragraph data as well as COCO captions (where each caption is considered a paragraph containing a single sentence):
./train.py --dataset vgim2p:train --validate vgim2p:val --vocab AUTO --model_name Hierarchical/basic_hierarchical_model --hierarchical_model
The COCO evaluation library works only with Python 2, so make sure that you run the commands below in an environment that supports it.
- Clone the coco-caption library and link it into the repository:
git clone https://github.com/tylin/coco-caption ~/workdir/coco-caption
cd image_captioning
ln -s path/to/coco-caption/pycocoevalcap datasets/
- Install PyCocoTools for Python 2 (your Python 2 environment may require loading modules):
module purge
module load python-env/2.7.10
pip2 install pycocotools --user
- Modify the file pycocoevalcap/eval.py to remove all metrics other than METEOR and CIDEr (lines 39-45).
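After the edit, the scorer list in pycocoevalcap/eval.py should look roughly like the snippet below; the exact line numbers and surrounding code vary between versions of coco-caption, and the imports for these classes already exist at the top of that file:
# In pycocoevalcap/eval.py, keep only the METEOR and CIDEr scorers:
scorers = [
    (Meteor(), "METEOR"),
    (Cider(), "CIDEr")
]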
Once the setup steps above are done, you can perform evaluation on a json file that corresponds to one particular model:
python eval_coco.py path/to/result/captions.json --ground_truth datasets/data/COCO/annotations/captions_val2014.json
By default the above command writes the METEOR and CIDEr scores into a JSON-formatted file with the extension *.eval.
Finally, to simplify generating human-readable output, the eval2csv.py script combines multiple *.eval files into a single, easy-to-parse CSV file:
python eval2csv.py --evaluations_dir path/containing/eval_files --output_file output_file.csv
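The idea behind eval2csv.py is roughly the following sketch, which assumes that each *.eval file is a flat JSON object mapping metric names to scores; the actual file format and the script's options may differ:
import csv
import glob
import json
import os

# Hypothetical sketch only: collect metric values from *.eval JSON files into
# one CSV row per evaluated model. The real eval2csv.py may use a different layout.
rows = []
for path in sorted(glob.glob('path/containing/eval_files/*.eval')):
    with open(path) as f:
        scores = json.load(f)  # assumed to look like {"METEOR": ..., "CIDEr": ...}
    row = dict(scores)
    row['model'] = os.path.basename(path)
    rows.append(row)

with open('output_file.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['model', 'METEOR', 'CIDEr'],
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)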
You can use extract_dataset_features.py to extract features from one of the convolutional models made available in models.py. Currently the following CNN models from PyTorch torchvision are supported: AlexNet, DenseNet-201, ResNet-152, VGG-16, and Inception V3, all trained on the ImageNet classification task. The extracted features are taken either from the already flattened pre-classification layer or by flattening the final convolutional or pooling layer.
The resulting features are saved in the LMDB file format. An example command for generating features from the images in the MS COCO training and validation sets using the ResNet-152 CNN:
python extract_dataset_features.py --dataset coco:train2014+coco:val2014 --extractor resnet152
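The features can later be read back from the resulting LMDB database. A rough sketch of what that looks like, assuming that features are stored as raw float32 arrays keyed by image file name (the database path, key format, and value encoding used by the repository may differ):
import lmdb
import numpy as np

# Hypothetical database path and key; check the output of the extraction
# script and the dataset loader for the actual naming and encoding.
env = lmdb.open('features/coco-resnet152.lmdb', readonly=True, lock=False)
with env.begin() as txn:
    raw = txn.get('COCO_train2014_000000000009.jpg'.encode('utf-8'))
    features = np.frombuffer(raw, dtype=np.float32)
print(features.shape)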
The feature extraction script currently supports the following feature types, specified by --feature_type:
- plain - takes an input image, resizes it and calculates features without any augmentation
- avg - takes 5 different crops of a resized input image (4 corners + center), then flips each crop horizontally, producing 10 cropped images in total. These images are processed by the specified CNN separately, and the resulting single feature vector is formed by applying elementwise averaging over the 10 feature vectors (see the sketch after this list)
- max - same as avg, but using elementwise maximum
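As referenced above, here is a sketch of how the avg feature type can be realized with torchvision's TenCrop transform; the crop sizes and the way the CNN is truncated to produce feature vectors are assumptions, and the actual code in extract_dataset_features.py may be organized differently:
import torch
import torchvision.transforms as transforms

# 4 corner crops + center crop of the resized image, plus horizontal flips
# of each, giving 10 crops that are stacked into one batch.
ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),  # (10, 3, 224, 224)
])

def avg_features(cnn, image):
    crops = ten_crop(image)          # (10, 3, 224, 224)
    with torch.no_grad():
        feats = cnn(crops)           # (10, D) feature vectors from the CNN
    return feats.mean(dim=0)         # elementwise average, shape (D,)
    # the max feature type would instead use feats.max(dim=0).values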
Three different pixel value normalization strategies are currently supported for the avg and max feature types. Normalization is specified by the --normalize parameter (a sketch of the corresponding transforms follows this list):
- default - applies per-channel normalization settings recommended by PyTorch
- skip - do not normalize pixel values
- substract_half - subtracts 0.5 from each pixel value, after the pixel values have been converted to be between 0 and 1.
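For reference, the two non-trivial options roughly correspond to the following torchvision transforms; this is only a sketch of the idea, and the actual transform pipeline in extract_dataset_features.py may differ:
import torchvision.transforms as transforms

# "default": per-channel normalization with PyTorch's recommended ImageNet statistics
default_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])

# "substract_half": shift pixel values from [0, 1] to [-0.5, 0.5]
subtract_half_norm = transforms.Normalize(mean=[0.5, 0.5, 0.5],
                                          std=[1.0, 1.0, 1.0])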
The feature extractor supports the same dataset configuration format as the train.py and infer.py scripts.
Dense captioning features are extracted with the DenseCap repository using Lua Torch.
The following instructions are for the CSC Taito cluster. Make sure that you run these commands in the GPU environment (interactive shell) with a K80 GPU selected! (Running this on a P100 fails.)
First, purge the environment and load the needed modules:
cd $USERAPPL
module purge
module load gcc/4.9.3 mkl/11.3.0 intelmpi/5.1.1 fftw/3.3.4 hdf5-serial/1.8.15 cuda/7.5
Clone Lua Torch:
git clone https://github.com/torch/distro.git ./torch --recursive
Compile and install:
cd torch
./clean.sh
export CMAKE_LIBRARY_PATH=/appl/opt/mkl/11.3.0/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64_lin:/appl/opt/fftw/gcc-4.9.3/intelmpi-5.1.1/fftw-3.3.4/lib:/appl/vis/sox/14.4.2-n/lib
export CMAKE_INCLUDE_PATH=/appl/opt/mkl/11.3.0/compilers_and_libraries_2016.0.109/linux/mkl/include:/appl/opt/fftw/gcc-4.9.3/intelmpi-5.1.1/fftw-3.3.4/include:/appl/vis/sox/14.4.2-n/include
export CXX=g++
export CC=gcc
./install.sh
Answer "NO" to the following question:
Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /homeappl/home/jppirhon/.bashrc? (yes/no) [yes] >>> no
If all went well, the installation is now done.
You can now test the installation by first initializing the Torch environment:
source $USERAPPL/torch/install/bin/torch-activate
and trying to start Torch shell:
th
Clone the DenseCap repository:
cd $USERAPPL
git clone https://github.com/jcjohnson/densecap
Install the dependencies listed in the README.md
Fetch the pretrained model
cd densecap
sh scripts/download_pretrained_model.sh
Run a test command inside the densecap
folder:
th run_model.lua -input_image imgs/elephant.jpg
If everything went well, the vis/data/ directory should contain a new JSON file with the dense captioning output for the elephant.jpg image.
Before we are ready to extract the features, we need to prepare a list of files containing the paths to the images that we need the features for. To do this, go to the image_captioning directory and run the following script. Below is an example for MS COCO:
python3 list_dataset_files.py --dataset coco:train2014:no_resize+coco:val2014:no_resize --num_workers 4
In practice, the --num_files 10 parameter can be used with the above command to split the file list into 10 files, making it possible to parallelize DenseCap feature extraction.
If you run the above command in the Taito environment, it should create a file:
image_file_list-coco:train2014+coco:val2014-taito-gpu.csc.fi.txt
(the last part of the file name will vary based on the environment)
If all of this worked, you are now ready to extract the features. Features are extracted using the handy extract_features.lua script provided in the DenseCap repository. To run the feature extractor on a single file, do the following:
cd ../densecap
th extract_features.lua -boxes_per_image 50 -input_txt ../image_captioning_dev/image_file_list-coco:train2014+coco:val2014-taito-gpu.csc.fi.txt -output_h5 densecap_features-coco:train2014+coco:val2014.h5
The extract_features.lua
script takes the following mandatory parameters:
- -input_txt - a newline-separated text file listing the paths of the images for which we need to extract features
- -output_h5 - path to the HDF5 output file
The following parameters have default values, so they may not always be specified; however, for our purposes some of them need to be changed:
- -boxes_per_image - defaults to 100; we set this to 50 to match the papers we are replicating
- -gpu - defaults to 0; specifies which GPU device to use
Other default parameters are:
- -image_size - defaults to 720, the dimension to which the image is resized before dense captions are extracted. Keep it as it is; setting this to a lower value may result in not enough regions being detected (the Lua Torch model fails to handle these cases correctly)
- -checkpoint - defaults to data/models/densecap/densecap-pretrained-vgg16.t7, which is the pretrained model we fetched earlier
- -rpn_nms_thresh - defaults to 0.7
- -final_nms_thresh - defaults to 0.4
- -num_proposals - defaults to 1000
- -max_images - defaults to 0
Doing this as a SLURM batch/array job on 10 files on Taito would look like this:
Extract the file lists:
python3 list_dataset_files.py --dataset coco:train2014:no_resize+coco:val2014:no_resize --num_workers 4 --num_files 10
Run the feature extraction as a SLURM array job (note that the range of the array job needs to be set to 0 to num_files - 1):
sbatch --time=0-24 --mem=128GB --job-name='COCO_TO_DENSECAP' --array=0-9 -o slurm-%x-%A_%a.out scripts/extract_densecap_features.sh
'../image_captioning_dev/file_lists/image_file_list-coco:train2014:no_resize+coco:val2014:no_resize-taito-gpu.csc.fi_${n}_of_${N}.txt'
'../image_captioning_dev/features/densecap_features-coco:train2014:no_resize+coco:val2014:no_resize_${n}_of_${N}.h5'
Finally, to use the extracted features, you need to convert them from H5 to the LMDB file format used in DeepCaption:
./densecap_h5_to_lmdb.py --inputs_list_basename 'file_lists/image_file_list-coco:train2014:no_resize+coco:val2014:no_resize-taito-gpu.csc.fi' \
--features_basename features/densecap_features-coco:train2014:no_resize+coco:val2014:no_resize --num_files 10
Please look at scripts/extract_densecap_features.sh
to see what the above command really does.
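For intuition, the H5-to-LMDB conversion boils down to something like the sketch below; the dataset name inside the H5 file, the LMDB keys, and the value encoding are assumptions here, so refer to densecap_h5_to_lmdb.py for the real behaviour:
import h5py
import lmdb
import numpy as np

# Illustrative only: copy per-image feature vectors from an HDF5 file into an
# LMDB database keyed by image path. The real densecap_h5_to_lmdb.py also
# matches features against the image file lists and handles multiple files.
def h5_to_lmdb(h5_path, image_paths, lmdb_path):
    env = lmdb.open(lmdb_path, map_size=1 << 40)  # generous map size
    with h5py.File(h5_path, 'r') as h5, env.begin(write=True) as txn:
        feats = h5['feats']                       # hypothetical dataset name
        for i, path in enumerate(image_paths):
            vec = np.asarray(feats[i], dtype=np.float32)
            txn.put(path.encode('utf-8'), vec.tobytes())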
If you get errors when running th commands, make sure you have first loaded the modules needed for Lua Torch (see above).
Simple tests are available in the tests/
folder. Commonly used testing functions are defined in tests/functions.sh
. To run "smoke tests" on the codebase, do:
./tests/smoke_test.sh --coco_gt PATH_TO_COCO_CAPTIONS_GROUND_TRUTH
You can generally add the --skip_long_commands parameter if you want to skip the tests that train the full model for multiple epochs. When running the tests for the first time, you will get failures for the infer.py and coco_eval.py tests that depend on the full model, because the fully trained model created by the long-running test doesn't exist yet.
There are several other test scripts in the tests/
folder that work in a similar way. Tests can be run in a SLURM environment using the tests/submit_tests.sh helper script:
sbatch tests/submit_tests.sh tests/smoke_test.sh
Sometimes when the JSON file containing the dataset is large, contains many different fields, and needs to be sorted and filtered in various ways when loading each data point, it may become unwieldy to do all of these things by manipulating Python dicts that we get from loading a JSON file.
For this reason, a helper file datasets/dataset_to_pandas.py allows converting any JSON file to a Pandas-readable and fast-to-load feather file. At the time of writing, this feature has been used to load the Visual Genome region descriptions dataset. The feather file for that dataset has been created using the following command:
python3 datasets/dataset_to_pandas.py \
/proj/mediaind/picsom/databases/visualgenome/download/1.2/VG/1.2/region_descriptions.json \
--record_path regions
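Once created, the feather file loads quickly with pandas. A small usage sketch follows; the output file name below is illustrative, so check dataset_to_pandas.py for the actual output path:
import pandas as pd

# Illustrative file name; see datasets/dataset_to_pandas.py for where the
# converted file is actually written.
df = pd.read_feather('region_descriptions.feather')
print(len(df), 'region descriptions')
print(df.columns)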