
Supported features - Training

Teacher forcing scheduling

Teacher forcing in RNNs refers to feeding the ground-truth token to the network at time step t. This paper details an approach where at each time step t the ground-truth token is fed with probability P(teacher_forcing) and the token generated by the network itself with probability 1 - P(teacher_forcing). Currently, an inverse sigmoid sampling schedule is implemented for these probabilities.

To train a model with sampled teacher forcing using k=2200 and beta=0.3 run the following:

python train.py --dataset coco:train2014 --model_name my_model --teacher_forcing sampled --teacher_forcing_k 2200 --teacher_forcing_beta 0.3

This inverse sigmoid scheduling implementation depends on the parameter k, usually on the order of thousands, which can be interpreted as "how soon do we want to start decreasing the probability of teacher forcing?", and the parameter beta, between 0 and 1, which can be interpreted as "once we start to use the model's own outputs, how fast do we want the rate of model-output usage to increase?". Intuitively, beta is the slope of the middle segment of the inverse sigmoid curve.

Teacher forcing is controlled by the --teacher_forcing parameter. By default this is set to always, meaning that no sampling is performed. The other options are sampled - using the sampling procedure outlined above - and additive - a deterministic weighted sum of the teacher token and the generated token, with weights determined by the inverse sigmoid scheduler.

--teacher_forcing_k sets the value of k and --teacher_forcing_beta sets the value for beta.
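
For intuition, below is a minimal sketch of an inverse sigmoid schedule driven by these two parameters. It is an illustration only; the exact formula implemented in train.py may differ:

import math

def p_teacher_forcing(step, k=2200, beta=0.3):
    # Illustrative inverse sigmoid schedule (an assumption, not necessarily the
    # repository's exact formula): the probability stays close to 1.0 early in
    # training and then decays towards 0; k delays the decay and beta controls
    # how steep it is.
    exponent = min(beta * step / k, 700.0)  # clamp to avoid overflow at huge steps
    return k / (k + math.exp(exponent))

# Example: probability of feeding the ground-truth token at a few training steps
for step in (0, 10000, 50000, 100000):
    print(step, round(p_teacher_forcing(step), 3))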

Hierarchical model

This repository contains an implementation of a hierarchical model inspired by Krause et al. You can train the hierarchical model on, for example, Visual Genome paragraph data as well as COCO captions (where each caption is treated as a paragraph containing a single sentence).

./train.py --dataset vgim2p:train --validate vgim2p:val --vocab AUTO --model_name Hierarchical/basic_hierarchical_model --hierarchical_model

Model evaluation

The COCO evaluation library works only with Python 2. Therefore, make sure that you run the commands below in an environment that supports it.

  1. Link to the coco-caption library:
git clone https://github.com/tylin/coco-caption ~/workdir/coco-caption
cd image_captioning
ln -s path/to/coco-caption/pycocoevalcap datasets/
  2. Install pycocotools for Python 2 (your Python 2 environment may require loading modules):
module purge
module load python-env/2.7.10
pip2 install pycocotools --user
  3. Modify the file pycocoevalcap/eval.py to remove all metrics other than METEOR and CIDEr, on lines 39-45 (see the sketch below).
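
For reference, the scorers list in pycocoevalcap/eval.py originally contains roughly Bleu, Meteor, Rouge and Cider entries (newer checkouts may also include Spice, and the exact line numbers can vary). After trimming, only these two entries remain:

# pycocoevalcap/eval.py (excerpt after the modification)
scorers = [
    (Meteor(), "METEOR"),
    (Cider(), "CIDEr")
]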

Once the setup steps above are done, you can perform evaluation on a json file that corresponds to one particular model:

python eval_coco.py path/to/result/captions.json --ground_truth datasets/data/COCO/annotations/captions_val2014.json

By default, the above command writes the METEOR and CIDEr scores into a JSON-formatted file with the extension *.eval.

Finally, to simplify generating human-readable output, the eval2csv.py script combines multiple *.eval files into a single, easy-to-parse CSV file:

python eval2csv.py --evaluations_dir path/containing/eval_files --output_file output_file.csv 

Feature Extraction

You can use extract_dataset_features.py to extract features from one of the convolutional models made available in models.py. Currently the following CNN models from PyTorch torchvision are supported: AlexNet, DenseNet 20, ResNet-152, VGG-16, and Inception V3, all trained on the ImageNet classification task. The extracted features are either taken from the already-flattened pre-classification layer, or obtained by flattening the final convolutional or pooling layer.

The resulting features are saved in LMDB file format. Example command for generating features from images in the MS COCO training and validation sets using the ResNet-152 CNN:

python extract_dataset_features.py --dataset coco:train2014+coco:val2014 --extractor resnet152

The feature extraction script currently supports the following feature types, specified with --feature_type:

  • plain - takes an input image, resizes it, and calculates features without any augmentation
  • avg - takes 5 different crops of a resized input image (4 corners + center) and then flips each crop horizontally, producing 10 cropped images in total. These images are then processed by the specified CNN separately, and the single output feature vector is formed by elementwise averaging over the 10 feature vectors
  • max - same as avg, but using the elementwise maximum

Three different pixel-value normalization strategies are currently supported for the avg and max feature types. Normalization is specified by the --normalize parameter:

  • default - applies the per-channel normalization settings recommended by PyTorch
  • skip - do not normalize pixel values
  • substract_half - subtract 0.5 from each pixel value after the pixel values have been converted to the range [0, 1]

The feature extractor supports the same dataset configuration format as the train.py and infer.py scripts.
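
As an illustration, the sketch below computes an avg-style feature for a single image using torchvision's TenCrop with a ResNet-152 backbone and the default ImageNet normalization. The actual extract_dataset_features.py implementation may differ in details such as the resize dimensions and batching:

import torch
import torchvision.transforms as T
from torchvision import models
from PIL import Image

# ResNet-152 backbone; replacing the classifier with Identity yields the
# flattened pre-classification features.
extractor = models.resnet152(pretrained=True)
extractor.fc = torch.nn.Identity()
extractor.eval()

normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
preprocess = T.Compose([
    T.Resize(256),   # resize dimension is an assumption
    T.TenCrop(224),  # 4 corners + center, each also flipped horizontally -> 10 crops
    T.Lambda(lambda crops: torch.stack([normalize(T.ToTensor()(c)) for c in crops])),
])

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    feats = extractor(preprocess(img))   # shape (10, 2048), one row per crop
feature_avg = feats.mean(dim=0)          # "avg"; feats.max(dim=0).values gives "max"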

DenseCap features

Dense captioning features are extracted with the DenseCap repository using Lua Torch.

Installing Lua Torch

The following instructions are for the CSC Taito cluster. Make sure that you run these commands in the GPU environment (interactive shell) with a K80 GPU selected! (Running this on a P100 fails.)

First, purge the environment and load the needed modules:

cd $USERAPPL  
module purge   
module load gcc/4.9.3 mkl/11.3.0 intelmpi/5.1.1 fftw/3.3.4 hdf5-serial/1.8.15 cuda/7.5

Clone the Lua Torch distribution:

git clone https://github.com/torch/distro.git ./torch --recursive

Compile and install:

cd torch 
./clean.sh 
export CMAKE_LIBRARY_PATH=/appl/opt/mkl/11.3.0/compilers_and_libraries_2016.0.109/linux/mkl/lib/intel64_lin:/appl/opt/fftw/gcc-4.9.3/intelmpi-5.1.1/fftw-3.3.4/lib:/appl/vis/sox/14.4.2-n/lib 
export CMAKE_INCLUDE_PATH=/appl/opt/mkl/11.3.0/compilers_and_libraries_2016.0.109/linux/mkl/include:/appl/opt/fftw/gcc-4.9.3/intelmpi-5.1.1/fftw-3.3.4/include:/appl/vis/sox/14.4.2-n/include  
export CXX=g++  
export CC=gcc  
./install.sh

Answer "NO" to the following question:

Do you want to automatically prepend the Torch install location to PATH and LD_LIBRARY_PATH in your /homeappl/home/jppirhon/.bashrc? (yes/no) [yes] >>> no

If all went well, the installation is now done.

You can now test the installation by first initializing the Torch environment:

source $USERAPPL/torch/install/bin/torch-activate

and trying to start Torch shell:

th

Installing DenseCap model

Clone the DenseCap repository:

cd $USERAPPL   
git clone https://github.com/jcjohnson/densecap

Install the dependencies listed in the README.md.

Fetch the pretrained model:

cd densecap   
sh scripts/download_pretrained_model.sh

Run a test command inside the densecap folder:

th run_model.lua -input_image imgs/elephant.jpg

If all went well, the vis/data/ directory should contain a new JSON file with dense captioning output for the elephant.jpg image.

Extracting DenseCap features

Before we are ready to extract the features, we need to prepare a list of files containing the paths to the images for which we need features. To do this, go to the image_captioning directory and run the following script. Below is an example for MS COCO:

python3 list_dataset_files.py --dataset coco:train2014:no_resize+coco:val2014:no_resize --num_workers 4

In practice, the --num_files 10 parameter can be added to the above command to split the file list into 10 files, making it possible to parallelize DenseCap feature extraction.

If you run the above command in the Taito environment, it should create a file named image_file_list-coco:train2014+coco:val2014-taito-gpu.csc.fi.txt (the last part of the file name will vary based on the environment).

If all of this worked, you are now ready to extract the features. Features are extracted using the handy extract_features.lua script provided in the DenseCap repository.

Now, to run feature extractor on a single file do the following:

cd ../densecap   
th extract_features.lua -boxes_per_image 50 -input_txt ../image_captioning_dev/image_file_list-coco:train2014+coco:val2014-taito-gpu.csc.fi.txt -output_h5 densecap_features-coco:train2014+coco:val2014.h5

The extract_features.lua script takes the following mandatory parameters:

  • -input_txt - newline-separated text file listing the paths to the images for which we need to extract features
  • -output_h5 - path to the HDF5 output file

The following parameters have default values and may be left unspecified; however, for our purposes some of them need to be changed:

  • -boxes_per_image defaults to 100 - we can set this to 50 to match the papers we are replicating.
  • -gpu defaults to 0 - selects which GPU device to use

Other default parameters are:

  • -image_size defaults to 720 - the dimension to which the image is resized before dense captions are extracted. Keep it as it is; setting it to a lower value may result in too few regions being detected (the Lua Torch model fails to handle these cases correctly)
  • -checkpoint defaults to data/models/densecap/densecap-pretrained-vgg16.t7, which is the pretrained model we fetched earlier.
  • -rpn_nms_thresh defaults to 0.7
  • -final_nms_thresh defaults to 0.4
  • -num_proposals defaults to 1000
  • -max_images defaults to 0

Taito / CSC only

Doing this as a SLURM batch/array job on 10 files on Taito would look like this:

Extract the file list:

python3 list_dataset_files.py --dataset coco:train2014:no_resize+coco:val2014:no_resize --num_workers 4 --num_files 10

Run the feature extraction as a SLURM array job (note that the range of the array job needs to be set from 0 to num_files - 1):

sbatch --time=0-24 --mem=128GB --job-name='COCO_TO_DENSECAP' --array=0-9 -o slurm-%x-%A_%a.out scripts/extract_densecap_features.sh \
    '../image_captioning_dev/file_lists/image_file_list-coco:train2014:no_resize+coco:val2014:no_resize-taito-gpu.csc.fi_${n}_of_${N}.txt' \
    '../image_captioning_dev/features/densecap_features-coco:train2014:no_resize+coco:val2014:no_resize_${n}_of_${N}.h5'

Finally, to use the extracted features, you need to convert them from the HDF5 format to the LMDB file format used by DeepCaption:

./densecap_h5_to_lmdb.py --inputs_list_basename 'file_lists/image_file_list-coco:train2014:no_resize+coco:val2014:no_resize-taito-gpu.csc.fi' \
--features_basename features/densecap_features-coco:train2014:no_resize+coco:val2014:no_resize --num_files 10

Please look at scripts/extract_densecap_features.sh to see what the above command really does.
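
To sanity check the converted features, something along the lines of the sketch below can be used. The LMDB path and the key/value encoding (raw float32 bytes keyed by image path) are assumptions here; check densecap_h5_to_lmdb.py for the actual format:

import lmdb
import numpy as np

# Hypothetical output path; densecap_h5_to_lmdb.py determines the real file name.
env = lmdb.open('features/densecap_features.lmdb', readonly=True, lock=False)
with env.begin() as txn:
    for key, value in txn.cursor():
        vec = np.frombuffer(value, dtype=np.float32)  # assumed value encoding
        print(key.decode('utf-8'), vec.shape)
        break  # peek at the first entry only
env.close()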

Troubleshooting

If you get errors when running th commands, make sure you have first loaded the modules needed for Lua Torch (see above).

Testing

Simple tests are available in the tests/ folder. Commonly used testing functions are defined in tests/functions.sh. To run "smoke tests" on the codebase, do:

./tests/smoke_test.sh --coco_gt PATH_TO_COCO_CAPTIONS_GROUND_TRUTH

You can add the --skip_long_commands parameter if you want to skip the tests that train the full model for multiple epochs. When running the tests for the first time, the infer.py and coco_eval.py tests that depend on the full model will fail if the fully trained model created by the long-running test does not yet exist.

There are several other test scripts in the tests/ folder that work in a similar way. Tests can be run in a SLURM environment using the tests/submit_tests.sh helper script:

sbatch tests/submit_tests.sh tests/smoke_test.sh

Pandas/Feather dataset format

Sometimes, when the JSON file containing the dataset is large, contains many different fields, and needs to be sorted and filtered in various ways when loading each data point, it can become unwieldy to do all of this by manipulating the Python dicts obtained from loading the JSON file.

For this reason, the helper script datasets/dataset_to_pandas.py converts any JSON file into the Pandas-readable and fast-to-load Feather file format. At the time of writing, this feature has been used to load the Visual Genome region descriptions dataset. The Feather file for that dataset was created using the following command:

python3 datasets/dataset_to_pandas.py \
    /proj/mediaind/picsom/databases/visualgenome/download/1.2/VG/1.2/region_descriptions.json \
    --record_path regions
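
A quick way to verify the result is to load the Feather file with pandas. The file name below is a placeholder; dataset_to_pandas.py determines the actual output path:

import pandas as pd

# region_descriptions.feather is a hypothetical name for the converted file
df = pd.read_feather('region_descriptions.feather')
print(df.columns)
print(df.head())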