We apply beam search as a decoding heuristic on top of an encoder-decoder architecture, which yields better-quality captions on three benchmark datasets: Flickr8k, Flickr30k, and MS COCO.
Beam search finds a higher-scoring caption from the model instead of greedily choosing the word with the best score at each decoding step. For example, with a beam width of k = 3, the decoder keeps the three best partial captions at every step instead of committing to only one.
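A minimal, self-contained sketch of this decoding loop is shown below. The `log_prob_fn` callable, the toy vocabulary, and the token ids are illustrative assumptions for the example only; they are not the interface used by `caption.py`, which scores beams with the trained decoder.

```python
# Minimal sketch of beam-search decoding: at every step, keep the `beam_width`
# highest-scoring partial captions instead of only the single greedy best word.
def beam_search(log_prob_fn, start_token, end_token, beam_width=3, max_len=20):
    beams = [([start_token], 0.0)]           # (token sequence, cumulative log-prob)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in log_prob_fn(seq):   # next-word log-probabilities
                candidates.append((seq + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # finished captions go to `completed`, unfinished ones stay on the beam
            (completed if seq[-1] == end_token else beams).append((seq, score))
        if not beams:                        # every beam has emitted the end token
            break
    completed.extend(beams)
    return max(completed, key=lambda c: c[1])    # highest-scoring caption overall


# Toy usage with a hand-written distribution (a real decoder would produce these scores):
if __name__ == "__main__":
    def toy_log_probs(seq):
        # pretend vocabulary: 0=<start>, 1="a", 2="dog", 3=<end>
        table = {0: {1: -0.2, 2: -2.0, 3: -3.0},
                 1: {2: -0.3, 1: -2.5, 3: -1.5},
                 2: {3: -0.1, 1: -3.0, 2: -3.0},
                 3: {3: -0.01}}
        return table[seq[-1]].items()

    print(beam_search(toy_log_probs, start_token=0, end_token=3, beam_width=3))
```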
Dependencies for this project are provided in the `environment.yml` and `requirements.txt` files.
To install the dependencies using conda:

```bash
conda env create -f environment.yml
conda env list
```
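If you prefer pip over conda, the provided `requirements.txt` can be installed the usual way (ideally inside a fresh virtual environment):

```bash
pip install -r requirements.txt
```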
Reference the data folder and the annotations JSON file for the downloaded dataset (MS COCO, Flickr8k, or Flickr30k) in `create_input_files.py`, then run the script to create the required dataset files.
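The snippet below is only a hypothetical illustration of the kind of paths to point at your download; the variable names and locations are assumptions, not the actual contents of `create_input_files.py`.

```python
# Hypothetical illustration of the values to edit in create_input_files.py before
# running it; names and paths are assumptions, not the script's actual code.
dataset = 'coco'                              # or 'flickr8k' / 'flickr30k'
annotations_json = 'data/dataset_coco.json'   # annotations JSON for the downloaded dataset
image_folder = 'data/coco/images/'            # data folder containing the downloaded images
output_folder = 'data/processed/'             # where the processed dataset files are written
```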
To train a model, run `python train.py`. All training hyper-parameters are set in `train.py`.
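For reference, a hyper-parameter block of this kind typically looks like the hypothetical sketch below; the names and values are assumptions, so check `train.py` for the actual settings.

```python
# Hypothetical hyper-parameter block of the kind found at the top of train.py;
# names and values are illustrative assumptions, not the script's actual settings.
emb_dim = 512        # word embedding size
attention_dim = 512  # size of the attention network
decoder_dim = 512    # hidden size of the decoder LSTM
dropout = 0.5
batch_size = 32
epochs = 120
encoder_lr = 1e-4    # learning rate for fine-tuning the CNN encoder
decoder_lr = 4e-4    # learning rate for the attention decoder
```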
Note: Pretrained models for MS COCO, Flickr8k, and Flickr30k can be downloaded from here. The downloaded zip file needs to be extracted into the `models/` directory.
- You may use `caption.py` to generate image captions and an attention map over an image:

  ```bash
  python caption.py --img='path/to/image.jpeg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
  ```
- The Jupyter notebook `Caption-Sample-Images.ipynb` can be used to caption specified images using the trained model.
- `Generate-Testset-Predictions.ipynb` is used for generating predictions in the required format for the test dataset.
To use the UI-based image captioner module, run the following commands:

```bash
cd ui/
python MainWindowUI.py
```
This would open the following user interface:
You can find the demo video here on YouTube.