Visual storytelling (VIST) extends automated image captioning from single images to image sequences: given a sequence of images as input, the task is to automatically generate a textual narrative as output.
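Concretely, each story in the VIST dataset pairs an ordered sequence of five photos with a five-sentence narrative, one sentence per image. An illustrative Python sketch of the input/output shape (field names and contents are made up for illustration, not taken from this repository):

# Illustrative shape of one visual storytelling example (hypothetical names/content):
# input: an ordered sequence of images, output: one narrative sentence per image.
example = {
    "image_sequence": [f"photo_{i}.jpg" for i in range(1, 6)],
    "story": [
        "We got to the park just after sunrise.",
        "The dogs ran straight for the pond.",
        "Everyone laughed when the ball landed in the water.",
        "We dried off and had a picnic lunch.",
        "By the end of the day no one wanted to leave.",
    ],
}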
This repository borrows heavily from the Aalto CBIR DeepCaption codebase. The following dependencies are required:
- Python 3+ and Python 2.7 (the latter for the arel submodule, which uses NLP metrics)
- PyTorch (v1.0+), torchvision 0.2.0+
- nltk
- gensim
- scipy, numpy
- pickle
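Assuming a standard pip setup, the Python 3 dependencies can typically be installed as follows (pickle ships with the standard library):

pip3 install torch torchvision nltk gensim scipy numpy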
The following models are implemented:
- Baseline model - https://arxiv.org/abs/1604.03968
- Multi decoder model - https://arxiv.org/abs/1806.00738
- GLAC model - https://arxiv.org/abs/1805.10973 (also trainable with a self-critical sequence training (SCST) objective; see the sketch after this list)
- AREL & GAN models - https://arxiv.org/abs/1804.09160
- Character-centric storytelling model - https://arxiv.org/abs/1909.07863
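For reference, the SCST objective mentioned for the GLAC model trains the generator with a policy-gradient loss in which a sampled story's reward is baselined by the reward of the greedily decoded story (Rennie et al., 2017). A minimal, repository-agnostic sketch (tensor and function names are illustrative, not taken from this codebase):

import torch

def scst_loss(sample_log_probs, sample_reward, greedy_reward):
    # sample_log_probs: (batch,) summed log-probabilities of the sampled tokens
    # sample_reward:    (batch,) sentence-level metric (e.g. CIDEr) of the samples
    # greedy_reward:    (batch,) same metric for the greedy "test-time" decodes
    advantage = (sample_reward - greedy_reward).detach()  # self-critical baseline
    return -(advantage * sample_log_probs).mean()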
This repository has the following structure:
resources
├── characters_analysis
├── configs
├── filtered_test
├── filtered_train
├── filtered_val
├── memad
├── models
├── plots
├── results
└── sis
sources
├── data
├── general
├── infer
├── models
├── scripts
└── train_validate
arel
with resources/filtered_[train/val/test] holding the images of the sequences and resources/sis containing the respective annotation files of the VIST dataset.
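As a sketch of how the annotations and image folders fit together, the snippet below groups the story-in-sequence annotations by story and resolves the corresponding image paths. It assumes the standard VIST SIS JSON layout (an "annotations" list with story_id, photo_flickr_id, worker_arranged_photo_order, and text fields), a train.story-in-sequence.json filename, and .jpg image files; adjust these to the actual contents of resources/sis and resources/filtered_train.

import json
import os
from collections import defaultdict

SIS_FILE = "resources/sis/train.story-in-sequence.json"   # assumed annotation filename
IMAGE_DIR = "resources/filtered_train"                    # images of the training sequences

with open(SIS_FILE) as f:
    sis = json.load(f)

# Group annotations by story; in the SIS files each entry is typically a one-element list.
stories = defaultdict(list)
for entry in sis["annotations"]:
    ann = entry[0] if isinstance(entry, list) else entry
    stories[ann["story_id"]].append(ann)

for story_id, anns in stories.items():
    anns.sort(key=lambda a: int(a["worker_arranged_photo_order"]))
    image_paths = [os.path.join(IMAGE_DIR, f"{a['photo_flickr_id']}.jpg") for a in anns]
    sentences = [a["text"] for a in anns]
    print(story_id, image_paths, sentences)
    break  # print only the first story as a sanity check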
- Models available for training are as follows:
python3 sources/train_validate/baseline.py [--options]
python3 sources/train_validate/multi_decoder.py [--options]
python3 sources/train_validate/glac.py [--options]
python3 sources/train_validate/glac_sc.py [--options]
python3 sources/train_validate/baseline_cc.py [--options]
python2 arel/train_AREL.py [--options]
python2 arel/train_GAN.py [--options]
- Trained checkpoints saved under resources/models/ can be used for inference and evaluation as follows:
python3 sources/infer/baseline.py [--options]
python3 sources/infer/multi_decoder.py [--options]
python3 sources/infer/glac.py [--options]
python3 sources/infer/baseline_cc.py [--options]
python2 arel/train_AREL.py --test [--options]
python2 arel/train_GAN.py --test [--options]
with [--options] being the set of train/test-phase model parameters and tunable hyperparameters, which are documented in detail in the respective .py files.
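For a quick look at a saved checkpoint outside the provided inference scripts, a minimal PyTorch sketch (the filename below is hypothetical; the actual checkpoint layout depends on the training script that produced it):

import torch

ckpt_path = "resources/models/glac-best.pth"   # hypothetical checkpoint name

# Load on CPU so the checkpoint can be inspected without a GPU.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are usually either a raw state_dict or a dict wrapping one.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])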