This project uses CNNs and LSTMs to automatically generate captions from images.
The Microsoft Common Objects in Context (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.
You can read more about the dataset on the website (http://cocodataset.org/) or in the research paper (https://arxiv.org/abs/1405.0312).
To obtain and explore the dataset, you can use the COCO API; a short exploration sketch follows the download steps below.
The core architecture is an encoder-decoder: the encoder is a ResNet CNN pretrained on ImageNet, and the decoder is an LSTM.
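For orientation, here is a minimal sketch of that encoder-decoder pair, assuming PyTorch and torchvision; the class names, the `resnet50` backbone choice, and the hyperparameters are illustrative, not necessarily the exact code in the notebooks:

```python
# Sketch of the encoder-decoder (PyTorch assumed; names are illustrative).
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)  # ImageNet weights (legacy torchvision arg)
        for param in resnet.parameters():
            param.requires_grad = False            # freeze the pretrained CNN
        modules = list(resnet.children())[:-1]     # drop the final FC classifier
        self.resnet = nn.Sequential(*modules)
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images)                  # (B, 2048, 1, 1)
        features = features.view(features.size(0), -1)  # (B, 2048)
        return self.embed(features)                     # (B, embed_size)

class DecoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Feed the image feature as the first "word", then the caption tokens.
        embeddings = self.embed(captions[:, :-1])               # (B, T-1, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hiddens, _ = self.lstm(inputs)                          # (B, T, H)
        return self.fc(hiddens)                                 # (B, T, vocab)
```

Freezing the ResNet and training only the final embedding layer plus the decoder is a common choice for this setup, since ImageNet features transfer well to captioning.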
- Clone this repo: https://github.com/cocodataset/cocoapi

```
git clone https://github.com/cocodataset/cocoapi.git
```

- Set up the COCO API (also described in the README of that repo):

```
cd cocoapi/PythonAPI
make
cd ..
```
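As an optional sanity check (not part of the repo's documented setup), you can verify the in-place build by importing the API from cocoapi/PythonAPI, which is where `make` places the compiled extension:

```
cd PythonAPI
python -c "from pycocotools.coco import COCO; print('COCO API OK')"
cd ..
```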
- Download the following data from http://cocodataset.org/#download (as described below):
- Under Annotations, download:
  - 2014 Train/Val annotations [241MB] (extract captions_train2014.json and captions_val2014.json, and place them at cocoapi/annotations/captions_train2014.json and cocoapi/annotations/captions_val2014.json, respectively)
  - 2014 Testing Image info [1MB] (extract image_info_test2014.json and place it at cocoapi/annotations/image_info_test2014.json)
- Under Images, download:
  - 2014 Train images [83K/13GB] (extract the train2014 folder and place it at cocoapi/images/train2014/)
  - 2014 Val images [41K/6GB] (extract the val2014 folder and place it at cocoapi/images/val2014/)
  - 2014 Test images [41K/6GB] (extract the test2014 folder and place it at cocoapi/images/test2014/)
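With the files in place, you can explore the captions using the COCO API. A quick sketch, assuming the folder layout above and the in-place pycocotools build from the setup step (the paths and the `sys.path` tweak are illustrative):

```python
# Quick exploration of the training captions with the COCO API.
# Run from the directory containing cocoapi/.
import sys
sys.path.append('cocoapi/PythonAPI')  # point at the in-place build
from pycocotools.coco import COCO

coco = COCO('cocoapi/annotations/captions_train2014.json')

# Pick an arbitrary annotation and print its caption and image file.
ann_id = list(coco.anns.keys())[0]
caption = coco.anns[ann_id]['caption']
img_id = coco.anns[ann_id]['image_id']
img_file = coco.loadImgs(img_id)[0]['file_name']
print(caption)
print('cocoapi/images/train2014/' + img_file)
```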
- The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order (0_Dataset.ipynb, 1_Preliminaries.ipynb, 2_Training.ipynb, 3_Inference.ipynb).
This is a project from Udacity's Computer Vision Nanodegree.
LICENSE: This project is licensed under the terms of the MIT license.