Table of contents

Project description
Results and learnings
2.1. Initial assumptions
2.2. Dataset
2.3. Training and evaluation results
2.4. Using the model
Run sample
3.1. Setup
3.2. Train and evaluate the model
Code highlights
Use with custom dataset
5.1. Setup
5.2. Prepare data
5.3. Tag images
5.4. Download pretrained model and create mappings for custom dataset
5.5. Run training
5.6. Deploy your model

1. Project description

[back to the top]

This POC is using CNTK 2.1 to train model for multiclass classification of images. Our model is able to recognize specific objects (i.e. toilet, tap, sink, bed, lamp, pillow) connected with picture types we are looking for. It plays a big role in a process which will be used to classify pictures from different hotels and determine whether it's a picture of bathroom, bedroom, hotel front, swimming pool, bar, etc. That final classification will be made based on objects that were detected in those pictures.

What can you find inside:

How to train a multiclass classificator for images using CNTK (Cognitive Toolkit) and FasterRCNN
Training using Transfer Learning with pretrained AlexNet model
How to prepare and label images in a dataset used for training and testing the model
Working example with all the data and pretrained models

If you would like to know how to use such model, you can check this project to find out how to write a simple RESTfull, Python-based web service and deploy it to Azure Web Apps with your own model.

2. Results and learnings

[back to the top]

Disclaimer: This POC and all the learnings you can find bellow is an outcome of close cooperation between Microsoft and Hotailors. Our combined team spent total of 3 days to prepare and label data, finetune parameters and train the model.

2.1. Initial assumptions

[back to the top]

Due to limited time and human resources we decided to create this POC for just 2 of almost 20 different types of pictures we would like to classify in final product
Each type of picture (i.e. bedroom, bathroom, bar, lobby, hotel front, restaurant) can consists of different objects (i.e. toilet, sink, tap, towell, bed, lamp, curtain, pillow) which are strongly connected with that speciifc picture type.
For our POC we used 2 picture types with 4 objects/classes per each:

bedroom bathroom

pillow tap

bed sink

curtain towel

lamp toilet
At this time we focused only on detecting those specific objects for each picture type. Outcomes of evaluation should later be analyzed either by some simple algorithm or another model to match an image with one of the picture types we are looking for

2.2. Dataset

[back to the top]

We wanted to be as close as possible to real world scenarios so our dataset consists of real pictures from different hotels all over the world. Images where provided by Hotailors team
In our POC we used images scalled to max of 1000px on the wide side
Every picture usually consists of multiple types of objects we are looking for
We used total of 113 images to train and test our model from which we used:
- 82 images in positive set for training the model. We have about 50/50 split between bathroom and bedroom pictures
  
  Bathroom positive sample Bedroom positive sample
- 11 images in negative set for training the model. Those images should not contain any objects that we are interested in detecting
  
  Negative sample 1 Negative sample 2
- 20 images in testImages set for testing and evaluating the model. We have about 50/50 split between bathroom and bedroom pictures
  
  Bathroom test sample Bedroom test sample

After we tagged all of the images from HotailorPOC2 dataset we analyzed them to verify how many tagged objects per each class we have. It is suggested to use about 20-30% of all data in dataset as test data. Looking at our numbers below we did quite ok but there's still some room for improvement

object/class name	# of tagged objects in positive/train set	# of tagged objects in test set	% of tagged objects in relation to all objects
sink	46	10	18
pillow	98	27	22
toilet	34	7	17
lamp	69	18	21
curtain	78	16	17
towel	30	14	32
tap	44	9	17
bed	53	12	18

2.3. Training and evaluation results

[back to the top]

After training and evaluating our model we achieved following results:

Evaluating Faster R-CNN model for 20 images.
Number of rois before non-maximum suppression: 550
Number of rois  after non-maximum suppression: 87
AP for            sink = 0.4429
AP for          pillow = 0.1358
AP for          toilet = 0.8095
AP for            lamp = 0.5404
AP for         curtain = 0.7183
AP for           towel = 0.0000
AP for             tap = 0.1111
AP for             bed = 0.8333
Mean AP = 0.4489

As you can see above, some of the results are not too good. For example: pillow and tap average precision for test set is extremely low and for towel it even shows 0.0000 which may indicate some problems with our dataset or tagged objects. We will definitely need to look into it and check if we are able to somehow improve those results
Even though the Mean Average Precision values are not perfect we still were able to get some decent results:
Some of the results include mistakes. But those clearly look like anomalies which should be fairly easy to catch in further classification of picture type

Picture below shows how our model classified single region (yellow) as bed object although it's clearly not there:

Another picture shows how our model classified single region as towel object although it's clearly not there:
Of course sometimes there are some really ugly results which may be hard to use for further classification:

Next picture shows our model wasn't able to find any objects. We need to verify if it's because of wrongly tagged data in HotailorPOC2 or is it some kind of issue with Region Proposal Network and it simply didn't find any regions of interest for further classification

2.4. Using the model

[back to the top]

Final model will be used in form of web service running on Azure and that's why I prepared a sample RESTful web service written with Python using Flask module. This web service makes use of our trained model and provides API which takes images as an input for evaluation and returns either a cloud of tags or tagged images. Project also describes how to easily deploy this web service to Azure Web Apps with custom Python environment and required dependencies.

You can find running web service hosted on Azure Web Apps here, and project with code and deployement scripts can be found on GitHub.

Sample request and response in Postman:

3. Run sample

3.1. Setup

[back to the top]

Download content of this repo

You can either clone this repo or just download it and unzip to some folder
Setup Python environment

In order for scripts to work you should have a proper Python environment. If you don't already have it setup then you should follow one of the online tutorials. To setup Python environment and all the dependencies required by CNTK on my local Windows machine, I used scripted setup tutorial for Windows. If you're using Linux then you might want to look into one of these tutorials. Just bear in mind that this project was developed and tested with CNTK 2.1 and it wasn't tested for any other version.

Even after setting up Python environment properly you might still witness some errors when running Python scripts. Most of those errors are related to missing modules or some 3rd party frameworks and tools (i.e. GraphViz). Missing modules can be easily pip installed and most of the required ones can be found in requirements.txt files for each folder with Python scripts.

Please report if you'll find any errors or missing modules, thanks!
Download hotel pictures dataset (HotailorPOC2) and pretrained AlexNet model used for Transfer Learning

Go to Detection/FasterRCNN folder in the location were you unzipped this repo and run install_data_and_model.py. It will automatically download the HotailorPOC2 dataset, pretrained AlexNet model and will generate mapping files required to train the model.

3.2. Train and evaluate the model using HotailorPOC2 sample dataset

[back to the top]

After you go through setup steps you can start training your model.

In order to do it you need to run FasterRCNN.pyscript located in Detection/FasterRCNN.

I'm working on Windows 10 so I run the script from Anaconda Command Prompt which should be installed during setup steps.

Bear in mind that training the model might take a lot of time depending on the type of machine you are using for training and if you're using GPU or CPU.

python FasterRCNN.py

TIP: If you don't own any machine with heavy GPU you can use one of the ready to go Data Science Virtual Machine images in Azure.

When the training and evaluation will be completed, you should see something similar to this:

Evaluating Faster R-CNN model for 20 images.
Number of rois before non-maximum suppression: 550
Number of rois  after non-maximum suppression: 87
AP for            sink = 0.4429
AP for          pillow = 0.1358
AP for          toilet = 0.8095
AP for            lamp = 0.5404
AP for         curtain = 0.7183
AP for           towel = 0.0000
AP for             tap = 0.1111
AP for             bed = 0.8333
Mean AP = 0.4489

Trained model, neural network topology and evaluated images (with plotted results) can later be found in Output folder located in Detection/FasterRCNN.

4. Code highlights

[back to the top]

config.py - most of variables are set in this file

These variables are responsible for chosing a dataset that will be used to train the model. Most important variables here are :

__C.CNTK.DATASET = "HotailorPOC2"   

[..]  

if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset Must match the name set with property '__C.CNTK.DATASET'
    __C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # dataset directory
    __C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
    __C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder
    __C.CNTK.PROPOSAL_LAYER_PARAMS = "'feat_stride': 16\n'scales':\n - 4 \n - 8 \n - 12"

IMAGE_WIDTH and IMAGE_HEIGHT are used to determine the input size of images used for training and later on for evaluation:
```
__C.CNTK.IMAGE_WIDTH = 1000
__C.CNTK.IMAGE_HEIGHT = 1000
```
BASE_MODEL defines which pretrained model should be used for transfer learning. Currently we used only AlexNet. In future we want to test it with VGG16 to check if we can get better results then with AlexNet
```
__C.CNTK.BASE_MODEL = "AlexNet" # "VGG16" or "AlexNet" or "VGG19"
```

requirements.txt
- It holds all the dependencies required by my scripts and CNTK libraries to work. It can be used with pip install command to quickly install all the required dependencies (more here)
```
matplotlib==1.5.3
numpy==1.13.3
cntk==2.1
easydict==1.6
Pillow==4.3.0
utils==0.9.0
PyYAML==3.12
```

install_data_and_model.py

This script does 3 things:

Downloads pretrained model specified in config.py which will be later used for transfer learning:

#downloads pretrained model pointed out in config.py that will be used for transfer learning
sys.path.append(os.path.join(base_folder, "..", "..",  "PretrainedModels"))
from models_util import download_model_by_name
download_model_by_name(cfg["CNTK"].BASE_MODEL)

Downloads and unzips our sample HotailorPOC2 dataset:

#downloads hotel pictures classificator dataset (HotailorPOC2)
#comment out lines bellow if you're using a custom dataset
sys.path.append(os.path.join(base_folder, "..", "..",  "DataSets", "HotailorPOC2"))
from download_HotailorPOC2_dataset import download_dataset
download_dataset()

Creates mappings and metadata for dataset:

#generates metadata for dataset required by FasterRCNN.py script
print("Creating mapping files for data set..")
create_mappings(base_folder)

FasterRCNN.py
- We use this script for training and testing the model. It makes use of specific variables in config.py. This script comes unmodified from original CNTK repository on GitHub (version 2.1)

5. Use with custom dataset

[back to the top]

Although this project was prepared specifically for Hotailors case, it's based on one of the standard examples from original CNTK repository on GitHub and thus it can be easily reused in any other scenario. You just need to follow steps bellow:

5.1. Setup

[back to the top]

Follow steps number 1 and 2 from setup instructions.

5.2. Prepare data

[back to the top]

Gather data for your dataset
- Think what type of objects you would like to classify and prepare some images with those objects. The more the better but usually u should get some decent results even with 30-40+ samples per object. Remember that single image can have multiple objects (it was exactly like that in our case)
- Make sure to use only good quality images in specific resolution
- Resolution we used for our project was 1000x1000 px but you can easily lower it depending on your scenario and needs. Just make sure to scale your images to this one specific resolution you will be working with. In our case the original images where much larger then 1000x1000 px but we scalled it down to match the longer side of image to 1000 px
- It's not recommended to go beyond 1000x1000 px

Create a dataset

Create a new folder in Datasets directory and name it with whatever your datasets name is and inside that newly created folder create 3 another folders for your images:
- negative
  
  Here you must add images which don't include any of the objects you will be looking for. The more the better but don't get crazy here, 10 to 20 images should more then enought. Those images will be used during training to show our model what is not interesting for us and should be treated as a background
- positive
  
  Here you must add images that will be used to teach our model what kind of objects it should look for. The more the better but we should be able to see some results with 30-40+ images per class/object we would like to detect. Just bear in mind that one image can have more then one object/class.
- testImages
  
  Those images will be used for testing of your trained model and to evaluate AP (Average Precission) percentage for each class. Just take 20-30 percent of images from positive folder and put them here. It's very important though to not duplicate any images between positive and testImages folders as it may corrupt the results

5.3. Tag images

[back to the top]

In order to make your custom dataset ready to be used for training you will need to create some metadata with coordinates of objects and their names (classes)

Currently the best tool for tagging images is Visual object Taging Tool but for this project I used simple Python scripts that can be found in the original CNTK 2.1 github repository (mine were fine tuned a bit):

C1_DrawBboxesOnImages.py - allows you to draw bounding boxes for all the objects which are interesting to you (present objects you wish to recognize).

There is one variable you will need to change before running this script:
```
#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"
```
Important thing to mention here is to run this script only for positive and testImages. You don't need to do it for negative because there's actually nothing to tag there.

After successfully running the script you should see something like that:

Now just use your mouse to draw bounding boxes for every object. Some keyboard shortcuts should be helpful here:

"u" - will erase last bounding box you draw

"n" - will move you to next image in current folder

"s" - will skip current image and delete all the bounding boxes for that image

C2_AssignLabelsToBboxes.py - allows to review every bounding box you've marked with C1 script and label it with proper class name.

Before running this script change those 2 variables:

#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"

#change it to your classes names
classes = ["curtain", "pillow", "bed", "lamp", "toilet", "sink", "tap", "towel"]

Again, same as in C1, run this script only for positive and testImages.

C3_VisualizeBboxes.py - I made this script based on C2 just to visualize bounding boxes for each image in dataset. It's very helpful when you are looking for mistakes within your dataset.

Be sure to change imgDir variable to your directory:
```
#change it to your images directory. Run this script separately for each folder
imgDir = "../../DataSets/HotailorPOC2/testImages"
```
Running C3 script will visualize bounding boxes for every image in directory and you should be able to see if everything is marked correctly:

5.4. Download pretrained model and create mappings for custom dataset

[back to the top]

In order to train the model we use transfer learning and we need to have a pretrained model for that. For this sample we use AlexNet model.

To download the model and create class and files mappings you can use install_data_and_model.py script and simply follow these steps:

Make sure to change variables in your config.py file and make sure you set __C.CNTK.MAP_FILE_PATH variable to a proper directory:

if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset. Must match the name set with property '__C.CNTK.DATASET'
    __C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # your dataset directory
    __C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
    __C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder
    __C.CNTK.PROPOSAL_LAYER_PARAMS = "'feat_stride': 16\n'scales':\n - 4 \n - 8 \n - 12"

Open install_data_and_model.py script and comment out those lines:

#downloads hotel pictures classificator dataset (HotailorPOC2)
#comment out lines bellow if you're using a custom dataset
sys.path.append(os.path.join(base_folder, "..", "..",  "DataSets", "HotailorPOC2"))
from download_HotailorPOC2_dataset import download_dataset
download_dataset()

Run install_data_and_model.py script. Bear in mind that downloading the pretrained model may take few minutes or even more depending on your internet connection.

At this point your custom dataset should be ready for training.

5.5. Run training

[back to the top]

Change variables

Edit config.py script and change following variables:

Change value of __C.CNTK.DATASET:

# set it to your custom dataset name
__C.CNTK.DATASET = "HotailorPOC2"

Change values of __C.CNTK.IMAGE_WIDTH and __C.CNTK.IMAGE_HEIGHT to much your custom dataset images resolution:

# set it to your custom datasets images resolution
__C.CNTK.IMAGE_WIDTH = 1000
__C.CNTK.IMAGE_HEIGHT = 1000

Change values in following code to match your dataset name, your datasets directory location and to match your custom dataset images resolution:

if __C.CNTK.DATASET == "HotailorPOC2": #name of your dataset. Must match the name set with property '__C.CNTK.DATASET'
    __C.CNTK.MAP_FILE_PATH = "../../DataSets/HotailorPOC2" # your dataset directory
    __C.CNTK.NUM_TRAIN_IMAGES = 82 # number of images in 'positive' folder
    __C.CNTK.NUM_TEST_IMAGES = 20 # number of images in 'testImages' folder

Train and test your model with FasterRCNN.py script

Run FasterRCNN.py script and wait till the training and testing finishes.

Training may take even couple hours depending on your hardware setup. It's is best to use high performing GPU's for that kind of purposes.

TIP: If you don't own any machine with heavy GPU you can use one of the ready to go Data Science Virtual Machine images in Azure.

If you won't be satisfied with training results then try fine tunning the variables and cleaning your dataset if necessary and then rerun the training.

5.6. Deploy your model

[back to the top]

When you will find yourself satisfied with your model and you would like to get to know how to use it with RESTful Python web service and deploy it to Azure Web Apps, then check out this repository.