Click here for Dataset 1. Click here for Dataset 2.
This project is aimed at learning and understanding Convolutional Neural Networks.
- The first step in this pipeline is a detection algorithm for both humans and dogs; the image is then classified by predicting the exact breed name.
We can try different pretrained detectors available through OpenCV.
I have tried Haar, HOG, and LBP based detectors; any other deep learning based pretrained model can also be used.
- HAAR:
This algorithm, also called the Viola-Jones algorithm, is based on Haar-like wavelets. Haar wavelets are a sequence of rescaled square-shaped functions, which is explained in a detailed way here.
Haar-like features for detection:
A target window of some fixed size is moved over the entire input image, and Haar-like features are computed at every possible location. Since this is computationally very expensive, an alternative method using integral images was designed. The way it works is described briefly below:
Integral image calculation reduces the computation: every rectangle sum becomes four array lookups instead of a full loop over the rectangle.
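A minimal NumPy sketch of the integral image trick (the image array and the rectangle coordinates are hypothetical):

```python
import numpy as np

img = np.random.randint(0, 256, size=(224, 224)).astype(np.int64)  # hypothetical grayscale image

# Integral image: ii[y, x] = sum of all pixels above and to the left of (y, x).
# A zero row/column is prepended so rectangle sums need no bounds checks.
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y1, x1, y2, x2):
    """Sum of img[y1:y2, x1:x2] in O(1) using four lookups."""
    return ii[y2, x2] - ii[y1, x2] - ii[y2, x1] + ii[y1, x1]

# A two-rectangle Haar feature: difference between adjacent regions.
feature = rect_sum(0, 0, 24, 12) - rect_sum(0, 12, 24, 24)
```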
A Haar feature is computed as the difference between the sum of pixels under the black region and the sum under the white region of the template (the figure here showed this on real face images). The closer this difference is to "1", the more likely a Haar feature has been detected!
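For reference, OpenCV ships pretrained Haar cascades; a minimal sketch of running the frontal-face detector (the image path is hypothetical):

```python
import cv2

# Load OpenCV's bundled pretrained frontal-face Haar cascade.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread('sample.jpg')                 # hypothetical test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # cascades work on grayscale

# detectMultiScale slides the target window over the image at multiple scales.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print('human face detected!' if len(faces) > 0 else 'no human face detected')
```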
- HOG:
The histogram of oriented gradients is computed by taking differences in pixel intensities for every block of pixels inside a 64 x 64 window, which slides over the entire image much like the sliding window above.
This is based on the fact that certain regions of a face have slightly darker shades than others, so a gradient orientation vector arises in some localized portions of the face.
As in this image, we can see the gradient magnitude and the gradient direction:
Now, calculating for all pixel blocks:
For a more detailed explanation, click here.
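A minimal sketch of computing HOG features with scikit-image (the 64 x 64 window matches the description above, but the cell/block parameters here are typical defaults, not necessarily the project's exact settings):

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(64, 64)  # hypothetical 64 x 64 grayscale window

# 9 orientation bins per 8x8 cell, blocks of 2x2 cells, L2-Hys block norm.
features, hog_image = hog(
    window,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm='L2-Hys',
    visualize=True,       # also return an image of the dominant gradients
)
print(features.shape)     # 1-D descriptor for this window
```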
- LBP:
Local binary patterns is an algorithm for feature extraction based on a local representation of texture.
How is it calculated? Let's see...
For every block (in grayscale), we select a center pixel and threshold its neighbourhood: a neighbour position gets 1 if the center value is greater than or equal to that neighbour's value, otherwise 0. The resulting bits are then read off, wrapping around either clockwise or anticlockwise, to form a 1-D binary pattern.
(Here this is shown for one central pixel with value "10", but it is done for every other pixel block.)
Then a 256-bin histogram is constructed from the final LBP pattern image.
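A minimal sketch of this pipeline with scikit-image (note: its thresholding convention compares each neighbour against the center, the mirror of the description above, but the histogram step is identical):

```python
import numpy as np
from skimage.feature import local_binary_pattern

gray = np.random.randint(0, 256, size=(128, 128)).astype(np.uint8)  # hypothetical grayscale image

# 8 neighbours at radius 1 -> LBP codes in [0, 255].
lbp = local_binary_pattern(gray, P=8, R=1, method='default')

# 256-bin histogram of the LBP pattern image, used as the texture descriptor.
hist, _ = np.histogram(lbp.ravel(), bins=256, range=(0, 256))
hist = hist / hist.sum()  # normalize
```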
- Here also (for detecting dogs), we can use pretrained models, mainly deep learning based ones like VGG16, ResNet50, etc., as sketched below.
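A minimal sketch of such a dog detector using torchvision's pretrained ResNet50; it relies on the fact that ImageNet class indices 151 through 268 correspond to dog breeds (the image path and the preprocessing values are standard ImageNet defaults, assumed here):

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(pretrained=True)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def is_dog(path):
    """Return True if the top ImageNet class falls in the dog range 151-268."""
    img = Image.open(path).convert('RGB')
    batch = preprocess(img).unsqueeze(0)        # add batch dimension
    with torch.no_grad():
        pred = model(batch).argmax(dim=1).item()
    return 151 <= pred <= 268

print(is_dog('sample.jpg'))                     # hypothetical test image
```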
- Now that we have recognized whether the image contains a dog face, a human face, or neither, it's time to train our own neural network to classify the dog's breed if the image contains a dog (or the most resembling breed label for a human!).
So, let's get started...
We can do this using two different approaches:
* Constructing a CNN from scratch
* Using pretrained CNN models
CNN from scratch:

```
(conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc1): Linear(in_features=50176, out_features=500, bias=True)
(fc2): Linear(in_features=500, out_features=133, bias=True)
(dropout): Dropout(p=0.5, inplace=False)
```
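A sketch of a module that produces the printout above; the forward pass (three conv + ReLU + pool stages on a 224 x 224 input, dropout around the linear layers) is my reconstruction from the layer shapes, not a copy of the notebook:

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # 224 -> 112 -> 56 -> 28 after three pools, so 64 * 28 * 28 = 50176.
        self.fc1 = nn.Linear(64 * 28 * 28, 500)
        self.fc2 = nn.Linear(500, 133)   # 133 dog breed classes
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(x.size(0), -1)        # flatten feature maps
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)
```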
Pretrained ResNet50 model:
(Reasons for choosing this model are included in the notebook itself.)
- This architecture starts with (conv1), the first convolutional layer, whose 3 input channels correspond to the RGB input tensor, and (bn1), a batch normalization layer, followed by ReLU and max pooling. It then contains 4 main layers, named layer1 to layer4, each made of residual sub-blocks of convolution followed by batch norm followed by ReLU, and finally the fully connected layer fc.
- ReLU activation is used as it is one of the most proven activation functions for classification problems: it introduces the right amount of non-linearity with a lower chance of the vanishing gradient problem!
- Batch normalization helps make the network more stable and learn faster, thereby converging sooner.
- Max pooling downsamples the high-dimensional feature maps produced by the convolution operations, cutting the number of parameters and keeping only the most relevant features.
- Then I replaced the last layer of this architecture with a fully connected head containing two linear layers, with a ReLU activation between them:
Linear(in_features=2048, out_features=512)
Linear(in_features=512, out_features=133)
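A minimal sketch of this head replacement with torchvision (freezing the pretrained backbone is my assumption, a common transfer learning choice):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

# Assumed: freeze the pretrained backbone so only the new head trains.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fc layer (2048 -> 1000) with the two-linear head above.
model.fc = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Linear(512, 133),   # 133 dog breed classes
)
```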
* Used both CrossEntropyLoss() and NLLLoss() (note: NLLLoss() expects log-probabilities, so it needs a LogSoftmax on the network output, while CrossEntropyLoss() works directly on raw logits)
* Used both SGD and Adam optimizers
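For example, continuing from the head-replacement sketch above (the learning rates and momentum are placeholder values, not the project's tuned settings):

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()   # LogSoftmax + NLLLoss applied to raw logits

# Only the new head's parameters require gradients after freezing the backbone.
optimizer = optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
# ...or Adam:
# optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
```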
- Augmentation used:

```python
transforms.RandomRotation(10),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
```
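In context, these would sit inside a transforms.Compose pipeline; a sketch, where ToTensor and the ImageNet normalization stats are assumptions on my part:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # Assumed: standard ImageNet normalization for the pretrained backbone.
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```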
- To get started locally on your own system, click here.
- Check out the complete source code, including the training and testing code.
- If you just want the raw Jupyter notebook, check out the report here.
- To look deep inside the model parameters and shapes, click here.
- Want sample images for testing? Download them here.
- Want pretrained model weights? Download them here.
- Face recognition
- CNN
- Gradient descent
- Backpropagation
- Data augmentation
- PyTorch docs
- Udacity Deep Learning Nanodegree course
⭐️ this project if you liked it!