A machine learning pipeline for classifying images of cats and dogs.

oislen/CatClassifier

Cat Classification

Overview

This repository contains code and configuration for implementing a Convolutional Neural Network (CNN) that classifies images as containing cats or dogs. The data was sourced from the dogs-vs-cats Kaggle competition and supplemented with images scraped from freeimages.com. Docker containers were used to deploy the application on EC2 spot instances in order to scale up hardware and computation power.
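
As a rough illustration of the kind of model described above, the following is a minimal sketch of a binary cat/dog CNN in Keras. The `build_cnn` name, the 128x128 input size, the layer sizes and the optimizer are all assumptions chosen for illustration; they are not taken from this repository's model scripts.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(128, 128, 3)):
    """Build a small binary-classification CNN (illustrative architecture only)."""
    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            # Scale raw pixel values from [0, 255] to [0, 1].
            layers.Rescaling(1.0 / 255),
            # Two convolution + pooling stages extract image features.
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            # A dense head maps the features to one probability (assumed: 1 = dog, 0 = cat).
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_cnn().summary()` prints the layer stack. The single sigmoid output suits a two-class problem; the architecture actually used in the project is described in the analysis results notebook.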

Repo Contents

  • The aws subdirectory contains batch and shell scripts for configuring EC2 spot instances and deploying the Docker container remotely.
  • The conda subdirectory contains batch and shell scripts for creating a local conda environment for the project.
  • The data_prep subdirectory contains Python utility scripts for cleansing and processing the data for modelling.
  • The kaggle subdirectory contains python scripts for downloading and unzipping competition data from Kaggle.
  • The model subdirectory contains python scripts for initiating and training CNN models.
  • The ref subdirectory contains previous analyses and kernels on dogs vs cats classification from Kaggle community members.
  • The report subdirectory contains reportable images and plots generated by the application.
  • The webscrapers subdirectory contains webscraping tools for downloading cats and dogs images from freeimages.com.

Application Scripts

The main dog and cat image classification application is contained within the root scripts:

  • The 01_prg_kaggle_data.py script downloads and unzips the dogs vs cats competition data.
  • The 02_prg_scrape_imgs.py script scrapes additional cat and dog images from freeimages.com.
  • The 03_prg_keras_model.py script trains a CNN model on the cat and dog images and makes image predictions.
  • The analysis_results.ipynb file contains a high level summary of the analysis results.
  • The cons.py script contains programme constants and configurations.
  • The Dockerfile builds the application container for deployment on EC2.
  • The exeDocker.bat script executes the Docker build process locally on Windows.
  • The requirements.txt file contains the python package dependencies for the application.
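
The download / unzip step listed above can be sketched with the Python standard library alone. This is a minimal sketch, not the code in 01_prg_kaggle_data.py: the function names are hypothetical, and it assumes the extracted training images follow the Kaggle naming pattern `cat.0.jpg`, `dog.0.jpg`, so that the label can be read from the filename prefix.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, out_dir):
    """Unzip a (trusted) archive into out_dir and return out_dir as a Path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir


def label_from_filename(path):
    """Return 'cat' or 'dog' for names such as 'cat.0.jpg', otherwise None."""
    prefix = Path(path).name.split(".")[0].lower()
    return prefix if prefix in {"cat", "dog"} else None


def index_images(image_dir):
    """List (path, label) pairs for every labelled .jpg file under image_dir."""
    labelled = []
    for path in sorted(Path(image_dir).rglob("*.jpg")):
        label = label_from_filename(path)
        if label is not None:
            labelled.append((path, label))
    return labelled
```

For example, `index_images(extract_archive("train.zip", "data/train"))` would return the labelled image paths, where both `train.zip` and `data/train` are placeholder paths. Files whose names do not start with `cat` or `dog` are skipped rather than guessed.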

Analysis Results

See the analysis results notebook for a summary of the project, including image processing, CNN architecture and model performance.

Docker Container

The application Docker container is available on Docker Hub here:

https://hub.docker.com/repository/docker/oislen/cat-classifier