This project develops object recognition algorithms that identify digits and numbers in natural scene images, applying computer vision to the Street View House Numbers (SVHN) dataset.
SVHN is a real-world image dataset obtained from house numbers in Google Street View images. It contains over 600,000 digit images with minimal preprocessing requirements, making it suitable for training and evaluating object recognition algorithms.
- Classes: There are 10 classes, one for each digit from 0 to 9. Note that digit '0' has label 10.
- Data Split: The dataset consists of 73,257 digits for training, 26,032 digits for testing, and an additional 531,131 samples that can be used as extra training data.
- Formats: SVHN comes in two formats: original images with character-level bounding boxes and MNIST-like 32-by-32 images centered around a single character.
This project uses the MNIST-like 32-by-32 image format.
You can download the dataset from Stanford University.
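Once downloaded, the MNIST-like files need two small adjustments before use: the image array is stored with the sample index last, and digit '0' carries label 10. The sketch below shows one way to handle both, assuming the files are the `.mat` files of the cropped-digit format with an `X` image array and a `y` label column. The helper names and the local path are hypothetical and not part of this project's code.

```python
import numpy as np

def to_image_first(X):
    """Reorder a (height, width, channels, N) array to (N, height, width, channels)."""
    return np.transpose(X, (3, 0, 1, 2))

def labels_to_digits(y):
    """Flatten the label column and map label 10 back to digit 0."""
    return np.asarray(y).reshape(-1) % 10

def load_svhn_split(mat_path):
    """Load one split from a local .mat file (path supplied by the caller)."""
    # SciPy is imported lazily so the two helpers above work without it.
    from scipy.io import loadmat
    data = loadmat(mat_path)
    return to_image_first(data["X"]), labels_to_digits(data["y"])

# Example call, assuming the training file sits in the working directory:
# images, digits = load_svhn_split("train_32x32.mat")
```

After loading, `images[i]` is a single 32-by-32 colour crop and `digits[i]` is its digit in the range 0 to 9.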
The core of this project is the application of computer vision methods to digit detection and recognition. Because the digits in our dataset do not form fully closed regions, traditional methods such as the Hough Transform are not suitable. Instead, we use the Alternating Edge Concept for edge detection, which proved effective at identifying digit boundaries against complex backgrounds.
The detect_alternating_pixels function implements the alternating edge detection logic. It takes the input image, height, and width as parameters and returns the number of alternations detected in different regions of the image.
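To make the idea concrete, here is a minimal sketch of counting alternations. It is not the project's detect_alternating_pixels implementation: it assumes that an "alternation" is a switch between dark and bright pixels at a fixed threshold, and that the "regions" are four scan lines (middle row, middle column, and both diagonals). The function names and the threshold value are hypothetical.

```python
import numpy as np

def count_transitions(line, threshold=128):
    """Count dark/bright switches along a 1-D run of pixel values."""
    bits = np.asarray(line) >= threshold
    return int(np.count_nonzero(bits[1:] != bits[:-1]))

def detect_alternations_sketch(image, height, width, threshold=128):
    """Count alternations along four scan lines of a grayscale image."""
    image = np.asarray(image)[:height, :width]
    # The diagonals are limited to the largest square that fits in the image.
    idx = np.arange(min(height, width))
    return {
        "horizontal": count_transitions(image[height // 2, :], threshold),
        "vertical": count_transitions(image[:, width // 2], threshold),
        "diagonal": count_transitions(image[idx, idx], threshold),
        "anti_diagonal": count_transitions(image[idx, width - 1 - idx], threshold),
    }

# A 5x5 image with one bright vertical bar in the middle column.
bar = np.zeros((5, 5), dtype=np.uint8)
bar[:, 2] = 255
counts = detect_alternations_sketch(bar, 5, 5)
```

A vertical stroke is crossed once by the horizontal and diagonal scan lines (two transitions each) and never left by the vertical one, which is the kind of signature a rule-based recognizer can key on.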
The Filters.py module contains custom filter functions used in the project, including:
- High Pass Filter: Enhances high-frequency components in the image.
- Sobel Edge Detection: Detects edges in the image using the Sobel operator.
- HoughCircles: Used for counting circles in an image.
Among these, the high_pass_filter emerged as the most effective filter. Its ability to highlight high-frequency components in the image significantly contributed to the accuracy of our recognition system.
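For readers unfamiliar with these filters, the sketch below illustrates a high-pass filter and a Sobel gradient magnitude using plain NumPy. It is not the code in Filters.py: the 3x3 high-pass kernel, the edge-replicating padding, and all function names here are assumptions chosen for illustration.

```python
import numpy as np

def filter3x3(gray, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating edge pixels."""
    gray = np.asarray(gray, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Assumed kernel: weights sum to zero, so flat regions map to 0.
HIGH_PASS_KERNEL = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def high_pass_sketch(gray):
    """Keep fine detail, suppress smooth regions, and clip to 0..255."""
    return np.clip(filter3x3(gray, HIGH_PASS_KERNEL), 0, 255).astype(np.uint8)

def sobel_magnitude_sketch(gray):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.hypot(gx, gy)
```

With OpenCV available, `cv2.filter2D` and `cv2.Sobel` provide the same operations; the NumPy version is used here only to keep the example self-contained.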
For each image, the processing pipeline:
- Reads the image using OpenCV's cv2.imread().
- Converts the image to grayscale using cv2.cvtColor().
- Applies a high-pass filter to enhance image features.
- Detects alternating pixels in the filtered image using detect_alternating_pixels().
- Attempts to recognize the digit from the detected pixel patterns using detectNumber().
- Compares the detected digit with the ground truth label and updates the correctness counter.
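The steps above can be sketched as a small evaluation loop. This is an illustrative outline rather than the project's script: `load_gray` and `evaluate_sketch` are hypothetical names, and the filter, detector, and recognizer are passed in as functions so the loop does not assume their exact signatures beyond what is described above.

```python
import numpy as np

def load_gray(path):
    """Read an image from disk and convert it to grayscale."""
    import cv2  # imported lazily so the loop below runs without OpenCV
    image = cv2.imread(path)  # BGR array, or None if the file is unreadable
    if image is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

def evaluate_sketch(samples, high_pass, detect_pixels, recognize):
    """Run filter -> detect -> recognize on (gray_image, label) pairs.

    Returns the fraction of samples whose predicted digit equals the label.
    """
    correct = 0
    total = 0
    for gray, label in samples:
        filtered = high_pass(gray)
        height, width = filtered.shape[:2]
        pattern = detect_pixels(filtered, height, width)
        predicted = recognize(pattern)
        correct += int(predicted == label)
        total += 1
    return correct / total if total else 0.0

# Tiny dry run with stand-in functions, only to show the data flow.
demo_samples = [
    (np.zeros((4, 4), dtype=np.uint8), 3),
    (np.ones((4, 4), dtype=np.uint8), 7),
]
demo_accuracy = evaluate_sketch(
    demo_samples,
    high_pass=lambda gray: gray,
    detect_pixels=lambda img, h, w: int(img.sum()),
    recognize=lambda pattern: 3,
)
```

In real use, `samples` would be built from `load_gray(path)` paired with each image's ground truth label.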
Digit Detection Logic:
The detectNumber() function implements the logic for recognizing digits based on detected pixel patterns. It considers various combinations of alternating pixels in horizontal, vertical, and diagonal directions to determine the digit.
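One simple way to structure such rule-based recognition is a lookup from alternation counts to digits. The sketch below shows only the shape of that idea: the rule table is a placeholder invented for illustration and does not reflect the actual combinations used by detectNumber(), and the function name is hypothetical.

```python
# Placeholder rules, NOT the project's actual values: each key is a
# (horizontal, vertical, diagonal) alternation count, each value a digit.
ILLUSTRATIVE_RULES = {
    (2, 0, 2): 1,
    (4, 2, 4): 0,
    (2, 6, 4): 8,
}

def detect_number_sketch(horizontal, vertical, diagonal, rules=ILLUSTRATIVE_RULES):
    """Return the digit whose rule matches the counts, or None if none does."""
    return rules.get((horizontal, vertical, diagonal))
```

Returning `None` for unmatched patterns keeps "no decision" distinct from a wrong digit when counting correct predictions.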
Observation Function:
The obs() function is a utility function used to visualize the grayscale image with its corresponding class label.
This project applied classical computer vision techniques, including the Alternating Edge Concept and custom filters, to successfully recognize digits in the SVHN dataset. It demonstrates the effectiveness of traditional image processing methods for digit recognition without relying on AI or machine learning.