This project implements a custom YOLO (You Only Look Once) 👀 model for face detection. The model is built from scratch in PyTorch and predicts a bounding box around each face in an image.
- Model Architecture:
  - The YOLO architecture consists of convolutional layers for feature extraction and fully connected layers for predicting bounding boxes.
  - The model outputs a 14x14 grid of cells, each predicting one bounding box along with a confidence score.
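As a rough illustration of how a 14x14 grid output can be turned into boxes, the sketch below decodes each cell into image-relative corner coordinates. The per-cell layout `(x, y, w, h, conf)`, the cell-relative center offsets, the `conf_threshold` value, and the helper names `decode_cell` and `decode_grid` are all assumptions made for this example; the project's actual encoding may differ. Plain Python is used instead of PyTorch tensors to keep the arithmetic visible.

```python
GRID_SIZE = 14  # grid resolution stated in this README


def decode_cell(row, col, pred, grid_size=GRID_SIZE):
    """Convert one cell's prediction into (x1, y1, x2, y2, conf).

    Assumed encoding (hypothetical, not confirmed by the project):
    x, y are offsets inside the cell in [0, 1]; w, h are fractions
    of the full image width and height.
    """
    x_off, y_off, w, h, conf = pred
    cx = (col + x_off) / grid_size  # box center, relative to image width
    cy = (row + y_off) / grid_size  # box center, relative to image height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf)


def decode_grid(grid, conf_threshold=0.5):
    """Collect boxes from every cell whose confidence reaches the threshold."""
    boxes = []
    for row, cells in enumerate(grid):
        for col, pred in enumerate(cells):
            if pred[4] >= conf_threshold:
                boxes.append(decode_cell(row, col, pred, len(grid)))
    return boxes


# Usage: an otherwise empty grid with one confident cell.
empty = (0.0, 0.0, 0.0, 0.0, 0.0)
grid = [[empty for _ in range(GRID_SIZE)] for _ in range(GRID_SIZE)]
grid[6][7] = (0.0, 1.0, 0.2, 0.2, 0.9)
faces = decode_grid(grid)  # one box centered in the image
```

With one box per cell, the model can report at most 14 x 14 = 196 candidate faces per image before any filtering.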
- Loss Function:
  - The custom loss function includes multiple components:
    - Coordinate Loss (`lambda_coord=5`): Penalizes errors in the predicted center coordinates of the bounding boxes.
    - Size Loss (`lambda_size=5`): Penalizes discrepancies in the predicted width and height of the bounding boxes.
    - Object Loss (`lambda_obj=1`): Penalizes differences in confidence scores for boxes that contain objects.
    - No Object Loss (`lambda_noobj=0.5`): Reduces false positives by penalizing confidence in grid cells without objects.
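The way the four weighted terms combine can be sketched as follows. Only the weights (5, 5, 1, 0.5) come from this README. The use of squared errors, raw width and height (rather than square roots), the flat list-of-cells layout, and the function name `yolo_loss` are assumptions for illustration and may not match the training script.

```python
# Weights as listed in this README.
LAMBDA_COORD = 5.0
LAMBDA_SIZE = 5.0
LAMBDA_OBJ = 1.0
LAMBDA_NOOBJ = 0.5


def yolo_loss(preds, targets):
    """Weighted sum of the four loss components over all grid cells.

    preds, targets: lists of (x, y, w, h, conf), one tuple per cell.
    A target conf of 1 marks a cell that contains a face (assumed layout).
    Squared error is an assumption; the project may use another penalty.
    """
    coord = size = obj = noobj = 0.0
    for (px, py, pw, ph, pc), (tx, ty, tw, th, tc) in zip(preds, targets):
        if tc == 1:
            # Cell contains a face: penalize position, size, and confidence.
            coord += (px - tx) ** 2 + (py - ty) ** 2
            size += (pw - tw) ** 2 + (ph - th) ** 2
            obj += (pc - 1.0) ** 2
        else:
            # Empty cell: only penalize confidence, with a smaller weight.
            noobj += pc ** 2
    return (LAMBDA_COORD * coord + LAMBDA_SIZE * size
            + LAMBDA_OBJ * obj + LAMBDA_NOOBJ * noobj)


# Usage: one face cell predicted perfectly, one empty cell predicted
# with full confidence (a false positive).
targets = [(0.5, 0.5, 0.2, 0.2, 1), (0.0, 0.0, 0.0, 0.0, 0)]
preds = [(0.5, 0.5, 0.2, 0.2, 1.0), (0.0, 0.0, 0.0, 0.0, 1.0)]
loss = yolo_loss(preds, targets)
```

Because most of the 196 cells are empty in a typical image, the smaller no-object weight keeps those cells from dominating the total.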
- Non-Maximum Suppression (NMS):
  - After the model generates bounding box predictions, NMS is applied to eliminate redundant boxes.
  - NMS discards any box whose Intersection over Union (IoU) with a more confident box exceeds a threshold, so only the most confident prediction for each face is retained.
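A minimal sketch of greedy NMS, assuming boxes are given as `(x1, y1, x2, y2, conf)` tuples. The `iou_threshold` of 0.5 and the function names `iou` and `nms` are illustrative choices, not values taken from the project.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, drop boxes overlapping it.

    iou_threshold=0.5 is an assumed default for this example.
    """
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest remaining confidence
        kept.append(best)
        # Discard every box that overlaps the kept box too strongly.
        remaining = [b for b in remaining if iou(best, b) <= iou_threshold]
    return kept


# Usage: two overlapping detections of one face plus a separate face.
detections = [
    (0, 0, 10, 10, 0.9),
    (1, 1, 11, 11, 0.8),    # overlaps the first box, lower confidence
    (20, 20, 30, 30, 0.7),  # far from the others
]
kept = nms(detections)
```

Here the second box is removed because its IoU with the first (81 / 119) is above the threshold, while the third box does not overlap anything and is kept.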
- 00_FaceDetection: Contains the final implementation with all scripts for training, evaluation, and inference.
- previous_attempts: Holds earlier versions of the project, documenting issues with dataset construction and loss function design.
- Vasilije Todorović