
Jayanth-MKV/Phishy


Phishing URL Detector Plugin API



Introduction

Phishing URL Detection

Overview

This project detects phishing URLs using machine learning. It has three main parts: data loading and cleaning, feature extraction, and model training and evaluation. The final model is a Gradient Boosting Classifier, which reached an accuracy of 97.4% (see Model Comparison below).

Installation

To run the project, you can follow these steps:

  1. Clone the repository: git clone https://github.com/your-username/Phishing-URL-Detection.git
  2. Install the required packages: pip install -r requirements.txt
  3. Run the Flask application: python app.py


Directory Tree

├── db
│   ├── load_data.py
│   ├── save_data.py
│   ├── train_model.py
├── pickle
│   ├── model.pkl
├── Phishing URL Detection.ipynb
├── README.md
├── app.py
├── database.db
├── feature.py
├── phishing.csv
├── requirements.txt

Files

  • app.py: Flask web application for testing the model
  • feature.py: script for extracting features from URLs
  • database.db: SQLite database for storing URLs and their labels
  • phishing.csv: dataset containing URLs and their labels
  • pickle/model.pkl: serialized model object
  • joblib/gbc_model.joblib: serialized model object using joblib
  • db/load_data.py: script for loading data into the database
  • db/save_data.py: script for saving data to the database
  • db/train_model.py: script for training and evaluating the model
  • Phishing URL Detection.ipynb: Jupyter notebook containing the project code and documentation
  • README.md: readme file explaining the project
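database.db is described as storing URLs and their labels. The sketch below shows one way to load labelled URLs from CSV into SQLite with Python's standard `csv` and `sqlite3` modules. The table `urls(url, label)`, the `url,label` CSV columns, and the helper names are hypothetical; the actual schema used by db/load_data.py and the columns of phishing.csv are not documented here.

```python
import csv
import io
import sqlite3
from typing import Optional

# Hypothetical CSV content; the real phishing.csv columns may differ.
SAMPLE_CSV = """url,label
https://example.com/login,0
http://192.168.0.1/secure-update,1
"""


def load_csv_into_db(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the hypothetical `urls` table and insert every CSV row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, label INTEGER NOT NULL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["url"], int(r["label"])) for r in reader]
    with conn:  # commits the transaction on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO urls (url, label) VALUES (?, ?)", rows)
    return len(rows)


def fetch_label(conn: sqlite3.Connection, url: str) -> Optional[int]:
    """Return the stored label for a URL, or None if it is not in the table."""
    row = conn.execute("SELECT label FROM urls WHERE url = ?", (url,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")  # in-memory database for the example
print(load_csv_into_db(conn, SAMPLE_CSV))  # → 2
print(fetch_label(conn, "http://192.168.0.1/secure-update"))  # → 1
```

Parameterised queries (`?` placeholders) keep URL strings from being interpreted as SQL.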


Model Comparison

| ML Model | Accuracy | F1 Score | Recall | Precision |
|---|---|---|---|---|
| Gradient Boosting Classifier | 0.974 | 0.977 | 0.994 | 0.986 |
| Multi-layer Perceptron | 0.971 | 0.974 | 0.990 | 0.991 |
| XGBoost Classifier | 0.969 | 0.973 | 0.993 | 0.984 |
| Random Forest | 0.966 | 0.970 | 0.994 | 0.984 |
| Support Vector Machine | 0.964 | 0.968 | 0.980 | 0.965 |
| Decision Tree | 0.958 | 0.962 | 0.991 | 0.993 |
| K-Nearest Neighbors | 0.956 | 0.961 | 0.991 | 0.989 |
| Logistic Regression | 0.934 | 0.941 | 0.943 | 0.927 |
| Naive Bayes Classifier | 0.914 | 0.922 | 0.907 | 0.922 |

The table above shows the accuracy, F1 score, recall, and precision of several machine learning models trained on the phishing URL dataset. The Gradient Boosting Classifier has the highest accuracy (0.974) and F1 score (0.977), and ties with Random Forest for the highest recall (0.994). Its precision (0.986) is lower than that of the Decision Tree (0.993), Multi-layer Perceptron (0.991), and K-Nearest Neighbors (0.989).
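For reference, the four metrics in the table follow the standard definitions from true/false positives and negatives. The sketch below computes them in plain Python. Which class the project counts as positive is not stated; the function assumes label 1 (phishing) by default, and `classification_metrics` is a hypothetical helper, not part of the repository.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged URLs that are phishing
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of phishing URLs that get flagged
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

In the usage line there are 2 true positives, 1 false positive and 1 false negative, so accuracy is 4/6 and precision, recall and F1 are all 2/3.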

Feature importance for Phishing URL Detection


Conclusion

The present research work aimed to explore various machine learning models and perform exploratory data analysis on a phishing dataset to understand the features that affect the models' ability to detect whether a URL is safe or not.

The work was carried out in a Jupyter notebook, which provided a significant learning experience in the domain of phishing detection. The findings show that features such as "HTTPS", "AnchorURL", "LinkInScriptTags", "PrefixSuffix", and "WebsiteTraffic" were crucial in classifying URLs as phishing or legitimate.
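Two of these features can be computed from the URL string alone. The sketch below derives "HTTPS" (does the URL use HTTPS?) and "PrefixSuffix" (does the host name contain a hyphen?) with `urllib.parse`. The 1/-1 encoding (1 = legitimate-looking, -1 = suspicious) and the function names are assumptions and may not match feature.py. "AnchorURL" and "LinkInScriptTags" appear to depend on the page's HTML, and "WebsiteTraffic" on external traffic data, so they are not covered here.

```python
from typing import Dict
from urllib.parse import urlparse


def https_feature(url: str) -> int:
    """Hypothetical encoding: 1 if the URL scheme is HTTPS, -1 otherwise."""
    return 1 if urlparse(url).scheme.lower() == "https" else -1


def prefix_suffix_feature(url: str) -> int:
    """Hypothetical encoding: -1 if the host name contains '-', 1 otherwise."""
    host = urlparse(url).hostname or ""  # hostname is None when no scheme/netloc
    return -1 if "-" in host else 1


def extract_url_features(url: str) -> Dict[str, int]:
    return {"HTTPS": https_feature(url), "PrefixSuffix": prefix_suffix_feature(url)}


print(extract_url_features("https://paypal-login.example.com/verify"))
# → {'HTTPS': 1, 'PrefixSuffix': -1}
```

A hyphenated host such as `paypal-login.example.com` is a common way to imitate a brand name, which is why a hyphen in the domain is treated as suspicious even when the site uses HTTPS.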

After testing various machine learning models, the Gradient Boosting Classifier emerged as the best-performing model, with an accuracy of 97.4%. This result suggests the model could help reduce the likelihood of users following malicious links.

Overall, this project shows the value of machine learning models for detecting phishing URLs and the importance of feature selection for model performance. Future research could extend this project with more advanced features and models, which may lead to more accurate results.

Contributing

If you would like to contribute to the project, you can create a pull request with your changes. Please make sure to follow the project's coding conventions and include tests for any new functionality.

About

Phishing Link Detector Plugin
