Fraud Detection Model

This project builds a fraud detection model for credit card transactions and compares three classifiers: logistic regression, a decision tree, and a random forest.

Introduction

Fraud detection is a critical task in the financial industry. This project uses a dataset of credit card transactions to build a predictive model that distinguishes legitimate transactions from fraudulent ones.

Dependencies

The following libraries are required to run the project:

numpy
pandas
matplotlib
scikit-learn

You can install the required libraries using pip:

pip install numpy pandas matplotlib scikit-learn

Dataset

The dataset used in this project is creditcard.csv, which contains credit card transactions made by European cardholders in September 2013.

Overview

The dataset covers two days of transactions: 284,807 in total, of which 492 are fraudulent. The positive class (fraud) accounts for only 0.172% of all transactions, so the dataset is highly imbalanced.
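The class counts above can be checked directly once the data is loaded. The following is a minimal sketch, not taken from the notebook: the helper name class_balance is hypothetical, and the file path in the comment assumes creditcard.csv sits in the working directory. A stand-in frame with the reported counts is used so the snippet runs without the file.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "Class") -> pd.DataFrame:
    """Return per-class counts and percentages for the target column."""
    counts = df[target].value_counts().sort_index()
    percent = 100 * counts / counts.sum()
    return pd.DataFrame({"count": counts, "percent": percent})


# With the real data (assumed path): df = pd.read_csv("creditcard.csv")
# Stand-in frame that reproduces the class counts reported above.
demo = pd.DataFrame({"Class": [0] * (284_807 - 492) + [1] * 492})
balance = class_balance(demo)
print(balance)
```

The percent value for class 1 should come out at roughly 0.17%, in line with the figure quoted above.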

Features

The dataset contains only numerical input variables, most of which are the result of a PCA (Principal Component Analysis) transformation. Due to confidentiality constraints, the original features and further background information cannot be provided.

Features V1 to V28 are the principal components obtained through PCA.
The features Time and Amount have not been transformed:
- Time: Seconds elapsed between each transaction and the first transaction in the dataset.
- Amount: The transaction amount, which can be used for example-dependent cost-sensitive learning.
The response variable is Class, which indicates whether a transaction is fraudulent (1) or legitimate (0).
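The column layout described above maps naturally onto a feature matrix and a label vector. The sketch below is an illustration under that assumption. The constant names and the helper split_features_target are hypothetical, and the demo frame is random data shaped like the real columns.

```python
import numpy as np
import pandas as pd

PCA_COLUMNS = [f"V{i}" for i in range(1, 29)]     # V1 ... V28
FEATURE_COLUMNS = PCA_COLUMNS + ["Time", "Amount"]  # 30 inputs in total
TARGET = "Class"


def split_features_target(df: pd.DataFrame):
    """Separate the 30 input columns from the Class label."""
    X = df[FEATURE_COLUMNS]
    y = df[TARGET]
    return X, y


# Random stand-in rows; the real frame would come from creditcard.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(5, 30)), columns=FEATURE_COLUMNS)
demo[TARGET] = [0, 0, 1, 0, 0]

X, y = split_features_target(demo)
print(X.shape, y.tolist())
```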

Exploratory Data Analysis (EDA)

The initial analysis of the dataset includes:

Checking for missing values
Statistical summary of the features
Visualizations of transaction amounts and class distributions
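The three checks listed above could be sketched as follows. This is an assumption about how such an analysis might look, not the notebook's actual code: the function name run_eda, the bin count, and the log-scaled axis are illustrative choices, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def run_eda(df: pd.DataFrame):
    """Missing-value counts, summary statistics, and two basic plots."""
    missing = df.isnull().sum()   # missing values per column
    summary = df.describe()       # count, mean, std, min, quartiles, max

    fig, (ax_amount, ax_class) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram of transaction amounts; log scale keeps sparse bins visible.
    ax_amount.hist(df["Amount"], bins=50)
    ax_amount.set_yscale("log")
    ax_amount.set_xlabel("Amount")
    ax_amount.set_ylabel("Transactions")

    # Bar chart of the class distribution.
    class_counts = df["Class"].value_counts().sort_index()
    ax_class.bar(class_counts.index.astype(str), class_counts.values)
    ax_class.set_xlabel("Class (0 = legitimate, 1 = fraud)")
    ax_class.set_ylabel("Transactions")

    fig.tight_layout()
    return missing, summary, fig


# Synthetic stand-in data; replace with the frame read from creditcard.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Amount": rng.exponential(scale=80.0, size=1000),
    "Class": (rng.random(1000) < 0.01).astype(int),
})
missing, summary, fig = run_eda(demo)
print(missing)
print(summary)
```

Calling plt.show() afterwards, or running the code in a notebook, displays the two plots.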

Model Building

The following machine learning algorithms are implemented:

Logistic Regression
Decision Tree Classifier
Random Forest Classifier

Each model is trained by splitting the dataset into training and test sets, fitting on the training set, and making predictions on the test set.
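The split, fit, and predict workflow for the three classifiers might look like the sketch below. Several details are assumptions rather than the notebook's settings: the 80/20 split, the stratified sampling, the random_state values, and the hyperparameters. The data is a synthetic, imbalanced stand-in generated with scikit-learn's make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the feature matrix X and labels y.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10,
    weights=[0.97, 0.03], random_state=42,
)

# stratify=y keeps the minority-class ratio the same in both splits
# (an assumption here, chosen because the classes are imbalanced).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)                # fit on the training set
    predictions[name] = model.predict(X_test)  # hard 0/1 labels on the test set
    flagged = int(predictions[name].sum())
    print(f"{name}: flagged {flagged} of {len(y_test)} test samples")
```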

Use Cases

Fraud detection models can be applied in various scenarios, including:

Credit Card Transactions: Identifying fraudulent charges to protect consumers and banks.
Insurance Claims: Detecting false claims to reduce losses for insurance companies.
E-commerce Transactions: Preventing fraudulent purchases and chargebacks in online retail.
Telecommunications: Identifying fraudulent activities in mobile phone usage, such as SIM card cloning.
Financial Services: Monitoring and detecting unauthorized access or transactions in banking applications.

Evaluation Metrics

The performance of the models is evaluated using:

Accuracy Score
F1 Score
Precision Score
Recall Score
ROC AUC Score

Confusion matrices are also generated to visualize the performance of the models.
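The metrics above are all available in sklearn.metrics. The sketch below computes them on a toy example. The helper name evaluate and the 0.5 decision threshold are hypothetical, and passing probabilities to roc_auc_score is an assumption about how the score would be computed rather than a description of the notebook.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    precision_score, recall_score, roc_auc_score,
)


def evaluate(y_true, y_pred, y_score):
    """Compute the listed metrics.

    y_pred holds hard 0/1 predictions. y_score holds fraud scores
    (for example model.predict_proba(X)[:, 1]) and is used only for ROC AUC.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Toy labels: 8 legitimate and 2 fraudulent transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.6, 0.2, 0.9, 0.4])
y_pred = (y_score >= 0.5).astype(int)  # hypothetical 0.5 threshold

results = evaluate(y_true, y_pred, y_score)
for metric, value in results.items():
    print(metric, value, sep=": ")
```

In this toy case accuracy is 0.8 while precision, recall, and F1 are all 0.5, and the confusion matrix is [[7, 1], [1, 1]] (rows are true classes, columns are predicted classes). With the class counts reported for this dataset, a model that labels every transaction legitimate would reach about 99.83% accuracy, which is why the other metrics are worth reading alongside it.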

Citations

Please cite the following works for further reading and acknowledgment:

Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon.
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE.
Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection, ULB MLG PhD thesis (supervised by G. Bontempi).
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information Fusion, 41, 182-194, 2018, Elsevier.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing.
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection, Information Sciences, 2019.
Yann-Aël Le Borgne, Gianluca Bontempi. Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook.
Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi. Incremental learning strategies for credit cards fraud detection, International Journal of Data Science and Analytics.
