- VSCode 1.78
- Python
- pandas
- scikit-learn
- numpy
- imblearn
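The dependencies above can be installed with pip; a typical command (assuming a working Python environment; note that the `imblearn` package is published on PyPI as `imbalanced-learn`) looks like:

```
pip install pandas scikit-learn numpy imbalanced-learn
```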
The goal of this project is to predict the risk of loan defaults using machine learning techniques. The analysis includes results from four different sampling techniques and two ensemble classifiers, and the performance of each model is measured using the balanced accuracy score, precision, and recall.
The purpose of this analysis is to assess different machine learning models for credit risk prediction. By evaluating the performance of each model, we can determine its effectiveness in identifying high-risk and low-risk loans. This analysis will provide insight into the strengths and weaknesses of four sampling models (Random Oversampling, Cluster Centroid Undersampling, SMOTE Oversampling, and SMOTEENN Combination Sampling) and two ensemble classifiers (the Balanced Random Forest Classifier and the Easy Ensemble Classifier).
We will look at metrics including the balanced accuracy score, precision, and recall. These metrics allow us to make an informed decision about which model best performs credit risk analysis.
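For context, the sketch below shows how each sampling model might be evaluated with these metrics. It is illustrative only: the train/test split variables (`X_train`, `X_test`, `y_train`, `y_test`) are assumed to already exist, and the Random Oversampler stands in as a representative example; the other samplers (SMOTE, Cluster Centroids, SMOTEENN) follow the same pattern.

```python
# Illustrative evaluation workflow for one resampling model.
# Assumes X_train, X_test, y_train, y_test have already been created.
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from imblearn.over_sampling import RandomOverSampler
from imblearn.metrics import classification_report_imbalanced

# Resample the training data so high-risk and low-risk loans are balanced
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)
print(Counter(y_resampled))

# Fit a logistic regression model on the resampled data
model = LogisticRegression(solver="lbfgs", random_state=1)
model.fit(X_resampled, y_resampled)

# Score against the untouched test set
y_pred = model.predict(X_test)
print(balanced_accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report_imbalanced(y_test, y_pred))
```

The other sampling models can be evaluated the same way by swapping in `SMOTE`, `ClusterCentroids`, or `SMOTEENN` from imblearn.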
| Metric | Random Over Sampler | SMOTE Oversampling | Cluster Centroids | SMOTEENN Sampling | Balanced Random Forest | Easy Ensemble Classifier |
|---|---|---|---|---|---|---|
| Balanced Accuracy Score | 0.6640 | 0.6556 | 0.5455 | 0.6424 | 0.7885 | 0.9317 |
| Confusion Matrix [[TP, FN], [FP, TN]] | [[72, 29], [6582, 10522]] | [[64, 37], [5514, 11590]] | [[67, 34], [9791, 7313]] | [[71, 30], [7154, 9950]] | [[71, 30], [2153, 14951]] | [[93, 8], [983, 16121]] |
| Precision (avg / high risk / low risk) | 0.99 / 0.01 / 1.00 | 0.99 / 0.01 / 1.00 | 0.99 / 0.01 / 1.00 | 0.99 / 0.01 / 1.00 | 0.99 / 0.03 / 1.00 | 0.99 / 0.09 / 1.00 |
| Recall (avg / high risk / low risk) | 0.62 / 0.71 / 0.62 | 0.68 / 0.63 / 0.68 | 0.42 / 0.66 / 0.43 | 0.58 / 0.70 / 0.58 | 0.87 / 0.70 / 0.87 | 0.94 / 0.92 / 0.94 |
| F1 Score (avg / high risk / low risk) | 0.76 / 0.02 / 0.76 | 0.80 / 0.02 / 0.81 | 0.59 / 0.01 / 0.60 | 0.73 / 0.02 / 0.73 | 0.93 / 0.06 / 0.93 | 0.97 / 0.16 / 0.97 |
There is wide variation in performance between the models. The Easy Ensemble Classifier and the Balanced Random Forest Classifier are the best-performing models, with balanced accuracy scores of 0.9317 and 0.7885 respectively. Both have average F1 scores above 0.9, indicating that they predict both low-risk and high-risk outcomes more reliably than the sampling-only models.
The remaining models, Random Oversampling, Cluster Centroid Undersampling, SMOTE Oversampling, and SMOTEENN Combination Sampling, have lower balanced accuracy, precision, and recall scores.
Overall, the Easy Ensemble Classifier is best able to predict credit risk. Its ability to combine multiple weak learners allowed it to consistently outperform the other models in terms of accuracy, precision, recall, and F1 score. Its robust performance, balanced predictions, and established effectiveness make it the best recommendation for our credit risk assessment tasks.
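As a rough illustration of the recommended approach, the sketch below shows how the Easy Ensemble Classifier might be trained and scored. The parameter values and the train/test variables are assumptions for the example, not the exact settings used in this analysis.

```python
# Minimal sketch of training and scoring the Easy Ensemble Classifier.
# Assumes X_train, X_test, y_train, y_test have already been created.
from imblearn.ensemble import EasyEnsembleClassifier
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Easy Ensemble combines many AdaBoost learners, each trained on a
# randomly under-sampled, balanced subset of the data
eec = EasyEnsembleClassifier(n_estimators=100, random_state=1)
eec.fit(X_train, y_train)

y_pred = eec.predict(X_test)
print(balanced_accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```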