The code in this repository trains a binary classification model on a synthetically generated data set and then uses standard classification metrics, SHAP feature importances, and supervised clustering to evaluate and interpret the results.
Because the data set is synthetic, the purpose here is not to glean any particular insights about the data or even about the utility of the modeling approach. Rather, the purpose is to store code that represents a fairly typical modeling workflow that I might use and that can be readily adapted to a particular problem.
Generally speaking, the steps of the workflow are as follows (illustrative sketches of the main stages appear after the list):
- Generate multivariate Gaussian-distributed data with correlations among the variables
- Dichotomize the response variable based on several different thresholds
- For each version of the response variable:
    - Split the data into training and test sets
    - Choose a modeling algorithm (currently histogram-based gradient-boosted trees)
    - Define a hyperparameter space for the modeling algorithm
    - Use random search with cross-validation to optimize the hyperparameters on the training data set
    - Refit the modeling algorithm with the optimized hyperparameters to the entire training data set
    - Choose the classification threshold that maximizes the chosen performance metric on the training and test data sets
    - Calculate and plot standard binary classification metrics (e.g., confusion matrix, ROC curve)
    - Calculate and plot feature importances based on SHAP
    - Calculate and plot supervised clusters
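
A minimal sketch of the data-generation and dichotomization steps, assuming equicorrelated Gaussian features and a noisy linear latent response; the correlation structure, coefficients, and thresholds here are illustrative assumptions rather than the repository's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10 equicorrelated Gaussian features.
n_samples, n_features, corr = 5_000, 10, 0.3
cov = np.full((n_features, n_features), corr)
np.fill_diagonal(cov, 1.0)
X = rng.multivariate_normal(mean=np.zeros(n_features), cov=cov, size=n_samples)

# Continuous response as a noisy linear combination of the features,
# dichotomized at several quantile thresholds to produce the different
# versions of the binary response variable.
coefs = rng.normal(size=n_features)
y_continuous = X @ coefs + rng.normal(scale=1.0, size=n_samples)
quantiles = [0.50, 0.75, 0.90]
thresholds = np.quantile(y_continuous, quantiles)
y_versions = {
    f"q{int(q * 100)}": (y_continuous > t).astype(int)
    for q, t in zip(quantiles, thresholds)
}
```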
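
Continuing the sketch above, the model-fitting steps might look like the following with scikit-learn's `HistGradientBoostingClassifier`. The hyperparameter space, scoring metric, number of search iterations, and use of F1 for threshold selection are assumptions for illustration, not the repository's exact choices:

```python
import numpy as np
from scipy.stats import loguniform, randint
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

y = y_versions["q75"]  # one version of the dichotomized response
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameter space for the gradient-boosted trees.
param_distributions = {
    "learning_rate": loguniform(1e-3, 3e-1),
    "max_leaf_nodes": randint(8, 64),
    "min_samples_leaf": randint(10, 200),
    "l2_regularization": loguniform(1e-4, 1e1),
}

search = RandomizedSearchCV(
    HistGradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=50,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
    random_state=0,
)
# refit=True (the default) refits the best hyperparameters on the full training set.
search.fit(X_train, y_train)
model = search.best_estimator_

# Choose the classification threshold that maximizes the chosen performance
# metric (F1 here, as an example) over a grid of candidate thresholds.
train_probs = model.predict_proba(X_train)[:, 1]
candidates = np.linspace(0.05, 0.95, 91)
scores = [f1_score(y_train, (train_probs >= t).astype(int)) for t in candidates]
best_threshold = candidates[int(np.argmax(scores))]
```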
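
Finally, a sketch of the evaluation and interpretation steps, continuing from the fitted `model` and `best_threshold` above. It assumes a shap version whose tree explainer supports scikit-learn's histogram-based gradient boosting, and it sketches "supervised clustering" as hierarchical clustering of observations in SHAP-value space; the repository's actual plotting and clustering choices may differ:

```python
import shap
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

# Standard binary classification metrics on the test set.
test_probs = model.predict_proba(X_test)[:, 1]
test_preds = (test_probs >= best_threshold).astype(int)
ConfusionMatrixDisplay.from_predictions(y_test, test_preds)
RocCurveDisplay.from_predictions(y_test, test_probs)

# SHAP feature importances from a tree-based explainer on the fitted model.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# Supervised clustering: group observations by similarity of their SHAP values,
# so the clusters reflect how the model arrives at its predictions.
linkage_matrix = linkage(shap_values, method="ward")
cluster_labels = fcluster(linkage_matrix, t=4, criterion="maxclust")
```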