This repository contains a Jupyter Notebook that serves as an end-to-end example of a data science and machine learning proof of concept for heart disease classification.
The goal is to predict whether a patient has heart disease based on clinical parameters.
The dataset comes from the Cleveland database in the UCI Machine Learning Repository; this project uses a pre-formatted copy downloaded from Kaggle. It contains 14 attributes: the 13 features listed below and the target label indicating whether the patient has heart disease.
The evaluation metric is accuracy. The initial success criterion for the proof of concept is 95% accuracy at predicting whether a patient has heart disease.
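As a minimal sketch of how the accuracy target can be checked, the snippet below computes accuracy as the fraction of correct predictions and compares it with the 95% threshold. The function names `accuracy` and `meets_target` and the label lists are hypothetical and used only for illustration; they are not taken from the notebook.

```python
TARGET_ACCURACY = 0.95  # success criterion stated for the proof of concept


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(y_true, y_pred, target=TARGET_ACCURACY):
    """Check whether the accuracy reaches the target threshold."""
    return accuracy(y_true, y_pred) >= target


# Hypothetical labels for illustration only (1 = heart disease, 0 = no heart disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(accuracy(y_true, y_pred))      # 0.9 (9 of 10 correct)
print(meets_target(y_true, y_pred))  # False, since 0.9 < 0.95
```

In practice, scikit-learn's `accuracy_score` gives the same value for label arrays of equal length.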
The features used for prediction include age, sex, chest pain type, resting blood pressure, serum cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise relative to rest, slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, and thallium stress test result.
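The sketch below shows one way to separate these 13 features from the target with pandas. The column names (`cp`, `trestbps`, `thalach`, `target`, and so on) are an assumption based on the names commonly seen in Kaggle copies of the Cleveland data; the actual CSV used here may differ. The helper `split_features_target` is hypothetical, and the two-row frame holds placeholder values, not real patient records.

```python
import pandas as pd

# ASSUMED column names mapped to the features described above.
FEATURE_COLUMNS = {
    "age": "age",
    "sex": "sex",
    "cp": "chest pain type",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol level",
    "fbs": "fasting blood sugar",
    "restecg": "resting electrocardiographic results",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise-induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels colored by fluoroscopy",
    "thal": "thallium stress test result",
}
TARGET_COLUMN = "target"  # ASSUMED name of the label column


def split_features_target(df):
    """Split a DataFrame into the feature matrix X and the label vector y."""
    expected = [*FEATURE_COLUMNS, TARGET_COLUMN]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[list(FEATURE_COLUMNS)], df[TARGET_COLUMN]


# Placeholder rows (all zeros) used only to show the resulting shapes.
toy = pd.DataFrame(
    [
        {**{col: 0 for col in FEATURE_COLUMNS}, TARGET_COLUMN: 1},
        {**{col: 0 for col in FEATURE_COLUMNS}, TARGET_COLUMN: 0},
    ]
)
X, y = split_features_target(toy)
print(X.shape, y.shape)  # (2, 13) (2,)
```

With the real data, the toy frame would be replaced by the result of `pd.read_csv` on the downloaded file.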
The following libraries are utilized for data analysis, visualization, and machine learning tasks:
We tuned hyperparameters with RandomizedSearchCV and GridSearchCV, searching over each algorithm's settings for the combination that performs best. Here's a summary of our findings:
- Logistic Regression:
  - Test accuracy: 88.52%
- Random Forest:
  - Test accuracy: 86.89%
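The tuning step described above can be sketched roughly as follows. This is not the notebook's code: the data is a synthetic stand-in generated with `make_classification`, and the parameter grids, fold counts, and random seeds are assumptions chosen to keep the example fast. The scores it prints will therefore not match the accuracies reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Synthetic stand-in for the heart disease data (13 features, binary target).
X, y = make_classification(n_samples=300, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Randomized search: sample 10 values of C from an ASSUMED log-spaced range.
log_reg_grid = {"C": np.logspace(-4, 4, 20)}
rs_log_reg = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=log_reg_grid,
    n_iter=10,
    cv=5,
    random_state=42,
)
rs_log_reg.fit(X_train, y_train)

# Grid search: try every combination in a small ASSUMED grid.
rf_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}
gs_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid=rf_grid, cv=3)
gs_rf.fit(X_train, y_train)

# score() uses the refitted best estimator and reports accuracy for classifiers.
for name, search in [("Logistic Regression", rs_log_reg), ("Random Forest", gs_rf)]:
    test_accuracy = search.score(X_test, y_test)
    print(f"{name}: best params {search.best_params_}, test accuracy {test_accuracy:.2%}")
```

Randomized search samples a fixed number of candidates, which keeps the cost bounded on large search spaces, while grid search exhaustively evaluates a small, targeted grid.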
This project is still under development and will be updated regularly. Stay tuned for further updates and improvements.