GitHub-Insight-ANZ-Lab/copilot-challenge-data-scientist-python

DECISION TREE CLASSIFICATION BASED ON DIABETES DATASET

INTRODUCTION

diabetes.csv is originally from the National Institute of Diabetes and Digestive and Kidney
Diseases. The objective of the dataset is to diagnostically predict whether a patient has diabetes,
based on certain diagnostic measurements included in the dataset. Several constraints were placed
on the selection of these instances from a larger database. In particular, all patients here are females
at least 21 years old of Pima Indian heritage.
diabetes.csv contains several independent variables (medical predictor variables)
and one dependent target variable (Outcome).

INSTRUCTIONS

1. Use Copilot Chat to create a new notebook in your project. Use the /newnotebook command and name it "Diabetes Tree Classifier".
2. Use Copilot and Copilot Chat to develop the exercise and support your learning.

EXERCISE

The objective of this project is to build a decision tree classifier using scikit-learn and
Python. The classifier should be able to predict whether a patient has diabetes or not based on
certain diagnostic measurements included in the dataset.

1. Importing the required libraries to build a decision tree classifier
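
One possible set of imports for this step is sketched below. It assumes pandas and scikit-learn are installed in the notebook environment; the exact selection of metrics and helpers is a suggestion, not a requirement of the exercise.

```python
# Data handling
import pandas as pd

# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split

# The classifier itself, plus a helper that renders a fitted tree as text
from sklearn.tree import DecisionTreeClassifier, export_text

# Metrics for evaluating predictions on the test set
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
```

Grouping the imports by purpose makes it easier to see which later step each one supports.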

2. Loading the dataset
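
A minimal sketch of the loading step follows, assuming diabetes.csv sits next to the notebook. The helper name `load_diabetes` and the fallback rows are hypothetical: the fallback values are made up so the sketch runs without the file, and every column name other than Outcome is an assumption.

```python
import io
from pathlib import Path

import pandas as pd

# Made-up placeholder rows, used only when diabetes.csv is not found.
# Column names other than Outcome are assumed for illustration.
SAMPLE_CSV = """Glucose,BMI,Age,Outcome
110,31.2,44,1
92,24.8,29,0
165,36.0,53,1
101,27.5,23,0
"""


def load_diabetes(path="diabetes.csv"):
    """Read the CSV at `path`; fall back to the placeholder rows if it is absent."""
    csv_path = Path(path)
    if csv_path.exists():
        return pd.read_csv(csv_path)
    return pd.read_csv(io.StringIO(SAMPLE_CSV))


df = load_diabetes()
print(df.shape)
```

In the actual notebook, a single `pd.read_csv("diabetes.csv")` call is enough; the fallback exists only to keep this sketch runnable on its own.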

3. Exploratory Data Analysis

3.1. Display the first 5 rows of the dataframe.
3.2. Display the number of rows and columns in the dataframe.
3.3. Display the data types of each column.
3.4. Display the number of missing values in each column.
3.5. Display the number of unique values in each column.
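
The five checks in step 3 map onto standard pandas calls, sketched below. To stay runnable without the file, the sketch builds a synthetic stand-in frame; its values are random and its column names (other than Outcome) are assumptions. In the notebook, `df` would come from diabetes.csv instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for diabetes.csv (random values, assumed column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=100),
    "BMI": rng.uniform(18.0, 45.0, size=100).round(1),
    "Age": rng.integers(21, 80, size=100),
    "Outcome": rng.integers(0, 2, size=100),
})

print(df.head())          # 3.1 first 5 rows
print(df.shape)           # 3.2 (number of rows, number of columns)
print(df.dtypes)          # 3.3 data type of each column
print(df.isnull().sum())  # 3.4 missing values per column
print(df.nunique())       # 3.5 unique values per column
```

Note that `isnull()` only counts values pandas treats as missing (such as NaN); placeholder values stored as ordinary numbers would not show up in that count.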

4. Feature Selection

4.1. Split the data into features and target variable.
4.2. Split the data into training and testing sets.
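
Both splits in step 4 can be sketched as follows. The frame is again a synthetic stand-in with assumed column names; the 25% test size, the `random_state` value, and the use of `stratify` are illustrative choices rather than requirements of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diabetes.csv (random values, assumed column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=200),
    "BMI": rng.uniform(18.0, 45.0, size=200).round(1),
    "Age": rng.integers(21, 80, size=200),
})
df["Outcome"] = (df["Glucose"] > 130).astype(int)

# 4.1 Features are every column except the target; the target is Outcome.
X = df.drop(columns="Outcome")
y = df["Outcome"]

# 4.2 Hold out 25% of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible between notebook runs, which helps when comparing models.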

5. Building the Decision Tree Classifier

5.1. Instantiate the DecisionTreeClassifier class.
5.2. Fit the model to the training data.
5.3. Predict the labels of the test data.
5.4. Evaluate the model.
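
Steps 5.1 to 5.4 might look like the sketch below. It uses a synthetic stand-in frame with assumed column names, so the printed scores say nothing about the real dataset; `max_depth=3` and the chosen metrics are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for diabetes.csv (random values, assumed column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=200),
    "BMI": rng.uniform(18.0, 45.0, size=200).round(1),
    "Age": rng.integers(21, 80, size=200),
})
df["Outcome"] = (df["Glucose"] > 130).astype(int)

X = df.drop(columns="Outcome")
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 5.1 Instantiate the classifier (a shallow tree is easier to inspect).
clf = DecisionTreeClassifier(max_depth=3, random_state=42)

# 5.2 Fit the model to the training data.
clf.fit(X_train, y_train)

# 5.3 Predict the labels of the test data.
y_pred = clf.predict(X_test)

# 5.4 Evaluate the model on the held-out rows.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred))
```

Accuracy alone can be misleading when the classes are imbalanced, so the confusion matrix and per-class report are worth reading alongside it.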

6. Visualizing the Decision Tree
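
One lightweight way to inspect a fitted tree is scikit-learn's `export_text`, which prints the split rules as indented text. The sketch below again uses a synthetic stand-in frame with assumed column names and an illustrative `max_depth`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for diabetes.csv (random values, assumed column names).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=200),
    "BMI": rng.uniform(18.0, 45.0, size=200).round(1),
    "Age": rng.integers(21, 80, size=200),
})
y = (X["Glucose"] > 130).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X, y)

# Render the learned split rules as indented text, one line per node.
tree_text = export_text(clf, feature_names=list(X.columns))
print(tree_text)
```

For a graphical rendering in the notebook, `sklearn.tree.plot_tree` with matplotlib is a common alternative; passing `feature_names` and `filled=True` usually makes the figure easier to read.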

7. Conclusion