Random Forest

Objective

Learn Random Forest algorithms

Prerequisite Reading

Brush up on Decision Trees

Essentials Reading

Random Forest

Understanding Random Forest
Random Forest algorithm - video
How to Develop a Random Forest Ensemble in Python - good in-depth introduction and code examples
Bagging and Random Forest Ensemble Algorithms for Machine Learning

RF Feature Importance

How to Calculate Feature Importance With Python - Focus on RF section
Explaining Feature Importance by example of a Random Forest
Feature Selection Using Random Forest
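
As a companion to the feature importance readings above, here is a minimal sketch of two common ways to get importances from a fitted forest in scikit-learn: the impurity-based `feature_importances_` attribute and `permutation_importance`. The synthetic data, the feature names `f0`..`f5`, and all parameter values are assumptions chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False the 3 informative features come first,
# followed by 3 pure-noise features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Impurity-based importance: computed from the training data, sums to 1.
impurity_importance = rf.feature_importances_

# Permutation importance: drop in test score when one feature is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)

for name, imp, p in zip(feature_names, impurity_importance, perm.importances_mean):
    print(f"{name}: impurity={imp:.3f}  permutation={p:.3f}")
```

Impurity-based importances are cheap but are computed on the training data, so comparing them with permutation importance on held-out data is a useful sanity check.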

Extra Reading

Random Forests - some good theory
Section 8.2 "Bagging, Random Forests, Boosting" in An Introduction to Statistical Learning

Implementing Random Forest in Scikit-Learn

Knowledge Check

What problems can RF solve: classification, regression, or both?
What issues with DT are solved by RF?
What are the strengths and weaknesses of RF?
What are the tuning parameters for RF? Which is the most important tuning parameter?
How do we calculate feature importance from RF?
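
To experiment with the tuning-parameter question, one option is a small grid search. The sketch below assumes synthetic data and an arbitrary, deliberately small grid over `n_estimators`, `max_features`, and `max_depth`. It does not claim which parameter matters most; that depends on the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: any tabular classification set works).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# A deliberately tiny grid so the search finishes quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", 0.5],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```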

Exercises

We will use RF on the same exercises we did in the Decision Trees section.

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

EX-1: RF Classification - Synthetic data (★☆☆)

Use scikit-learn's make_blobs or make_classification to generate some sample data.

Try to separate the classes using RandomForestClassifier
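
A possible starting point for EX-1, assuming `make_classification` as the data generator. The sample size, feature counts, and split ratio are arbitrary choices, not requirements of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate a two-class synthetic dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    random_state=42,
)

# Hold out a test set so accuracy is measured on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {acc:.3f}")
```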

EX-2: RF Classification (★★☆)

Here is the Bank marketing dataset
You may want to encode the categorical variables
Use RF to predict the yes/no binary decision
Visualize one of the trees in the forest
Create a confusion matrix
What is the accuracy of the model?
Run Cross Validation to gauge the accuracy of this model
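
The steps above can be sketched end to end as follows. This sketch does not load the real Bank marketing data: it builds a small random stand-in DataFrame, and the column names (`age`, `job`, `balance`, `y`) and values are assumptions for illustration. Swap in the real file and columns when working on the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import export_text

# Hypothetical stand-in for the Bank marketing data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 300
balance = rng.normal(1000, 500, n)
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "job": rng.choice(["admin", "technician", "services"], n),
    "balance": balance,
    "y": np.where(balance + rng.normal(0, 300, n) > 1000, "yes", "no"),
})

# Encode: one-hot the categorical columns, map the yes/no target to 1/0.
X = pd.get_dummies(df.drop(columns="y"), dtype=int)
y = (df["y"] == "yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Confusion matrix (rows: true class, columns: predicted class) and accuracy.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"Test accuracy: {acc:.3f}")

# 5-fold cross validation on the full dataset.
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# A forest has many trees; print the top levels of the first one as text.
print(export_text(clf.estimators_[0], feature_names=list(X.columns), max_depth=2))
```

A forest has no single tree to plot, so the sketch inspects one member via `estimators_`. `sklearn.tree.plot_tree` is an alternative if a graphical view is preferred.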

EX-3: RF Regression - Synthetic data (★☆☆)

Use Scikit's make_regression to generate some sample data.

Use RandomForestRegressor to solve this
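
A possible starting point for EX-3. The sample size, feature count, and noise level are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate a synthetic regression problem with some Gaussian noise.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)

r2 = r2_score(y_test, reg.predict(X_test))
print(f"Test R2: {r2:.3f}")
```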

EX-4: RF Regression (★★☆)

Use the Bike sharing data
Use RandomForestRegressor to predict bike demand
Visualize one of the trees in the forest
Use RMSE and R2 to evaluate the model
Use Cross Validation to thoroughly test the model performance
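
The evaluation steps above can be sketched as follows. The data here is a randomly generated stand-in, not the real Bike sharing dataset: the features (`hour`, `temp`, `humidity`) and the formula producing the demand values are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import export_text

# Hypothetical stand-in for the Bike sharing data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
hour = rng.integers(0, 24, n)
temp = rng.uniform(0, 35, n)
humidity = rng.uniform(20, 100, n)
demand = (50 + 8 * temp + 60 * np.sin(np.pi * hour / 24)
          - 0.5 * humidity + rng.normal(0, 10, n))

feature_names = ["hour", "temp", "humidity"]
X = np.column_stack([hour, temp, humidity])
y = demand

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# RMSE is the square root of the mean squared error.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f}  R2: {r2:.3f}")

# 5-fold cross validation; scikit-learn returns negated RMSE, so flip the sign.
cv_rmse = -cross_val_score(
    reg, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
print(f"CV RMSE: {cv_rmse.mean():.2f} +/- {cv_rmse.std():.2f}")

# Print the top levels of the first tree in the forest as text.
print(export_text(reg.estimators_[0], feature_names=feature_names, max_depth=2))
```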

More Exercises

Index