Random Forest


Objective

Learn Random Forest algorithms

Prerequisite Reading

Essential Reading

Random Forest

RF Feature Importance

Extra Reading

Implementing Random Forest in Scikit-Learn

Knowledge Check

  • What kinds of problems can RF solve: classification, regression, or both?
  • Which issues with decision trees (DT) does RF solve?
  • What are the strengths and weaknesses of RF?
  • What are the tuning parameters for RF? Which is the most important one?
  • How do we calculate feature importance from RF?
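As a starting point for the feature importance question, here is a minimal sketch of one common approach: the impurity-based scores that scikit-learn exposes as `feature_importances_` on a fitted forest. The synthetic data, sample sizes, and seeds are assumptions chosen for illustration. Permutation importance is another option worth comparing against.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): with shuffle=False the
# two informative features are columns 0 and 1; the rest are noise.
X, y = make_classification(
    n_samples=400,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one score per feature, normalized to sum to 1.
importances = forest.feature_importances_

# Print features from most to least important.
for idx in importances.argsort()[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Impurity-based scores are computed on the training data and can favor high-cardinality features, so treat them as a first look rather than a final answer.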

Exercises

We will use RF on the same exercises we did in the Decision Trees section.

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

EX-1: RF Classification - Synthetic data (★☆☆)

Use scikit-learn's make_blobs or make_classification to generate some sample data.

Try to separate the classes using RandomForestClassifier.
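One possible way to get started on EX-1 is sketched below. It uses `make_classification`; the number of samples, features, trees, and the random seeds are arbitrary assumptions, not values prescribed by the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data; all sizes and seeds here are arbitrary choices.
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    random_state=42,
)

# Hold out a test set so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Try swapping in `make_blobs` and varying `n_estimators` or `max_depth` to see how the accuracy changes.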

EX-2: RF Classification (★★☆)

  • Use the Bank Marketing dataset
  • You may want to encode the categorical variables
  • Use RF (RandomForestClassifier) to predict the yes/no binary decision
  • Visualize one of the trees in the forest
  • Create a confusion matrix
  • What is the accuracy of the model?
  • Run cross-validation to gauge the accuracy of this model
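The steps above could be wired together roughly as follows. This is a sketch only: the real Bank Marketing data is not loaded here, so the DataFrame is randomly generated placeholder data, and the column names (`age`, `balance`, `job`, `y`) are assumptions about what the real file might contain. Replace the placeholder with your own `pd.read_csv(...)` call.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# PLACEHOLDER data with hypothetical columns; not the real dataset.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "balance": rng.normal(1000, 500, size=n),
    "job": rng.choice(["admin", "technician", "services"], size=n),
})
# Synthetic yes/no label loosely tied to balance so there is a signal to learn.
df["y"] = np.where(df["balance"] + rng.normal(0, 300, size=n) > 1000, "yes", "no")

X = df.drop(columns="y")
y = df["y"]

# One-hot encode the categorical column; pass numeric columns through unchanged.
encode = ColumnTransformer(
    [("job", OneHotEncoder(handle_unknown="ignore"), ["job"])],
    remainder="passthrough",
)
model = Pipeline([
    ("encode", encode),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_test, y_pred, labels=["no", "yes"])
acc = accuracy_score(y_test, y_pred)

# 5-fold cross-validation refits the whole pipeline on each fold.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(cm)
print(f"Hold-out accuracy: {acc:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Keeping the encoder inside the pipeline means it is fitted only on the training portion of each fold, which avoids leaking information from the validation rows.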

EX-3: RF Regression - Synthetic data (★☆☆)

Use scikit-learn's make_regression to generate some sample data.

Use RandomForestRegressor to fit a model to it.
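A minimal sketch for EX-3 might look like this. The dataset size, noise level, and seeds are assumptions picked for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data; sizes, noise, and seeds are arbitrary choices.
X, y = make_regression(
    n_samples=500,
    n_features=5,
    n_informative=3,
    noise=10.0,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient of determination.
r2 = reg.score(X_test, y_test)
print(f"Test R^2: {r2:.3f}")
```

Because `make_regression` produces a linear target, a plain linear model is a useful baseline to compare the forest against.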

EX-4: RF Regression (★★☆)

  • Use the Bike Sharing data
  • Use RandomForestRegressor to predict bike demand
  • Visualize one of the trees in the forest
  • Use RMSE and R² to evaluate the model
  • Use cross-validation to thoroughly test the model's performance
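The evaluation steps above can be sketched as follows. The Bike Sharing data is not loaded here; `make_regression` output stands in as placeholder data, so the numbers printed say nothing about real bike demand. The text rendering via `export_text` is one assumed way to inspect a tree; `sklearn.tree.plot_tree` is a graphical alternative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import export_text

# PLACEHOLDER data standing in for the real features and demand target.
X, y = make_regression(n_samples=400, n_features=4, noise=15.0, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

reg = RandomForestRegressor(n_estimators=100, random_state=1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# RMSE is the square root of the mean squared error, in the target's units.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# scikit-learn maximizes scores, so RMSE is exposed negated; flip the sign back.
cv_rmse = -cross_val_score(
    reg, X, y, cv=5, scoring="neg_root_mean_squared_error"
)

# A forest has many trees; print the top levels of the first one as text.
tree_text = export_text(reg.estimators_[0], max_depth=2)

print(f"RMSE: {rmse:.2f}, R^2: {r2:.3f}")
print(f"CV RMSE: {cv_rmse.mean():.2f} +/- {cv_rmse.std():.2f}")
print(tree_text)
```

A single tree is only one vote among many, so use the visualization to build intuition about the splits rather than to explain the forest's predictions.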

More Exercises