SuperLearner Guide

A guide to using SuperLearner for prediction. This is now included as a vignette in the SuperLearner package.

Note: this tutorial is a bit out of date; some supplemental methods are now in my ck37r package.

SuperLearner Intro

Installing
Background
Create dataset
Review available models
Fit single models
Fit ensemble
Predict on new dataset
Customize a model setting
External cross-validation
Test multiple hyperparameter settings
Parallelize across CPUs
Distribution of ensemble weights
Feature selection (screening)
Optimize for AUC
XGBoost hyperparameter exploration
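
As a rough orientation for the topics listed above, the core workflow (create a dataset, fit an ensemble, predict on new data, run external cross-validation) can be sketched as follows. This is a minimal sketch and not the guide's own code: the simulated data, the variable names (`X`, `Y`, `X_new`), and the two-learner library are assumptions chosen for illustration, and it assumes the SuperLearner package is installed.

```r
library(SuperLearner)

# Simulated binary-outcome data (hypothetical; the guide uses its own dataset).
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 - 0.5 * X$x2))
X_new <- data.frame(x1 = rnorm(20), x2 = rnorm(20))

# Fit an ensemble of two simple learners: the outcome mean and a logistic regression.
sl <- SuperLearner(Y = Y, X = X, family = binomial(),
                   SL.library = c("SL.mean", "SL.glm"))

# Ensemble weight assigned to each learner.
print(sl$coef)

# Predict on new data; onlySL = TRUE skips learners that received zero weight.
pred <- predict(sl, newdata = X_new, onlySL = TRUE)

# External cross-validation estimates the performance of the ensemble itself.
cv_sl <- CV.SuperLearner(Y = Y, X = X, family = binomial(), V = 5,
                         SL.library = c("SL.mean", "SL.glm"))
print(summary(cv_sl))
```

`listWrappers()` prints the learner and screening wrappers that ship with the package, which is one way to pick a larger library than the two learners assumed here.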

Intermediate

(To be created)

create.Learner() custom environments
SL.caret wrapper
Custom learner wrapper
Custom screener
Library analysis - cumulative
Library analysis - individual algorithms
Recombine SuperLearner

Advanced

(To be created)

Parallelize across computers (SLURM)
Repeated cross-validation
Data-adaptive V-selection for cross-validation
Multi-level meta-learning

Resources

Books:

An Introduction to Statistical Learning by Gareth James et al. (free PDF; also on Amazon)
Applied Predictive Modeling by Max Kuhn and Kjell Johnson
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Many others

Campus Groups:

D-Lab's Machine Learning Working Group
D-Lab's Cloud Computing Working Group
The Hacker Within / Berkeley Institute for Data Science

Courses at Berkeley:

Stat 154 - Statistical Learning
CS 189 / CS 289A - Machine Learning
PH 252D - Causal Inference
PH 295 - Big Data
PH 295 - Targeted Learning for Biomedical Big Data
INFO - TBD

Also many Coursera offerings and other online classes.

References

Erin LeDell, Maya L. Petersen & Mark J. van der Laan, "Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates." (Electronic Journal of Statistics)

Polley EC, van der Laan MJ (2010) Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series. Paper 266. http://biostats.bepress.com/ucbbiostat/paper266/

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.
