SuperLearner guide: fitting models, ensembling, prediction, hyperparameters, parallelization, timing, feature selection, etc.

ck37/superlearner-guide

SuperLearner Guide

A guide to using SuperLearner for prediction. This is now included as a vignette in the SuperLearner package.

Note: this tutorial is a bit out of date; some supplemental methods are now in my ck37r package.

  • Installing
  • Background
  • Create dataset
  • Review available models
  • Fit single models
  • Fit ensemble
  • Predict on new dataset
  • Customize a model setting
  • External cross-validation
  • Test multiple hyperparameter settings
  • Parallelize across CPUs
  • Distribution of ensemble weights
  • Feature selection (screening)
  • Optimize for AUC
  • XGBoost hyperparameter exploration

Intermediate

(To be created)

  • create.Learner() custom environments
  • SL.caret wrapper
  • Custom learner wrapper
  • Custom screener
  • Library analysis - cumulative
  • Library analysis - individual algorithms
  • Recombine SuperLearner

Advanced

(To be created)

  • Parallelize across computers (SLURM)
  • Repeated cross-validation
  • Data-adaptive V-selection for cross-validation
  • Multi-level meta-learning

Resources

Books:

Campus Groups:

Courses at Berkeley:

  • Stat 154 - Statistical Learning
  • CS 189 / CS 289A - Machine Learning
  • PH 252D - Causal Inference
  • PH 295 - Big Data
  • PH 295 - Targeted Learning for Biomedical Big Data
  • INFO - TBD

There are also many Coursera offerings and other online classes.

References

Erin LeDell, Maya L. Petersen & Mark J. van der Laan, "Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates." (Electronic Journal of Statistics)

Polley EC, van der Laan MJ (2010) Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series. Paper 266. http://biostats.bepress.com/ucbbiostat/paper266/

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.
