Code to reproduce results from the paper:
Back to the Basics: Revisiting Out-of-Distribution Detection Baselines. ICML 2022 Workshop on Principles of Distribution Shift
Out-of-distribution (OOD) detection is the task of determining whether a datapoint comes from a different distribution than the training dataset. For example, we may train a model to classify the breed of dogs and find that there is a cat image in our dataset. This cat image would be considered out-of-distribution. This work evaluates the effectiveness of various scores to detect OOD datapoints.
This repository is intended only for scientific purposes. To detect outliers in your own data, you should instead use the implementation from the official cleanlab library.
This repository is broken into two major folders (inside `src/experiments/`):

- `OOD/`: primary benchmarking code used for the paper linked above.
- `adjusted-OOD-scores/`: additional benchmarking code to produce results from the article: A Simple Adjustment Improves Out-of-Distribution Detection for Any Classifier. Towards AI, 2022. This additional code considers OOD detection based solely on classifier predictions and adjusted versions thereof.
For each experiment, we perform the following procedure:
- Train a Neural Network model with ONLY the in-distribution training dataset.
- Use this model to generate predicted probabilities and embeddings for the in-distribution and out-of-distribution test datasets (these are considered out-of-sample predictions).
- Use out-of-sample predictions to generate OOD scores.
- Threshold OOD scores to detect OOD datapoints.
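As a concrete illustration of the last two steps, one common baseline score is the maximum softmax probability (MSP): datapoints whose top predicted probability is low receive a high OOD score and are flagged once the score exceeds a threshold. This is a minimal sketch, not the paper's exact scoring setup; the threshold value here is illustrative.

```python
import numpy as np

def msp_ood_scores(pred_probs: np.ndarray) -> np.ndarray:
    """OOD score = 1 - max predicted probability per datapoint.

    Higher scores mean the classifier is less confident, suggesting
    the datapoint may be out-of-distribution.
    """
    return 1.0 - pred_probs.max(axis=1)

def detect_ood(pred_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Flag datapoints whose OOD score exceeds the threshold."""
    return msp_ood_scores(pred_probs) > threshold

# Toy example: one confident prediction, one uncertain one.
probs = np.array([[0.97, 0.02, 0.01],   # confident -> likely in-distribution
                  [0.40, 0.35, 0.25]])  # uncertain -> likely OOD
print(detect_ood(probs))  # [False  True]
```

In practice the threshold is chosen from the in-distribution scores alone (e.g. a high percentile), since no OOD labels are available at training time.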
| Experiment ID | In-Distribution | Out-of-Distribution |
|---|---|---|
| 0 | cifar-10 | cifar-100 |
| 1 | cifar-100 | cifar-10 |
| 2 | mnist | roman-numeral |
| 3 | roman-numeral | mnist |
| 4 | mnist | fashion-mnist |
| 5 | fashion-mnist | mnist |
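The table above can be captured as a simple mapping for iterating over experiments. This `EXPERIMENTS` dict is a hypothetical helper for illustration; the actual benchmark code may organize the dataset pairs differently.

```python
# Experiment ID -> (in-distribution dataset, out-of-distribution dataset),
# mirroring the table above. Illustrative only; not the repo's actual config.
EXPERIMENTS = {
    0: ("cifar-10", "cifar-100"),
    1: ("cifar-100", "cifar-10"),
    2: ("mnist", "roman-numeral"),
    3: ("roman-numeral", "mnist"),
    4: ("mnist", "fashion-mnist"),
    5: ("fashion-mnist", "mnist"),
}

for exp_id, (in_dist, ood) in EXPERIMENTS.items():
    print(f"Experiment {exp_id}: train on {in_dist}, detect {ood} as OOD")
```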
For our experiments, we use AutoGluon's ImagePredictor for image classification, which requires the training, validation, and test datasets to be image files.
Links below to download the training and test datasets in PNG format:

- cifar-10 and cifar-100: https://github.com/knjcode/cifar2png
- roman-numeral: https://worksheets.codalab.org/bundles/0x497f5d7096724783aa1eb78b85aa321f
  - There are duplicate images in this dataset (the exact same image under different file names). We use the following script to dedupe: `src/preprocess/remove_dupes.py`
- fashion-mnist: https://github.com/DeepLenin/fashion-mnist_png
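The dedupe step for the roman-numeral dataset can be sketched with a content-hash pass over the image files. This is a simplified stand-in, not the repo's actual `src/preprocess/remove_dupes.py`; the directory layout and behavior here are assumptions.

```python
import hashlib
from pathlib import Path

def remove_exact_duplicates(image_dir: str, dry_run: bool = True) -> list[str]:
    """Find (and optionally delete) byte-identical duplicate PNGs.

    Exact duplicates (same image under different file names) hash to
    the same digest; the first occurrence of each image is kept.
    Returns the paths of the duplicates found.
    """
    seen: dict[str, Path] = {}
    dupes: list[str] = []
    for path in sorted(Path(image_dir).rglob("*.png")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            dupes.append(str(path))
            if not dry_run:
                path.unlink()  # delete the duplicate file
        else:
            seen[digest] = path
    return dupes
```

Hashing file bytes only catches exact duplicates; near-duplicates (re-encoded or resized copies) would need perceptual hashing instead.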
- NVIDIA Container Toolkit: allows us to utilize our NVIDIA GPUs inside Docker environments
- autogluon==0.4.0
Clone this repo and run the commands below:
sudo docker-compose build
sudo docker-compose run --rm --service-ports dcai
Run the command below.
Note that we use a Makefile to run Jupyter Lab for convenience, so we can save args (ip, port, allow-root, etc.).
make jupyter-lab
Run the notebook below to train all models.
src/experiments/OOD/0_Train_Models.ipynb
Note that we use two neural net architectures below with AutoGluon, each using a different backend:
- swin_base_patch4_window7_224 (torch backend)
- resnet50_v1 (mxnet backend)
Here is a notebook that runs all experiments: