This repository provides implementation for attribution metrics: ROAR and LeRF/MoRF metrics which was used in our work [1].
Requirements: Repository uses Taskfile(optional) and docker. You also need nvidia driver setup to use gpu for model training.
Install taskfile from here.
git clone https://github.com/saurabheights/ROAR.git
cd ROAR
task build-docker
# Ssh to docker
task bash-docker
Remove and Retrain [2], ROAR for short, is a benchmark for evaluating interpretability methods in deep neural networks.
ROAR leverages the fact that if an attribution method ranks pixels, then removing most important pixels from the dataset will lead to worse training accuracy. Thus the better the attribution method, the worse will be model trained on this altered dataset.
ROAR consists of 3 steps:-
- Train a classification model on original dataset.
- Use the trained model to extract attribution maps from original dataset.
- Retrain the classification model on altered dataset, where this dataset is created by removing top K% most important pixels from original dataset.
Note - The original author provided the source code for their implementation here. Their implementation utilizes Tensorflow, TPU and many simplistic attribution methods which can be computed during training. This causes few issues:-
- TPU is not available to all.
- Computing attribution methods during training can cause a very heavy bottleneck if attribution method takes too long to compute. A prime example for this is Integrad which requires 50-100 forward passes per image.
Our repository uses Pytorch and handles the bottleneck problem by dumping an attribution dataset as an intermediary step. The original dataset and attribution maps are concurrently loaded from disk during the retrain step.
# Train model (by default uses Resnet8 with CIFAR10)
task train-cls
# Extract attribution maps to remove most important pixels later.
task extract-attribution-maps
# Retrain Resnet8 with altered CIFAR10.
task evaluate-attribution-maps-roar
LeRF/MoRF(Least/Most Relevant First) proposed in [3] makes an assumption that if least/most important pixels in the image are removed, they should cause least/most affect on class scores. Thus the better the attribution method, the change should be as little as possible when using LeRF and more when using MoRF.
task lerf # By default, uses ImageNet and Resnet50 with pretrained model provided by torchvision.
By default, pipeline for both ROAR and LeRF/MoRF supports multiple attribution methods. Add your methods to config file as:-
extract_cams: # or pixel_perturbation_analysis for LeRF/MoRF
attribution_methods: &ATTRIBUTION_METHODS
- name: gradcam
method: src.attribution_loader.compute_gradcam
kwargs:
saliency_layer: 'layer3.0.relu'
- name: new_method
method: method_to_call
kwargs:
any_custom_args_to_pass: new_arg_values
Each attribution method should accept following parameters model, preprocessed_image, label
and
return an attribution map of same size as Image, representing importance of each input pixel. To pass
any extra information, use extract_cams.attribution_methods[index].kwargs
in yaml configuration file.
You can check src.attribution_methods.attribution_loader.generate_attribution
to see how each attribution method is called.
For any questions, bugs or feature enhancement, simply make an issue.
If you use this work, please do cite following references.
[1] Khakzar, Ashkan, et al. "Neural Response Interpretation through the Lens of Critical Pathways" CVPR 2021.
[2] Hooker, Sara, et al. "A Benchmark for Interpretability Methods in Deep Neural Networks." NeurIPS 2019.
[3] Wojciech, Samek et al. "Evaluating the visualization of what a deep neural network has learned." IEEE Transactions on Neural Networks and Learning Systems, 2017.