This repository collects papers and tools on Explainable AI.
Towards better analysis of machine learning models: A visual analytics perspective. Liu, S., Wang, X., Liu, M., & Zhu, J. (2017). Visual Informatics, 1, 48-56.
Visual Interpretability for Deep Learning: a Survey Quanshi Zhang, Song-Chun Zhu (2018) CVPR
interpretable/disentangled middle-layer representations
Towards a rigorous science of interpretable machine learning. F. Doshi-Velez and B. Kim. (2018).
Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda. A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, and M. Kankanhalli, in Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI), 2018, p. 582
Focuses mostly on HCI research.
A survey of methods for explaining black box models. R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti. (2018).
presented a detailed taxonomy of explainability methods according to the type of problem faced.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). A. Adadi and M. Berrada, in IEEE Access, vol. 6, pp. 52138-52160, 2018.
Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, et al. arXiv (2019).
How convolutional neural network see the world - A survey of convolutional neural network visualization methods. Qin, Z., Yu, F., Liu, C., & Chen, X. (2018). ArXiv, abs/1804.11191.
explAIner: A Visual Analytics Framework for Interactive and Explainable Machine Learning, Spinner, T., Schlegel, U., Schäfer, H., & El-Assady, M. (2019). IEEE VAST, Transactions on Visualization and Computer Graphics, 26, 1064-1074.
CSI: collaborative semantic inference
Visual Interaction with Deep Learning Models through Collaborative Semantic Inference. Gehrmann, S., Strobelt, H., Krüger, R., Pfister, H., & Rush, A.M. (2019). IEEE VAST. Transactions on Visualization and Computer Graphics, 26, 884-894.
Users can both understand and control parts of the model's reasoning process, e.g. in a text summarization system, a user can write a summary collaboratively with the machine's suggestions.
Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. Zhang, J., Wang, Y., Molino, P., Li, L., & Ebert, D.S. (2019). IEEE VAST, Transactions on Visualization and Computer Graphics, 25, 364-373.
inspection (hypothesis), explanation (reasoning), and refinement (verification)
Activis: Visual exploration of industry-scale deep neural network models. M. Kahng, P. Y. Andrews, A. Kalro, and D. H. P. Chau. IEEE transactions on visualization and computer graphics, 24(1):88–97, 2018 Facebook
compares the activations from different data instances (i.e., examples) to investigate the potential causes of misclassifications.
As long as the model is accurate for the task and uses a reasonably restricted number of internal components, intrinsically interpretable models are sufficient. Otherwise, use post-hoc methods.
Including natural language explanations, visualizations of learned models, and explanations by example.
Interpretable explanations of black boxes by meaningful perturbation, R. C. Fong, A. Vedaldi, in IEEE International Conference on Computer Vision, 2017, pp. 3429–3437.
Real time image saliency for black box classifiers, P. Dabkowski, Y. Gal, in: Advances in Neural Information Processing Systems, 2017, pp. 6967–6976.
Sensitivity refers to how an ANN output is influenced by its input and/or weight perturbations
Opening black box data mining models using sensitivity analysis, P. Cortez and M. J. Embrechts, in Proc. IEEE Symp. Comput. Intell. Data Mining (CIDM), (2011)
Using sensitivity analysis and visualization techniques to open black box data mining models, P. Cortez and M. J. Embrechts, Inf. Sci. (2013).
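The idea of measuring how the output responds to input perturbations can be sketched as a one-at-a-time sweep: each input is varied over a grid while the others stay at a baseline, and the spread of the output is recorded. This is a minimal illustration, not the procedure from the papers above; `sensitivity`, `toy_model`, the baseline and the grids are all hypothetical names and values chosen for the example.

```python
def sensitivity(model, baseline, grids):
    """Output range of `model` as each input sweeps its grid, others held at baseline."""
    ranges = []
    for i, grid in enumerate(grids):
        outputs = []
        for value in grid:
            x = list(baseline)   # copy so the baseline is never modified
            x[i] = value         # perturb only input i
            outputs.append(model(x))
        ranges.append(max(outputs) - min(outputs))
    return ranges


def toy_model(x):
    # Hypothetical stand-in for a trained black-box model.
    return 3.0 * x[0] + 0.5 * x[1] ** 2


grid = [-1.0, 0.0, 1.0]
scores = sensitivity(toy_model, baseline=[0.0, 0.0], grids=[grid, grid])
print(scores)  # the larger the range, the more sensitive the output is to that input
```

A range-based score is only one possible sensitivity measure; gradient- or variance-based measures follow the same sweep pattern.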
Assigns an importance value to each feature for a given prediction, based on the game-theoretic concept of Shapley values.
A unified approach to interpreting model predictions, S.M. Lundberg and S.I. Lee, in Proc. Adv. Neural Inf. Process. Syst., 2017.
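To make the Shapley-value attribution concrete, the sketch below computes exact Shapley values by enumerating every feature subset. It assumes that a "missing" feature is replaced by a baseline value, which is one common convention and an assumption here. This brute-force version is exponential in the number of features and is not the approximation used by the SHAP library; `shapley_values` and `toy_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value (assumption)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features that share it.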
Auditing black-box models for indirect influence, P. Adler, C. Falk, S. A. Friedler, T. Nix, G. Rybeck, C. Scheidegger, B. Smith, S. Venkatasubramanian, Knowledge and Information Systems (2018)
- ICE: Individual Conditional Expectation (extends PDP)
Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation, A. Goldstein, A. Kapelner, J. Bleich, E. Pitkin, Journal of Computational and Graphical Statistics 24 (1) (2015) 44–65.
ICE plots extend PDP, reveal interactions and individual differences by disaggregating the PDP output.
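A small sketch of how ICE disaggregates the PDP: one curve is computed per instance, and the PDP is the pointwise average of those curves. The model and data are hypothetical and chosen so that an interaction makes the averaged curve flat while the individual curves are not.

```python
def ice_curves(model, rows, feature, grid):
    """One curve per instance: vary `feature` over `grid`, keep the other features fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            x = list(row)
            x[feature] = value
            curve.append(model(x))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """The PDP is the pointwise mean of the ICE curves."""
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_model(x):
    # Hypothetical model with a pure interaction between features 0 and 1.
    return x[0] * x[1]


rows = [[0, 1], [0, -1]]
curves = ice_curves(toy_model, rows, feature=0, grid=[-1, 0, 1])
pdp = partial_dependence(curves)
print(curves)  # two curves with opposite slopes
print(pdp)     # the average hides the interaction
```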
- PI & ICI
Visualizing the feature importance for black box models, G. Casalicchio, C. Molnar, B. Bischl, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2018, pp. 655–670
Interpretability via model extraction. O. Bastani, C. Kim, and H. Bastani. (2017).
TreeView: Peeking into deep neural networks via feature-space partitioning. J. J. Thiagarajan, B. Kailkhura, P. Sattigeri, and K. N. Ramamurthy. (2016)
Visualizing the Loss Landscape of Neural Nets. NeurIPS. Li, H., Xu, Z., Taylor, G., & Goldstein, T. (2017).
- Interpretable Mimic Learning
Distilling knowledge from deep networks with applications to healthcare domain. Z. Che, S. Purushotham, R. Khemani, and Y. Liu. (2015).
- DarkSight
Interpreting deep classifier by visual distillation of dark knowledge. K. Xu, D. H. Park, D. H. Yi, and C. Sutton. (2018).
- DeepVID
DeepVID: Deep Visual Interpretation and Diagnosis for Image Classifiers via Knowledge Distillation. Wang, J., Gou, L., Zhang, W., Yang, H.T., & Shen, H. (2019). IEEE Transactions on Visualization and Computer Graphics, 25, 2168-2180.
Adversarial examples: Attacks and defenses for deep learning. X. Yuan, P. He, Q. Zhu, and X. Li. (2017).
Analyzing the Noise Robustness of Deep Neural Networks. M. Liu, S. Liu, H. Su, K. Cao, and J. Zhu. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology, 2018.
visualizes data-paths along the hidden layers in order to interpret the prediction process of adversarial examples.
Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. Ma, Y., Xie, T., Li, J., & Maciejewski, R. (2019). IEEE Transactions on Visualization and Computer Graphics, 26, 1075-1085.
Influence functions
Understanding black-box predictions via influence functions, P. W. Koh, P. Liang, in: Proceedings of the 34th International Conference on Machine Learning. (2017)
Interaction based
GoldenEye: A peek into the black box: exploring classifiers by randomization, A. Henelius, K. Puolamaki, H. Bostrom, L. Asker, P. Papapetrou, Data mining and knowledge discovery (2014)
Interpreting classifiers through attribute interactions in datasets. A. Henelius, K. Puolamaki, A. Ukkonen, (2017). arXiv:1707.07576.
Iterative orthogonal feature projection for diagnosing bias in black-box models J. Adebayo, L. Kagal, (2016). arXiv:1611.04967.
Global: understanding the whole logic of a model and following the entire reasoning that leads to all the different possible outcomes.
Local: explaining the reasons for a specific decision or a single prediction.
Why should I trust you?: Explaining the predictions of any classifier. M. T. Ribeiro, S. Singh, and C. Guestrin, in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016.
Approximates a DNN’s predictions using sparse linear models where we can easily identify important features. Extracts image regions that are highly sensitive to the network output.
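The local-approximation idea can be sketched as follows: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients act as local feature importances. This is a simplified illustration, not the API of the `lime` package, and it omits the sparsity step mentioned above; the Gaussian sampling, the kernel width and every function name are assumptions made for the example.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_surrogate(model, x, n_samples=200, scale=0.1, width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x; return (intercept, weights)."""
    rng = random.Random(seed)
    d = len(x)
    ata = [[0.0] * (d + 1) for _ in range(d + 1)]
    atb = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        z = [xi + di for xi, di in zip(x, delta)]
        # Exponential kernel: nearby samples count more.
        w = math.exp(-sum(di * di for di in delta) / width ** 2)
        row = [1.0] + delta
        y = model(z)
        for i in range(d + 1):
            atb[i] += w * row[i] * y
            for j in range(d + 1):
                ata[i][j] += w * row[i] * row[j]
    coef = solve(ata, atb)
    return coef[0], coef[1:]


def toy_model(x):
    # Hypothetical black box: non-linear in x[0], linear in x[1].
    return x[0] ** 2 + 3.0 * x[1]


intercept, weights = local_surrogate(toy_model, [1.0, 0.0])
print(weights)  # close to the local slopes of the model at x
```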
LRP: Layer-Wise Relevance Propagation
On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, PLoS ONE, 2015.
Distribution-free predictive inference for regression. J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman.
Anchors: High-precision model-agnostic explanations M. T. Ribeiro, S. Singh, and C. Guestrin, in Proc. AAAI Conf. Artif. Intell., 2018.
Visual diagnosis of tree boosting methods. S. Liu, J. Xiao, J. Liu, X. Wang, J. Wu, and J. Zhu. IEEE Transactions on Visualization and Computer Graphics, 24(1):163–173, 2017.
random forest
iForest: Interpreting Random Forests via Visual Analytics Xun Zhao, Yanhong Wu, Dik Lun Lee, and Weiwei Cui. IEEE VIS 2018
Summarize the decision paths in random forests.
(1) Max Activation
Synthesize input pattern that can cause maximal activation of a neuron
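Synthesizing an input that maximally activates a neuron is usually done by gradient ascent on the input. The sketch below uses a hypothetical linear neuron with an L2 penalty on the input so that the maximum is finite; real networks would obtain the gradient by backpropagation instead of the closed form used here.

```python
def activation(weights, x):
    # Hypothetical linear neuron.
    return sum(w * xi for w, xi in zip(weights, x))


def maximize_activation(weights, steps=200, lr=0.1, l2=1.0):
    """Gradient ascent on activation(x) - (l2 / 2) * ||x||^2, starting from zeros."""
    x = [0.0] * len(weights)
    for _ in range(steps):
        # Gradient of the regularized objective with respect to the input.
        grad = [w - l2 * xi for w, xi in zip(weights, x)]
        x = [xi + lr * g for xi, g in zip(x, grad)]
    return x


neuron_weights = [0.5, -1.0, 2.0]
preferred_input = maximize_activation(neuron_weights)
print(preferred_input)  # converges towards weights / l2
```

The penalty plays the role of the image priors used in practice: without some regularizer the optimum is unbounded or unnatural.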
Saliency Maps(2013)
Deep inside convolutional networks: visualising image classification models and saliency maps. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. In arXiv:1312.6034, 2013.
Saliency maps are usually rendered as a heatmap, where hotness corresponds to regions that have a big impact on the model’s final decision based on gradient.
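A gradient-based saliency map is the magnitude of the derivative of the class score with respect to each pixel. The sketch below approximates that derivative with central finite differences on a tiny hypothetical scorer; this stands in for the backpropagated gradient used in the paper and is only practical for very small inputs.

```python
import math


def saliency_map(score_fn, image, eps=1e-4):
    """Absolute finite-difference gradient of score_fn with respect to every pixel."""
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            up = [list(line) for line in image]
            down = [list(line) for line in image]
            up[r][c] += eps
            down[r][c] -= eps
            saliency_row.append(abs(score_fn(up) - score_fn(down)) / (2 * eps))
        saliency.append(saliency_row)
    return saliency


# Hypothetical 2x2 "classifier": a weighted pixel sum followed by tanh.
pixel_weights = [[0.0, 2.0], [-1.0, 0.0]]


def class_score(image):
    total = sum(w * p for w_row, p_row in zip(pixel_weights, image)
                for w, p in zip(w_row, p_row))
    return math.tanh(total)


image = [[0.1, 0.1], [0.1, 0.1]]
saliency = saliency_map(class_score, image)
print(saliency)  # larger values mark pixels with more influence on the score
```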
CAM: Class Activation Map(2016)
The CAM highlights the class-specific discriminative regions.
Learning Deep Features for Discriminative Localization. Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., & Torralba, A.2016 IEEE (CVPR), 2921-2929.
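In the CAM paper the map for a class is the sum of the last convolutional feature maps, each weighted by that class's weight in the layer that follows global average pooling. A minimal sketch with hypothetical 2x2 feature maps and weights:

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: cam[r][c] = sum_k w_k * f_k[r][c]."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    return [
        [sum(w * fm[r][c] for w, fm in zip(class_weights, feature_maps))
         for c in range(width)]
        for r in range(height)
    ]


def class_score(feature_maps, class_weights):
    """Score from global average pooling followed by a linear layer (no bias)."""
    pooled = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]
    return sum(w * p for w, p in zip(class_weights, pooled))


# Hypothetical activations of two filters and the weights of one class.
feature_maps = [[[1.0, 0.0], [0.0, 0.0]],
                [[0.0, 0.0], [0.0, 1.0]]]
class_weights = [2.0, -1.0]

cam = class_activation_map(feature_maps, class_weights)
print(cam)
```

Averaging the CAM over all positions recovers the class score, which is why high-valued regions can be read as evidence for that class.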
Grad-CAM: Why did you say that? R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, D. Batra, (2016).
Filter Activation
Convergent learning: Do different neural networks learn the same representations?, Y. Li, J. Yosinski, J. Clune, H. Lipson, J. E. Hopcroft, in: ICLR, 2016.
computing the correlation between activations of different filters.
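Computing the correlation between filter activations can be sketched as a matrix of Pearson correlations, where each filter is represented by its activations over the same set of inputs. The activation values below are hypothetical, and the matching step that the paper builds on top of such a matrix is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long activation sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(filters_a, filters_b):
    """Entry [i][j] correlates filter i of network A with filter j of network B."""
    return [[pearson(fa, fb) for fb in filters_b] for fa in filters_a]


# Hypothetical activations: one list per filter, one value per input example.
net_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
net_b = [[2.0, 4.0, 6.0, 8.0], [1.0, 0.0, 1.0, 0.0]]

corr = correlation_matrix(net_a, net_b)
print(corr)
```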
(2) Deconvolution(2010)
Finds the selective patterns in the input image that activate a specific neuron in the convolutional layers by projecting the neuron's low-dimensional feature maps back to the image dimensions.
First proposed Deconv
Deconvolutional networks. M. D. Zeiler, D. Krishnan, G. W. Taylor, R. Fergus, in: CVPR, Vol. 10,2010, p. 7.
Use Deconv to visualize CNN
Visualizing and understanding convolutional networks. Matthew D. Zeiler and Rob Fergus. In ECCV, 2014.
(3) Inversion
Unlike the methods above, which visualize the CNN from a single neuron’s activation, these methods work at the layer level.
Reconstructs an input image from a specific layer's feature maps, which reveals what image information is preserved in that layer.
Optimisation method
Understanding deep image representations by inverting them. Aravindh Mahendran and Andrea Vedaldi. In CVPR, 2015.
Visualizing deep convolutional neural networks using natural pre-images, A. Mahendran and A. Vedaldi, International Journal of Computer Vision, 120 (2016), 233–255.
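Optimisation-based inversion searches for an input whose features match a target representation by minimizing the squared feature distance with gradient descent. The sketch below uses a hypothetical linear "layer" that discards one input coordinate, and it omits the natural-image regularizers used in the papers above.

```python
def features(w, x):
    """Hypothetical layer: a linear map from input space to feature space."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def invert(w, target, steps=200, lr=0.1):
    """Gradient descent on ||features(x) - target||^2, starting from zeros."""
    x = [0.0] * len(w[0])
    for _ in range(steps):
        residual = [f - t for f, t in zip(features(w, x), target)]
        grad = [2.0 * sum(w[i][j] * residual[i] for i in range(len(w)))
                for j in range(len(x))]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x


layer = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]          # keeps the first two coordinates only
original = [1.0, 2.0, 3.0]
target = features(layer, original)

reconstruction = invert(layer, target)
print(reconstruction)
```

The first two coordinates are recovered while the third stays at its initial value, which illustrates how a reconstruction shows what a layer preserves and what it throws away.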
Up-conv net based
Inverting visual representations with convolutional networks. Alexey Dosovitskiy and Thomas Brox. In CVPR, 2016.
Train convolutional networks to reconstruct input images from different feature representations
Object detectors emerge in deep scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. In ICLR, 2015.
Reconstruct the minimal image representation that can be classified as the same category as the original image.
Generative model
Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., Clune, J. In: Advances in Neural Information Processing Systems, pp. 3387–3395 (2016)
Plug & play generative networks: Conditional iterative generation of images in latent space. Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, and Jason Yosinski. CVPR, 2017.
(4) Visualization System: Understanding, Diagnosis, Refinement
Towards better analysis of deep convolutional neural networks, M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, S. Liu, IEEE transactions on visualization and computer graphics 23 (1) (2016) 91–100.
An Interactive Node-Link Visualization of Convolutional Neural Networks. Harley A.W. (2015) Advances in Visual Computing. ISVC
showing not only what it has learned, but how it behaves given new user-provided input.
Do convolutional neural networks learn class hierarchy? A. Bilal, A. Jourabloo, M. Ye, X. Liu, and L. Ren. IEEE transactions on visualization and computer graphics, 24(1):152–162, 2018.
Including a class hierarchy and confusion matrix showing misclassified samples only, bands indicate the selected classes in both dimensions and a sample viewer
DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks. Pezzotti, N., Höllt, T., van Gemert, J., Lelieveldt, B. P., Eisemann, E., & Vilanova, A. VAST 2017.
Identification of stable layers, degenerated filters, undetected patterns, oversized layers, unnecessary layers, or the need for additional layers
Toolbox for CNN visualization
Understanding Neural Networks Through Deep Visualization. Yosinski, J., Clune, J., Nguyen, A.M., Fuchs, T.J., & Lipson, H. (2015). ArXiv, abs/1506.06579.
Visualizing the Hidden Activity of Artificial Neural Networks. Rauber, P.E., Fadel, S.G., Falcão, A.X., & Telea, A. (2017). IEEE Transactions on Visualization and Computer Graphics, 23, 101-110.
Using dimensionality reduction for:
- visualizing the relationships between learned representations of observations
- visualizing the relationships between artificial neurons.
Picasso: A Modular Framework for Visualizing the Learning Process of Neural Network Image Classifiers. Henderson, R. & Rothe, R., (2017). Journal of Open Research Software. 5(1), p.22.
compute actual receptive field of filters.
Decision Tree
Interpreting CNNs via decision trees, Q. Zhang, Y. Yang, H. Ma, Y. N. Wu, in: IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 6261–6270.
All-conv net
Striving for simplicity: the all convolutional net. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. ICLR workshop, 2015.
Object detection: replaces max-pooling layers with convolutional layers
Network In Network. Lin, M., Chen, Q., & Yan, S. (2013). CoRR, abs/1312.4400.
Long-term recurrent convolutional networks for visual recognition and description. Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2014). IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 677-691.
Visualizing and understanding recurrent networks. A. Karpathy, J. Johnson, L. Fei-Fei, (2015). arXiv:1506.02078.
Feature Relevance
Explaining recurrent neural network predictions in sentiment analysis. L. Arras, G. Montavon, K.-R. Müller, W. Samek, (2017). arXiv:1706.07206.
Understanding hidden memories of recurrent neural networks. Y. Ming, S. Cao, R. Zhang, Z. Li, Y. Chen, Y. Song, and H. Qu. In Visual Analytics Science and Technology (VAST), 2017 IEEE Conference on. IEEE, 2017.
LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation. Gupta, P., & Schütze, H. (2018). BlackboxNLP@EMNLP.
LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. IEEE transactions on visualization and computer graphics, 24(1):667–676, 2018.
Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. Strobelt, H., Gehrmann, S., Behrisch, M., Perer, A., Pfister, H., & Rush, A.M. (2018). IEEE Transactions on Visualization and Computer Graphics, 25, 353-363.
Visualizing Attention in Transformer-Based Language Models. Vig, J. (2019).
Deep Features Analysis with Attention Networks. Xie, S., Chen, D., Zhang, R., & Xue, H. (2019). ArXiv, abs/1901.10042.
SANVis: Visual Analytics for Understanding Self-Attention Networks. Park, C., Na, I., Jo, Y., Shin, S., Yoo, J., Kwon, B.C., Zhao, J., Noh, H., Lee, Y., & Choo, J. (2019). ArXiv, abs/1909.09595.
Visualizing and Measuring the Geometry of BERT. Coenen, A., Reif, E., Yuan, A., Kim, B., Pearce, A., Viégas, F.B., & Wattenberg, M. (2019). NeurIPS, abs/1906.02715.
GANViz: A Visual Analytics Approach to Understand the Adversarial Game. Wang, J., Gou, L., Yang, H.T., & Shen, H. (2018). IEEE Transactions on Visualization and Computer Graphics, 24, 1905-1917.
Analyzing the training processes of deep generative models. M. Liu, J. Shi, K. Cao, J. Zhu, and S. Liu. IEEE transactions on visualization and computer graphics, 24(1):77–87, 2018.
GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation. Kahng, M., Thorat, N., Chau, D.H., Viégas, F.B., & Wattenberg, M. (2018). IEEE Transactions on Visualization and Computer Graphics, 25, 310-320.
Graying the black box: Understanding DQNs. Zahavy, T., Baram, N., and Mannor, S. International Conference on Machine Learning, pp. 1899–1908, 2016.
using SAMDPs to analyze high-level policy behavior
Saliency maps
Visualizing and Understanding Atari Agents. Greydanus, S., Koul, A., Dodge, J., & Fern, A. (2017). ICML, abs/1711.00138.
how inputs influence individual decisions using saliency maps
Establishing appropriate trust via critical states. S. H. Huang, K. Bhatia, P. Abbeel, and A. D. Dragan. International Conference on Intelligent Robots (IROS), 2018
finds critical states of an agent based on the entropy of the output of a policy.
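An entropy-based criterion can be sketched by computing the entropy of the policy's action distribution in each state and flagging the states that fall below a threshold. The sketch assumes that low entropy (the agent strongly prefers one action) marks a critical state; the state names, probabilities and threshold are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def critical_states(policy, threshold):
    """States whose action distribution has entropy below `threshold` (assumed criterion)."""
    return [state for state, probs in policy.items() if entropy(probs) < threshold]


# Hypothetical policy output: state -> probabilities over four actions.
policy = {
    "open_road": [0.25, 0.25, 0.25, 0.25],
    "cliff_edge": [0.97, 0.01, 0.01, 0.01],
}

flagged = critical_states(policy, threshold=0.5)
print(flagged)
```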
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents. Rupprecht, C., Ibrahim, C., & Pal, C.J. (2019). ArXiv, abs/1904.01318.
using activation maximization methods for visualization.
DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks. Wang, J., Gou, L., Shen, H., & Yang, H.T. (2018). IEEE VAST, Transactions on Visualization and Computer Graphics (honorable mention), 25, 288-298.
Extract useful action/reward patterns that help to interpret the model and control the training