[Paper] Summary of Interpretable Machine Learning – A Brief History, State-of-the-Art and Challenges

Paper: Interpretable Machine Learning – A Brief History, State-of-the-Art and Challenges by Christoph Molnar, Giuseppe Casalicchio, Bernd Bischl.

Interesting for the list of papers referenced.

The way this paper classifies the methods feels somewhat odd to me, and I partly disagree with it (after reading Ian Covert’s paper ‘Explaining by Removing: A Unified Framework for Model Explanation’). That doesn’t matter too much, though: the field is not very unified yet, despite some early convergence on terminology.

Interpretable Machine Learning (IML) methods:

  • analyze model components,
  • study sensitivity to input perturbations,
  • analyze local or global surrogate approximations of the ML model.
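The sketch below makes the surrogate idea concrete with a minimal global surrogate in Python. It rests on a few assumptions of my own:

- The "black box" is a hypothetical hand-written function.
- The surrogate is an ordinary least-squares linear model fitted to the black box's predictions, not to true labels.
- Fidelity is measured as the R² between surrogate and black-box outputs.

It uses only the standard library. It is not the paper's procedure.

```python
import random

def black_box(x1, x2):
    # Hypothetical opaque model: mostly linear, plus a small x1 * x2 interaction.
    return 3.0 * x1 - 2.0 * x2 + 0.5 * x1 * x2

def fit_linear(X, y):
    """OLS fit of y ~ b0 + b1*x1 + b2*x2 by solving the normal equations."""
    rows = [[1.0, a, b] for a, b in X]
    n = 3
    # Augmented matrix [X^T X | X^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (M[i][n] - sum(M[i][j] * coef[j] for j in range(i + 1, n))) / M[i][i]
    return coef

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
# The surrogate learns the black box's outputs, not ground-truth labels.
y_bb = [black_box(a, b) for a, b in X]
b0, b1, b2 = fit_linear(X, y_bb)
y_sur = [b0 + b1 * a + b2 * b for a, b in X]
fidelity = r_squared(y_bb, y_sur)
print(f"surrogate: {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2, fidelity R^2 = {fidelity:.3f}")
```

Because the interaction term is small, the linear surrogate recovers coefficients close to 3 and -2, with an R² close to 1. A stronger interaction would lower the fidelity. That drop is the signal that the surrogate's coefficients no longer describe the black box well.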

Remaining challenges for IML:

  • dealing with dependent features,
  • causal interpretation,
  • uncertainty estimation,
  • missing rigorous definition of interpretability.

1. Introduction

Interpretable machine learning (IML) methods can be used to discover knowledge, to debug or justify the model and its predictions, and to control and improve the model.

2. Brief History of IML

  • model-agnostic explanation methods
  • model-specific explanation methods (for deep neural networks or tree ensembles)

3. Today

  • permutation feature importance
  • Shapley values
  • counterfactual explanations
  • partial dependence plots
  • saliency maps
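With only a handful of features, Shapley values can be computed exactly by enumerating every coalition of features. The sketch below makes two assumptions:

- The model is hypothetical.
- A "removed" feature is replaced by a fixed baseline value. Other conventions exist, e.g. averaging over a background data set.

The cost grows as 2^n in the number of features, so this is for illustration only.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1] - 1.0 * x[2]

def value(subset, x, baseline):
    # v(S): features in S keep the instance's values, the others take the baseline's.
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(z)

def shapley_values(x, baseline):
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of size k without i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = set(S)
                phi[i] += weight * (value(S | {i}, x, baseline) - value(S, x, baseline))
    return phi

x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(x, baseline)
print([round(p, 3) for p in phi])                                # [5.0, 5.0, -0.5]
print(round(sum(phi), 3), round(model(x) - model(baseline), 3))  # 9.5 9.5
```

The interaction term 3·x0·x1 is split equally between features 0 and 1. The values also sum to f(x) - f(baseline), which is the efficiency property of Shapley values.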

Open source implementations of various IML methods:

  • iml (R)
  • DALEX (R)
  • Alibi (Python)
  • InterpretML (Python)

4. IML Methods

  • analyzing components of interpretable models (e.g. linear regressions and trees)
  • analyzing components of more complex models (e.g. visualizing feature maps of a CNN)
  • explaining individual predictions (e.g. Shapley values and counterfactual explanations)
  • explaining global model behavior
    • feature importance (ranks features based on how relevant they were for the prediction)
    • feature effect (expresses how a change in a feature changes the predicted outcome, e.g. partial dependence plots, individual conditional expectation curves)
  • using surrogate models (analyzing the components of the interpretable surrogate model, e.g. LIME)
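The feature-effect methods above can be sketched in a few lines:

- An ICE curve varies one feature over a grid for a single instance, keeping that instance's other features fixed.
- The partial dependence plot is the average of those curves.

The model and data below are hypothetical. They are chosen so that an interaction makes half the ICE curves rise and half fall. Plotting is left out.

```python
import random

def model(x1, x2):
    # Hypothetical black-box model with an x1 * x2 interaction.
    return x1 ** 2 + x1 * x2

def ice_curves(data, grid):
    # One curve per instance: x1 is swept over the grid, the instance's own x2 is kept.
    return [[model(g, x2) for g in grid] for (_x1, x2) in data]

def partial_dependence(curves):
    # The PDP is the pointwise average of the ICE curves.
    n = len(curves)
    return [sum(curve[k] for curve in curves) / n for k in range(len(curves[0]))]

rng = random.Random(1)
# Synthetic data: half the instances have x2 = +2, half have x2 = -2.
data = [(rng.uniform(-1, 1), 2.0 if i % 2 == 0 else -2.0) for i in range(200)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

curves = ice_curves(data, grid)
pdp = partial_dependence(curves)
print("ICE, x2=+2:", curves[0])                   # [-1.0, -0.75, 0.0, 1.25, 3.0]
print("ICE, x2=-2:", curves[1])                   # [3.0, 1.25, 0.0, -0.75, -1.0]
print("PDP:       ", [round(p, 2) for p in pdp])  # [1.0, 0.25, 0.0, 0.25, 1.0]
```

The averaged PDP is a symmetric parabola in x1. It hides the opposing slopes that the individual ICE curves reveal, which is why the two are often shown together.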

5. Challenges

  • statistical uncertainty and inference: methods such as feature importance or Shapley values provide explanations without quantifying the uncertainty of the explanation; most IML methods for feature importance are not adapted for multiple testing
  • causal interpretation: there is some early research connecting causality with permutation feature importance and Shapley values
  • feature dependence
  • the lack of a definition of interpretability itself. Various quantifiable aspects of interpretability are emerging, e.g. sparsity, interaction strength, fidelity, sensitivity to perturbations and simulatability, but once again nothing is unified yet. The authors suggest drawing inspiration from the field of human-computer interaction.
  • easy-to-understand explanations: for now, you need to be a specialist to understand the explanations and interpret them approximately correctly; they are not yet meant to be understood by end users.
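On the uncertainty challenge: a cheap first step for permutation feature importance is to repeat the permutation several times and report a spread along with the mean. The sketch below makes these assumptions:

- The model is hypothetical and already fitted.
- The data are synthetic.
- The importance score is the increase in mean squared error.

The standard deviation here only reflects the randomness of the permutations. It does not capture uncertainty from training the model on a different data sample.

```python
import random
import statistics

def model(row):
    # Hypothetical fitted model: uses x0 strongly, x1 weakly, ignores x2.
    return 2.0 * row[0] + 0.5 * row[1]

def mse(data, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

def permutation_importance(data, targets, feature, n_repeats, rng):
    base = mse(data, targets)
    scores = []
    for _ in range(n_repeats):
        column = [r[feature] for r in data]
        rng.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(data, column)]
        # Importance = increase in loss after breaking the feature-target association.
        scores.append(mse(permuted, targets) - base)
    return statistics.mean(scores), statistics.stdev(scores)

rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
targets = [2.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in data]

results = {}
for j in range(3):
    results[j] = permutation_importance(data, targets, j, n_repeats=20, rng=rng)
    mean, sd = results[j]
    print(f"x{j}: {mean:.3f} +/- {sd:.3f}")
```

The ignored feature x2 gets an importance of exactly zero. x0 and x1 get positive scores with a visible spread across repetitions.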


Molnar, Christoph, Giuseppe Casalicchio, and Bernd Bischl. “Interpretable Machine Learning–A Brief History, State-of-the-Art and Challenges.” arXiv preprint arXiv:2010.09337 (2020).

Books to read to get up to speed with the field: