AI and Explainability – DZone AI

This is an article from DZone’s 2021 Enterprise AI Trend Report.

For more:

Read the Report

Explainable artificial intelligence, sometimes referred to as XAI, is exactly what it sounds like — explaining how and why a machine learning model makes a prediction. While models are usually classified as either "black box" or "glass box," it isn't quite as simple as that: some fall somewhere in between. Some models are more naturally transparent than others, and the right level of transparency depends on the application.

Naturally transparent models — also called “white box,” “clear box,” or “glass box” models — are those that can be easily interpreted and often even diagrammed out. Their transparency is fundamental to the algorithm’s math, so no additional techniques are needed to interpret the results. As an example, below is a diagram representing a decision tree algorithm predicting if a passenger survived the Titanic. By looking at the figure, we see that the model is picking up on the “women and children first” policy that was enforced during the tragedy.

Diagram representing a decision tree algorithm predicting if a passenger survived the Titanic

Figure 1

Black-box models, on the other hand, make predictions that can’t be so easily interpreted. It is often claimed that there’s no way of knowing how and why the model reached its decision. Of course, how a model reached its decision is just fancy arithmetic — features are input, matrices are multiplied, and an output is returned. However, the models on their own don’t tell you why they made their decision. For that, we need additional tools.

Situation Assessment

The transparency level of a model must match its application, a necessary consideration when designing your experiment. Some applications require complete transparency of a model, usually for legal reasons. The most obvious example of this is models for actuarial purposes. In these circumstances, all machine learning must be completely transparent in order to justify any decisions. A glass-box model is usually a requirement for any model based on data with sensitive personal attributes. While having this complete transparency sounds like a wonderful thing, it's not always necessary or even desirable. Often, black-box models, such as neural networks, perform significantly better at their tasks. It isn't important how a language translation was produced or how Google search results were ranked, as long as the results are accurate and pertinent.

There are cases, too, where only some transparency is needed, such as self-driving cars. It is very important to know what a car was thinking when it made a mistake and what might be some of the possible causes. But it isn’t necessary to consider every little detail — the complexity of the input data makes this almost impossible in any case. However, if problems arise, it is important to know that the car didn’t see a stop sign or thought a plastic bag was a pedestrian. This helps target mitigation strategies for future iterations of the model.

Let’s take a look at examples of both types of models.

Naturally Transparent Models

The models most often taught to students of statistics or machine learning are the most naturally transparent ones. In some sense, their underlying transparency is what makes them the easiest concepts to grasp.

Linear models, such as linear regression, are perhaps the most transparent models that exist. The model spits out a simple equation, often of the form y = Mx + b, where M, x, and b are matrices. The coefficients — the entries of M — directly give the weight, and thus the importance, of each feature.
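A minimal sketch of this transparency, fitting a linear model by ordinary least squares on hypothetical toy data (the feature values and true coefficients here are made up for illustration):

```python
import numpy as np

# Toy data: two features with a known linear relationship plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=100)

# Fit y = Mx + b by least squares; append a column of ones for the intercept b.
X1 = np.hstack([X, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

w, b = coef[:2], coef[2]
print(w)  # close to [3, -2]: the coefficients are the model's explanation
print(b)  # close to 5
```

Reading the fitted model is as simple as reading the recovered equation; no extra tooling is required.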

Another extremely simple and transparent model is the decision tree. You can think of a decision tree as a game of 20 questions, only you get to pick the number of yes/no questions. This algorithm has the advantage that it can be mapped out, making the interpretation extremely clear. Additionally, the model will return the "feature importance" for each column so you can see what is impacting the decision the most.
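The idea can be sketched with scikit-learn. The dataset below is a synthetic stand-in (the article's Titanic data is not reproduced here), so the particular split rules are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for the Titanic data: a small synthetic binary task.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           random_state=0)
feature_names = [f"f{i}" for i in range(5)]

# Cap the depth at 3, as in the Titanic example, to keep the tree readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The fitted tree prints as exactly the kind of yes/no question map in Figure 1.
print(export_text(tree, feature_names=feature_names))

# Feature importances: features never used in a split score exactly 0.
for name, imp in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Because the depth is capped, only a handful of features appear in splits, which is why the remaining features get an importance of zero.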

As an example, we will return to the Titanic decision tree model we used in the introduction. That figure diagrams the algorithm's decision paths, while the bar plot below shows the feature importances. Since the model had a maximum depth of only 3, most of the data features weren't used and thus didn't have an importance score.

Decision Tree Feature Importance

Figure 2

Decision trees by themselves generally don't perform very well, which is why there are ensemble methods such as random forests. These algorithms use many trees, each of which makes a prediction. They then all "vote" to get a final prediction. These ensembles have the advantage of being more accurate than a single decision tree while still having an easily interpretable method. However, the transparency of the model is a little clouded since these models often have hundreds of trees.

Continuing the Titanic example, Figure 3 shows a random forest fitted with 4 trees. Note how even with this small number, it becomes a bit harder to interpret the results.

Random forest fitted with 4 trees

Figure 3

Random forests provide feature importance scores as well.
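Continuing with a synthetic stand-in dataset, here is a sketch of a 4-tree forest like the one in Figure 3, using scikit-learn's standard RandomForestClassifier:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data, as in the decision tree sketch.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           random_state=0)

# A forest of just 4 shallow trees; each tree votes on the class.
forest = RandomForestClassifier(n_estimators=4, max_depth=3,
                                random_state=0).fit(X, y)

# The ensemble still exposes aggregate feature importances (as in Figure 4),
# averaged over the importances of its individual trees.
print(forest.feature_importances_)

# The individual voters are available too, e.g. for plotting each tree.
print(len(forest.estimators_))
```

With hundreds of trees instead of 4, each individual tree remains inspectable, but reading them all is impractical, which is the "clouded" transparency described above.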

Random forest feature importance

Figure 4

Explainability Methods

Interpretability is a fast-growing field in machine learning, producing many methods, some of which quickly become outdated. Explainability methods can be local, explaining a single prediction, or global, explaining the model's overall behavior. We will go through some of the most popular methods currently available, but keep in mind that this isn't an exhaustive list.

Local Interpretable Model-Agnostic Explanations

First published in 2016 by Ribeiro et al., Local Interpretable Model-Agnostic Explanations (LIME) is a method used to explain the predictions of any classification model. It creates explainability around single predictions by perturbing the data and fitting a simpler, interpretable model. It is fairly computationally efficient but has the pitfall of only explaining individual predictions instead of the model as a whole. In 2019, Zafar and Khan proposed an improvement on the LIME algorithm, DLIME (where the D stands for deterministic). Instead of using random perturbations, this method uses clustering methods to select new instances.
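To make the perturb-and-fit idea concrete, here is a minimal hand-rolled sketch of a LIME-style local surrogate. This is not the lime package itself, just the core recipe (perturb around one instance, query the black box, weight samples by proximity, fit an interpretable model) on hypothetical data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# A black-box model to explain, trained on hypothetical stand-in data.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# The single prediction we want to explain.
instance = X[0]

# 1. Perturb: sample points in a neighbourhood of the instance.
perturbed = instance + rng.normal(scale=0.5, size=(500, 4))

# 2. Query the black box for its predicted probability at each sample.
probs = black_box.predict_proba(perturbed)[:, 1]

# 3. Weight samples by proximity to the instance (an RBF kernel here).
dist2 = ((perturbed - instance) ** 2).sum(axis=1)
weights = np.exp(-dist2)

# 4. Fit a simple, interpretable surrogate to the local behaviour.
surrogate = LinearRegression().fit(perturbed, probs, sample_weight=weights)

# The surrogate's coefficients are the local explanation.
print(surrogate.coef_)
```

The surrogate is only trusted near the chosen instance, which is exactly why LIME explains one prediction at a time rather than the whole model.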

Shapley Additive Explanations

The Shapley value is a concept from game theory, first developed by Lloyd Shapley in the 1950s; it wasn't applied to machine learning until 2017, by Lundberg and Lee. Since the original concept comes from game theory, it had to be adapted to artificial intelligence: the outcome of the model plays the role of "the game," and the features of the data are "the players."

The SHapley Additive exPlanations (SHAP) algorithm computes each feature's contribution to "the game" (i.e., the prediction) across all possible orderings of the features, which yields feature importances for that single prediction. Because it examines so many permutations, it is significantly more costly than the LIME algorithm. In addition to examining the distributions of individual features for observations, SHAP also provides global feature importances.
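The permutation idea can be made concrete with a brute-force sketch. The real shap library uses far more efficient approximations; this toy version, on hypothetical stand-in data with only 3 features, enumerates every ordering and fills "absent" features from a baseline:

```python
import itertools

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Black-box model and the instance to explain (hypothetical stand-in data).
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]
background = X.mean(axis=0)  # "absent" players are filled from a baseline

def value(coalition):
    """The 'game': the model's output with only `coalition` features present."""
    z = background.copy()
    z[list(coalition)] = x[list(coalition)]
    return model.predict_proba(z.reshape(1, -1))[0, 1]

n = len(x)
shap_values = np.zeros(n)

# Average each feature's marginal contribution over all join orders.
perms = list(itertools.permutations(range(n)))
for perm in perms:
    present = []
    for i in perm:
        before = value(present)
        present.append(i)
        shap_values[i] += value(present) - before
shap_values /= len(perms)

print(shap_values)
# Additivity: the values sum to the prediction minus the baseline prediction.
print(shap_values.sum(), value(range(n)) - value([]))
```

The cost is visible in the code: the number of orderings grows factorially with the number of features, which is why exact computation is only feasible for tiny feature sets.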

The left-hand plot below is a bar chart showing the mean SHAP values for a model, thus giving a global feature importance representation. The right-hand plot shows the distribution of SHAP values for all data points.

SHAP values for a model and data points

Figure 5: Image source: “Global bar plot,” SHAP

Gradient-Weighted Class Activation Mapping

Proposed in 2017 by Selvaraju et al., Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique for producing visual explanations for Convolutional Neural Network (CNN)-based models by highlighting important regions of an image. For instance, if a model classifies an image by identifying the animal, Grad-CAM will produce a heatmap on top of the image, showing what was most important in its decision. Since Grad-CAM is a generalization of CAM, it can be used on most kinds of CNN-based models, including image classification, image captioning, and visual question answering. Here is an example of the output for Grad-CAM identifying the class "boxer" in the left-most image.
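The core Grad-CAM computation can be sketched with NumPy alone, assuming the convolutional activations and the class-score gradients have already been extracted from a (hypothetical) CNN; a real implementation would hook these out of a framework such as PyTorch. Random arrays stand in for them here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for what a real CNN would supply: the activation maps A_k of the
# last convolutional layer (channels x height x width) for one image, and the
# gradient of the target class score with respect to those activations.
activations = rng.random((8, 7, 7))
gradients = rng.normal(size=(8, 7, 7))

# 1. Global-average-pool the gradients to get one weight per channel.
weights = gradients.mean(axis=(1, 2))

# 2. Weighted sum of the activation maps, then ReLU: keep only evidence
#    that speaks FOR the class, not against it.
cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)

# 3. Normalise to [0, 1]; upsampled to the image size, this is the heatmap
#    that gets overlaid on the input image.
cam = cam / cam.max() if cam.max() > 0 else cam
print(cam.shape)  # (7, 7)
```

Because the weights come from gradients of a specific class score, the same image yields a different heatmap for each class you ask about, which is what makes the explanations class-discriminative.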

Example of the output for Grad-CAM

Figure 6

For more details, check out the example on GitHub.


Explainable AI has received a lot of attention lately amid concerns about black-box models. Black-box models needn't always be concerning, but when they are, there are many mitigation strategies; some involve avoiding them altogether, while others use additional algorithms to help explain predictions. It is important to match both the complexity and the interpretation requirements to your use case. Ask colleagues and potential end users for any requirements or preferences. There's nothing worse than having to build a new model from scratch! Hopefully, you now have a few more methods to add to your toolkit.


