Abstract: This article introduces a visualization method for convolutional networks called class activation map (CAM) visualization. Using the pre-trained VGG16 model, an animal image found at random on the Internet is analyzed, and the generated heatmap is overlaid on the original image to show what the network bases its classification on. The implementation uses the Keras framework.
- Introduction to the principle of class activation maps
- Keras implementation
"Deep Learning with Python"
"Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization"
Class activation map visualization generates a class activation heatmap for a given input image. The heatmap indicates how important each location of the image is to the class in question.
Grad-CAM differs from CAM: in principle, it can be applied directly to any convolutional network without modifying the network structure.
The method given in the references is as follows: given an input image, take the output feature map of a convolutional layer and weight each channel of that feature map by the gradient of the class output with respect to the channel. Intuitively, the importance of each channel for a specific class is used to weight how strongly the input activates that channel, which yields how strongly each location of the input image activates the class.
In practice, the steps are: take the output of the last convolutional layer, compute the gradient of the class output with respect to each channel, take the global average of that gradient per channel, and use the averages to weight the corresponding channels. The following sections implement these steps one by one in Keras.
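For reference, the weighting described above can be written in the notation of the Grad-CAM paper. The symbols are taken from the paper rather than from this article: $A^k$ is the $k$-th channel of the feature map, $y^c$ is the network output for class $c$, and $Z$ is the number of spatial locations in the feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The Keras code later in this article averages over the channels instead of summing them. This only changes the map by a constant factor, which disappears once the heatmap is normalized.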
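Before turning to Keras, the steps just listed can be sketched on plain NumPy arrays. This is a minimal illustration, not the implementation used below: the function name `grad_cam_heatmap` is hypothetical, and the feature map and gradients are random stand-ins with the shape of VGG16's last convolutional output for a 224 x 224 input, namely (14, 14, 512).

```python
import numpy as np

def grad_cam_heatmap(feature_map, gradients):
    """Combine a feature map and its gradients into a raw class activation map.

    feature_map: array of shape (H, W, C), output of the chosen conv layer.
    gradients:   array of shape (H, W, C), d(class output) / d(feature_map).
    """
    # Global average of the gradient over the spatial axes: one weight per channel.
    channel_weights = gradients.mean(axis=(0, 1))   # shape (C,)
    # Weight each channel by its importance (broadcast over H and W).
    weighted = feature_map * channel_weights        # shape (H, W, C)
    # Average over channels to obtain a single 2-D map.
    return weighted.mean(axis=-1)                   # shape (H, W)

# Synthetic stand-ins; real values would come from the network.
rng = np.random.default_rng(0)
feature_map = rng.random((14, 14, 512))
gradients = rng.standard_normal((14, 14, 512))

heatmap = grad_cam_heatmap(feature_map, gradients)
print(heatmap.shape)  # (14, 14)
```

Broadcasting replaces the explicit loop over channels, but the arithmetic is the same as multiplying channel `i` by its averaged gradient.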
The visualization procedure is demonstrated with the pre-trained VGG16 model, following the references listed above.
First, load VGG16 directly from Keras. Call model.summary() to find the name of the last convolutional layer, which here is block5_conv3.
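from keras.applications.vgg16 import VGG16

# Load VGG16 with ImageNet weights, including the classification head
model = VGG16(weights='imagenet')
# Print the layer list to look up the name of the last convolutional layer
model.summary()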
Next, prepare the input data: read the image and convert it to a NumPy array. The image can be read with scipy.misc (deprecated, and removed from recent SciPy releases), cv2, or the image module of keras.preprocessing. The final shape of the input data is [1, 224, 224, 3].
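import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions

img_path = 'image.jfif'
# scipy.misc.imread and scipy.misc.imresize were removed from recent SciPy releases,
# so the image is loaded and resized with keras.preprocessing instead
img = image.load_img(img_path, target_size=(224, 224))
# Convert to a float array of shape (224, 224, 3)
x = image.img_to_array(img)
# Add the batch dimension: (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)
# Apply the channel-wise preprocessing that VGG16 expects
x = preprocess_input(x)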
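To make the shape handling and the preprocessing step more concrete, here is a NumPy-only sketch. It assumes that preprocess_input for VGG16 uses the "caffe" convention: convert RGB to BGR, then subtract the ImageNet per-channel means without scaling. The helper name `vgg16_preprocess_sketch` and the constant image are hypothetical, and the real Keras function should be used in practice.

```python
import numpy as np

# Assumed ImageNet channel means in BGR order ('caffe'-style preprocessing).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(rgb_image):
    """rgb_image: array (224, 224, 3) with values in [0, 255].

    Returns a float32 batch of shape (1, 224, 224, 3).
    """
    x = rgb_image.astype(np.float32)
    x = x[..., ::-1]                   # RGB -> BGR
    x = x - IMAGENET_BGR_MEAN          # zero-center each channel
    return np.expand_dims(x, axis=0)   # add the batch dimension

# Stand-in for a real photo: a uniform gray image.
fake_image = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = vgg16_preprocess_sketch(fake_image)
print(batch.shape)  # (1, 224, 224, 3)
```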
Then run the model on the input data to get its prediction. As the output shows, VGG16 assigns a probability of 96.7% to the class tiger. The index of the maximum probability is 292.
preds = model.predict(x)
# decode_predictions returns one list per image in the batch; take the first
print('Predicted:', decode_predictions(preds, top=3)[0])
# Use this index to compute the class activation map between the input and the class
print(np.argmax(preds))

>>> Predicted: [('n02129604', 'tiger', 0.96711606), ('n02123159', 'tiger_cat', 0.026872123), ('n02391049', 'zebra', 0.0046514785)]
292
The next step is to find out which parts of the image led the network to classify it as a tiger. For a misclassified image, the same procedure shows where the network was looking when it made the mistake.
import keras.backend as K

# Note: K.gradients requires graph mode (for example standalone Keras on TensorFlow 1.x)
# The element of the output vector that corresponds to the tiger class
tiger_output = model.output[:, np.argmax(preds)]
# Last convolutional layer
last_conv_layer = model.get_layer('block5_conv3')
# Gradient of the tiger output with respect to the feature map of the last convolutional layer
# (K.gradients returns a list, hence the [0])
grads = K.gradients(tiger_output, last_conv_layer.output)[0]
# Global average of the gradient for each channel, shape (512,)
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# For a given input image, fetch the pooled gradients and the feature map of the last convolutional layer
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])
# Weight each channel of the feature map by the averaged gradient of the class with respect to that channel
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# Average over the channels to get a 14 x 14 heatmap
heatmap = np.mean(conv_layer_output_value, axis=-1)
The heatmap is then visualized.
import matplotlib.pyplot as plt

# Keep only the positive contributions (ReLU)
heatmap = np.maximum(heatmap, 0)
# Scale to the range [0, 1]
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
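One edge case in the normalization above: if no location has a positive value, the maximum is 0 and the division produces NaN. The following sketch adds a guard for that case. The helper name `normalize_heatmap` and the `eps` threshold are hypothetical choices made for this illustration.

```python
import numpy as np

def normalize_heatmap(raw_heatmap, eps=1e-8):
    """Apply ReLU and scale to [0, 1]; return all zeros if nothing is positive."""
    positive = np.maximum(raw_heatmap, 0)
    peak = positive.max()
    if peak < eps:
        # No positive evidence for the class anywhere: avoid dividing by zero.
        return np.zeros_like(positive, dtype=np.float64)
    return positive / peak

raw = np.array([[-1.0, 0.5],
                [2.0, 1.0]])
normalized = normalize_heatmap(raw)
print(normalized)
```

For the small example, the negative entry is clipped to 0 and the remaining values are divided by the peak value 2.0.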
The resulting heatmap is shown in the following figure:
Finally, the heatmap is overlaid on the original image to obtain the final visualization.
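import cv2

# Read the original image (BGR, uint8)
img = cv2.imread(img_path)
# Resize the heatmap to the image size; cv2.resize expects (width, height)
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
# Convert to 8-bit and apply a color map
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
# Overlay with a heatmap weight of 0.5, clipped to the valid pixel range
superimposed_img = np.clip(heatmap * 0.5 + img, 0, 255).astype(np.uint8)
cv2.imwrite('cam.jpg', superimposed_img)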
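The overlay arithmetic can also be checked without OpenCV. The sketch below is a simplified stand-in, not the code above: it swaps cv2.resize for nearest-neighbour upsampling with np.kron, and swaps the JET color map for a plain red channel. The helper name `overlay_heatmap` is hypothetical, and it assumes the image size is an integer multiple of the heatmap size.

```python
import numpy as np

def overlay_heatmap(image_bgr, heatmap, alpha=0.5):
    """image_bgr: uint8 array (H, W, 3); heatmap: floats in [0, 1], shape (h, w).

    Assumes H and W are integer multiples of h and w.
    """
    scale_y = image_bgr.shape[0] // heatmap.shape[0]
    scale_x = image_bgr.shape[1] // heatmap.shape[1]
    # Nearest-neighbour upsampling: repeat each heatmap cell as a block.
    upsampled = np.kron(heatmap, np.ones((scale_y, scale_x)))
    # Put the heat into the red channel only (index 2 in BGR order).
    colored = np.zeros(image_bgr.shape, dtype=np.float64)
    colored[..., 2] = 255.0 * upsampled
    # Weighted sum, then clip so bright areas do not overflow uint8.
    blended = alpha * colored + image_bgr.astype(np.float64)
    return np.clip(blended, 0, 255).astype(np.uint8)

# Stand-ins: a uniform gray image and a heatmap with a single hot cell.
image = np.full((224, 224, 3), 100, dtype=np.uint8)
heat = np.zeros((14, 14))
heat[7, 7] = 1.0

result = overlay_heatmap(image, heat)
print(result.shape, result.dtype)
```

Clipping before the cast matters here: without it, sums above 255 would wrap around when converted to 8-bit values.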
The result is shown in the figure. The original tiger image was found on the Internet.