The truth of gradient-based explanation methods

In Sanity Checks for Saliency Maps, the authors discuss the case of a single convolutional layer, where the gradient acts as an edge detector. The gradient is

[Figure: expression for the gradient ∂l(x) of a one-layer CNN]

It is now clear why edges are visible in the resulting gradient: regions of the image corresponding to an "edge" have an activation pattern distinct from the surrounding pixels. In contrast, pixel regions of the image that are more uniform all share the same activation pattern, and thus the same value of ∂l(x).
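To make this concrete, here is a minimal 1-D sketch (my own illustration, not code from the paper): a random filter w, a step input x with an "edge" in the middle, and the gradient of l(x) = Σⱼ ReLU(preⱼ) with respect to x. The gradient is identical at every pixel deep inside a uniform region and only takes distinct values near the edge.

```python
import numpy as np

# Assumed 1-D setup: one conv layer with ReLU,
# l(x) = sum_j ReLU(pre[j]), where pre[j] = sum_k x[j+k] * w[k].
rng = np.random.default_rng(0)
w = rng.normal(size=3)                           # random filter
x = np.concatenate([np.zeros(10), np.ones(10)])  # step "edge" at index 10

def grad_l(x, w):
    K = len(w)
    pre = np.array([x[j:j + K] @ w for j in range(len(x) - K + 1)])
    mask = (pre > 0).astype(float)               # ReLU activation pattern
    # dl/dx_i = sum_j mask[j] * w[i - j]: a full convolution of mask with w
    return np.convolve(mask, w, mode="full")

g = grad_l(x, w)
# g is constant inside each flat region of x and only changes
# near the edge at index 10, where the activation pattern flips.
```

The activation mask is what carries the edge: wherever x is uniform, every window produces the same pre-activation sign, so the gradient is the same; the mask (and hence the gradient) can only change where x changes.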

But why, in a multi-layer CNN, is guided backpropagation insensitive to random weights and random data labels?

My thought is that the learned weights follow a certain distribution, while random weights have zero mean.

Thus, for gradient methods, the output is averaged with zero-mean weights, so the signal largely cancels.

For guided BP, the output is multiplied by ReLU(w) instead. Since ReLU(w) has a positive mean, the edges can still be recognized.

import numpy as np

def ReLU(x):
    return np.maximum(x, 0)

# a1, a2: activations at two points with very different intensity
# (two sides of an "edge"); w: random weights with zero mean.
a1 = np.random.normal(1000, 1, 10000)
a2 = np.random.normal(100, 1, 10000)
w = np.random.normal(0, 1, 10000)

# Gradient-style signal: the raw zero-mean weights multiply the output.
b1 = np.sum(ReLU(np.dot(a1, w) * w))
b2 = np.sum(ReLU(np.dot(a2, w) * w))

# Guided-BP-style signal: only the positive part of the weights, ReLU(w), is used.
c1 = np.sum(np.dot(a1, w) * ReLU(w))
c2 = np.sum(np.dot(a2, w) * ReLU(w))

print("b1-b2: gradient:", (b1 - b2) / 10000)
print("c1-c2: GBP:", (c1 - c2) / 10000)


Sample output (values vary per run):

b1-b2: gradient: 653.812524326697
c1-c2: GBP: -158205.37690500243

This code shows that when the inputs a1 and a2 differ markedly (representing two different points across an edge), the gradient signal under random weights is much, much smaller than the guided-BP signal.


[Figure: guided backpropagation]

The difference between guided backpropagation and plain gradient backpropagation is how they deal with w and the ReLU in the backward pass.

[Figure: backpropagation; note the bottom-right part]

