Zhou B., Khosla A., Lapedriza A., Oliva A. and Torralba A. Learning Deep Features for Discriminative Localization. CVPR, 2016.
Selvaraju R., Das A., Vedantam R., Cogswell M., Parikh D. and Batra D. Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization. ICCV, 2017.
Chattopadhyay A., Sarkar A. and Balasubramanian V. Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. WACV, 2018.
Wang H., Wang Z., Mardziel P., Hu X., Yang F., Du M., Ding S. and Zhang Z. Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks. CVPR Workshops, 2020.
CAM (class activation mapping) is a very practical visualization method, and it also plays a pivotal role in weakly supervised learning (e.g., VQA).
The idea of CAM is to explain why a neural network works so well: what exactly is it attending to?
| Symbol | Meaning |
|---|---|
| $f(\cdot)$ | the network |
| $X$ | network input |
| $A_l^k$ | the $k$-th feature map of layer $l$ (specifically, of a convolutional layer) |
| $w$ | weights |
| $c$ | the class of interest |
| $\alpha$ | the weights used in CAM |
The original CAM applies only to a special CNN structure: convolutional layers + AvgPool + FC.
Let $A_L$ denote the feature maps of the last convolutional layer; then

$$f_c(X) = {w^c}^T GP(A_L), \quad [GP(A_L)]_k = \frac{1}{HW}\sum_i^H \sum_j^W [A_L^k]_{ij}, \quad k=1,\cdots,K.$$
Further, note that

$$f_c(X) = \frac{1}{HW} \sum_i^H \sum_j^W \Big[\sum_{k=1}^K w_k^c [A_L^k]_{ij}\Big].$$
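This rearrangement can be checked numerically. A minimal NumPy sketch (shapes and values are illustrative): the class logit ${w^c}^T GP(A_L)$ equals the spatial average of the channel-weighted feature maps.

```python
import numpy as np

# Numerical check of the rearrangement above: w_c^T GP(A) equals the
# spatial mean of sum_k w_k^c A^k. Shapes here are arbitrary placeholders.
rng = np.random.default_rng(42)
K, H, W = 6, 4, 4
A = rng.normal(size=(K, H, W))      # feature maps of the last conv layer
wc = rng.normal(size=K)             # FC weights for class c

lhs = wc @ A.reshape(K, -1).mean(axis=1)    # w_c^T GP(A)
rhs = np.tensordot(wc, A, axes=1).mean()    # (1/HW) sum_ij sum_k w_k^c [A^k]_ij
```

Both sides agree up to floating-point error, which is exactly why the weighted map below can be read as a per-pixel decomposition of the class score.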
So we can define

$$[L_{CAM}^c]_{ij} = \sum_{k=1}^K \alpha_k^c [A_L^k]_{ij}, \quad i=1,\cdots,H,\ j=1,\cdots,W,$$

where $\alpha = \frac{w}{HW}$, i.e.

$$L_{CAM}^c = \sum_{k=1}^K \alpha_k^c A_L^k.$$
In practice, a ReLU is usually applied to this score at the end:

$$L_{CAM}^c = \mathrm{ReLU}\Big(\sum_{k=1}^K \alpha_k^c A_L^k\Big).$$
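Put together, CAM is only a few lines. A minimal NumPy sketch (the shapes and the class index are hypothetical): given the last-layer feature maps and the FC weight matrix, weight the channels and apply the final ReLU.

```python
import numpy as np

# Minimal CAM sketch. A: (K, H, W) feature maps of the last conv layer,
# w: (num_classes, K) FC weights on the global-average-pooled features,
# c: index of the class of interest. All shapes are placeholders.
def cam(A, w, c):
    K, H, W = A.shape
    alpha = w[c] / (H * W)              # alpha = w / (HW)
    L = np.tensordot(alpha, A, axes=1)  # sum_k alpha_k^c A^k, shape (H, W)
    return np.maximum(L, 0.0)           # final ReLU

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 7, 7))
w = rng.normal(size=(10, 4))
heatmap = cam(A, w, c=3)                # (7, 7); upsample to input size to display
```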
Plain CAM is restricted to that architecture; Grad-CAM extends it:

$$L_{Grad\text{-}CAM}^c = \mathrm{ReLU}\Big(\sum_{k=1}^K \alpha_k^c A_l^k\Big),$$

$$\alpha_k^c = GP\Big(\frac{\partial f_c}{\partial A_l^k}\Big)=\frac{1}{HW}\sum_i \sum_j \frac{\partial f_c}{\partial [A_l^k]_{ij}}.$$

Note: $L \rightarrow l$, i.e. the map can now be taken at any convolutional layer, not just the last one.
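In the original conv + AvgPool + FC architecture, these gradient-based weights reduce to the CAM weights $w_k^c / (HW)$, since $f_c$ is linear in $A_L$. A NumPy sketch of this sanity check (shapes are placeholders; the gradient is estimated by finite differences rather than autograd):

```python
import numpy as np

# Check that GP(df_c/dA^k) = w_k^c / (HW) in the AvgPool + FC architecture.
def f_c(A, w, c):
    K, H, W = A.shape
    gp = A.reshape(K, -1).mean(axis=1)      # global average pooling
    return float(w[c] @ gp)                 # class-c logit

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5, 5))
w = rng.normal(size=(4, 3))
c, eps = 2, 1e-6

grad = np.zeros_like(A)
for idx in np.ndindex(A.shape):             # finite-difference gradient
    Ap = A.copy()
    Ap[idx] += eps
    grad[idx] = (f_c(Ap, w, c) - f_c(A, w, c)) / eps

alpha = grad.reshape(3, -1).mean(axis=1)    # GP of the gradient
# alpha matches w[c] / (H * W) up to finite-difference error
```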
The Grad-CAM++ authors argue that Grad-CAM does not cope well with multiple instances of the target class, and that the gradients should be further weighted pixel-wise:

$$\alpha_k^c = \frac{1}{HW} \sum_i \sum_j \alpha_{ij}^{kc}\, \mathrm{ReLU}\Big(\frac{\partial f_c}{\partial [A_{l}^k]_{ij}}\Big),$$

$$\alpha_{ij}^{kc}=\frac{\frac{\partial^2 f_c}{(\partial[A_{l}^k]_{ij})^2}}{2\frac{\partial^2 f_c}{(\partial[A_{l}^k]_{ij})^2}+\sum_i\sum_j[A_l^k]_{ij}\frac{\partial^3 f_c}{(\partial[A_{l}^k]_{ij})^3}}.$$
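A NumPy sketch of this weighting, assuming the first-, second-, and third-order derivative maps `g1`, `g2`, `g3` of $f_c$ w.r.t. $A_l^k$ are already available (here they are random placeholders; in practice Grad-CAM++ obtains them in closed form for an exponential output):

```python
import numpy as np

# Grad-CAM++ channel weights from given derivative maps (all (K, H, W)).
# g1, g2, g3 stand for the 1st/2nd/3rd derivatives of f_c w.r.t. A_l^k.
def gradcam_pp_weights(A, g1, g2, g3, eps=1e-8):
    K, H, W = A.shape
    # denominator: 2 * d2f + sum_ij A_ij * d3f, broadcast per channel
    denom = 2.0 * g2 + (A * g3).reshape(K, -1).sum(axis=1)[:, None, None]
    coef = g2 / (denom + eps)                   # alpha_ij^{kc}
    weighted = coef * np.maximum(g1, 0.0)       # alpha_ij^{kc} * ReLU(df_c/dA)
    return weighted.reshape(K, -1).mean(axis=1) # (1/HW) * sum over i, j

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5, 5))
g1, g2, g3 = (rng.normal(size=(3, 5, 5)) for _ in range(3))
weights = gradcam_pp_weights(A, g1, g2, g3)     # one weight per channel
```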
The Score-CAM authors argue that computing the weights from gradients is not a good idea at all, and instead use

$$\alpha_k^c = f_c(X \circ H_l^k) - f_c(X_b),$$

where $X_b$ is a fixed baseline input; the authors simply take $f_c(X_b)=0$, and

$$H_l^k = s(\mathrm{Up}(A_l^k)),$$

i.e. $A_l^k$ is upsampled to the same size as $X$ and then normalized by

$$s(M) = \frac{M - \min M}{\max M - \min M},$$

so that it lies in $[0, 1]$.
Note that $L_*^c$ above is only $H\times W$; it still needs to be upsampled to the size of $X$ for display.
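The Score-CAM procedure can be sketched in NumPy as follows. The `toy_model` callable, the shapes, and the nearest-neighbour upsampling are all illustrative assumptions (a real implementation would use a CNN forward pass and bilinear interpolation):

```python
import numpy as np

def normalize(M):
    # s(M): min-max normalization into [0, 1]
    lo, hi = M.min(), M.max()
    return (M - lo) / (hi - lo + 1e-8)

def upsample(M, H, W):
    # nearest-neighbour upsampling via repetition (assumes exact multiples)
    h, w = M.shape
    return np.kron(M, np.ones((H // h, W // w)))

def score_cam(model, X, A, c):
    # X: (H, W) input, A: (K, h, w) feature maps, c: class index,
    # model: callable returning a vector of class scores
    K, h, w = A.shape
    H, W = X.shape
    alphas = np.empty(K)
    for k in range(K):
        Hk = normalize(upsample(A[k], H, W))   # H_l^k = s(Up(A_l^k))
        alphas[k] = model(X * Hk)[c]           # f_c(X o H_l^k), with f_c(X_b) = 0
    L = np.tensordot(alphas, A, axes=1)        # sum_k alpha_k^c A_l^k, shape (h, w)
    return np.maximum(L, 0.0)                  # final ReLU

def toy_model(x):
    # placeholder "network" with two class scores, for demonstration only
    return np.array([x.sum(), -x.sum()])

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 8))
A = rng.normal(size=(3, 4, 4))
heatmap = score_cam(toy_model, X, A, c=0)      # (4, 4); upsample to (8, 8) to display
```

Since the weights come purely from forward passes, no gradients are needed, which is the point of the method.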
Pytorch-GradCAM
GradCAM
GradCAM++
ScoreCAM