Paper Reading Notes: Learning Deep Features for Discriminative Localization

Introduction

Task

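Image classification and localization (detection) based on weakly supervised learning
Related work:

  • Weakly supervised object localization
  • Visualizing CNNs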

Method

Class Activation Mapping(CAM)

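CAM shows, in a detailed yet concise way, how a CNN can be used for object localization (detection) and for visualization. The principle is simple and relies mainly on global average pooling (GAP).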


  • First, take the feature maps of the last convolutional layer. $f_k(x,y)$ is the activation of the $k$-th channel at spatial location $(x,y)$, and the number of channels is $n$.

  • Second, apply global average pooling to each channel to get $F_k$. It is written here as a plain sum, since the constant normalization factor of the average only rescales the weights used in the next step:
    $$F_k = \sum_{x,y} f_k(x,y)$$

  • Third, feed $F_k$ into a fully connected (FC) layer to get the class score $S_c$, which goes into a softmax cross-entropy loss to train the network:
    $$S_c = \sum_{k} w_k^c F_k$$

  • Finally, the weights $w_k^c$ give the class activation map $M_c$ for every class $c$. $M_c(x,y)$ has the same resolution as $f_k(x,y)$, and it can be upsampled to the size of the original image to get the final map:
    $$M_c(x,y) = \sum_{k} w_k^c f_k(x,y)$$
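
The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the paper's implementation: the toy feature maps, the weights, the function names, and the nearest-neighbor upsampling are all assumptions made for this example. It also checks that summing M_c over all locations recovers S_c, which is why the map can be read as each location's contribution to the class score.

```python
# Toy CAM sketch. All sizes, values, and function names are hypothetical.

def global_pool(feature_maps):
    """F_k = sum over (x, y) of f_k(x, y), one value per channel."""
    return [sum(sum(row) for row in fmap) for fmap in feature_maps]

def class_scores(pooled, weights):
    """S_c = sum_k w_k^c * F_k, where weights[c][k] holds w_k^c."""
    return [sum(w * f for w, f in zip(w_c, pooled)) for w_c in weights]

def class_activation_map(feature_maps, w_c):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y), same resolution as f_k."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fmap[y][x] for w, fmap in zip(w_c, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]

def upsample_nearest(cam, factor):
    """Nearest-neighbor upsampling (an assumed, simplest-possible choice)."""
    return [
        [cam[y // factor][x // factor] for x in range(len(cam[0]) * factor)]
        for y in range(len(cam) * factor)
    ]

# n = 2 channels, each a 2x2 feature map f_k.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
# Two classes; weights[c][k] = w_k^c.
weights = [[0.5, -1.0], [1.0, 2.0]]

pooled = global_pool(feature_maps)                       # F_k
scores = class_scores(pooled, weights)                   # S_c
cam_0 = class_activation_map(feature_maps, weights[0])   # M_0 at 2x2
cam_0_up = upsample_nearest(cam_0, 2)                    # M_0 at 4x4

# Summing M_c over all locations gives back S_c.
assert abs(sum(sum(row) for row in cam_0) - scores[0]) < 1e-9
```

In a real network the feature maps would come from the last convolutional layer and the weights from the trained FC layer. A smoother interpolation would likely be preferred over nearest-neighbor when resizing the map to the input image.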

Experiments

Classification

Compared with the original networks (VGG, GoogLeNet, etc.), the GAP variants show a small drop of 1%-2% in classification performance.

Localization

Compared with fully supervised methods, CAM still shows a large gap in localization performance. After all, this method uses no bounding-box annotations during training.

Conclusion

  • The key result is that CNNs trained only for classification can learn to localize objects, without using any bounding-box annotations.
  • The class activation mapping method transfers easily to other tasks, for example captioning and VQA.
