Implementing CAM step by step (PyTorch)

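I previously wrote up a simplified visualization procedure (see the simplified-version post). That version did not take the relationships between channels into account. This post walks through the CAM (Class Activation Mapping) procedure.
The next post covers the Grad-CAM implementation.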

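Contents

      • Flowchart
      • Algorithm overview
      • A worked example
      • Code walkthrough
        • 1. Import packages and load the class labels
        • 2. Load and preprocess the image
        • 3. Load the pretrained model
        • 4. Capture the feature map
        • 5. Get the weights
        • 6. Define the function that computes the CAM
        • 7. Generate the images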

Flowchart

[Figure 1: flowchart of the CAM procedure]

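Algorithm overview

  1. Feed the image to be visualized into the network and obtain its predicted class.
  2. Grab the output feature map of the last convolutional layer.
  3. Use the predicted class to select the corresponding weights, weight each channel of the feature map with them, and sum over the channels to get a single-channel map.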
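
The three steps above can be sketched with NumPy alone. This is a minimal illustration, not the code used later in this post: the feature map and classifier weights are random stand-ins, and the name compute_cam is hypothetical.

```python
import numpy as np

def compute_cam(features, class_weights, class_idx):
    """Weight each channel of `features` by the weights of `class_idx` and sum.

    features:      (channels, height, width) output of the last conv layer
    class_weights: (num_classes, channels) classifier weights
    """
    c, h, w = features.shape
    # (channels,) @ (channels, h*w) -> (h*w,), then back to a 2-D map
    cam = class_weights[class_idx] @ features.reshape(c, h * w)
    return cam.reshape(h, w)

rng = np.random.default_rng(0)
features = rng.random((512, 13, 13))              # stand-in for step 2
class_weights = rng.standard_normal((1000, 512))  # stand-in classifier weights

# Step 1 (stand-in): global average pooling, then a linear classifier.
scores = class_weights @ features.mean(axis=(1, 2))
class_idx = int(scores.argmax())

# Step 3: weight the channels with the predicted class's weights and sum.
cam = compute_cam(features, class_weights, class_idx)
print(cam.shape)  # (13, 13)
```

The result is one value per spatial location; larger values mark the regions that contribute most to the chosen class.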

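A worked example

Suppose an input image passes through the network and is classified as class 500 (out of 1000). The captured feature map has shape (1, 512, 13, 13). Assume the classification layer consists of a 1 x 1 convolution followed by global average pooling (this 1 x 1 convolution counts as part of the classification layer, not as the last convolutional layer). The 1000 classes then have 1000 sets of weights, each a vector of 512 per-channel weights, so there are 1000 different ways to weight the feature map. Each set of weights attends to different things, which is why we need to know the image's class first. Once we know it is class 500, we only need to take the weights of class 500 and apply them to the feature map.
CAM has one constraint: it relies on global average pooling. If the network ends with several fully connected layers, CAM does not apply. Take vgg16: three fully connected layers follow the last convolutional layer. The convolutional feature map has to be flattened before it enters them, and after three fully connected layers the link back to individual channels is hard to recover, so per-channel importance weights can no longer be read off. In that case you need Grad-CAM.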
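
The role of global average pooling can be checked numerically. The sketch below assumes the classification layer is exactly a bias-free 1 x 1 convolution followed by global average pooling, with no ReLU in between (the SqueezeNet used later does have one), and uses random stand-in data. Under that assumption, a class's score equals the spatial mean of its class activation map.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((512, 13, 13))        # stand-in feature map (C, H, W)
weights = rng.standard_normal((1000, 512))  # 1 x 1 conv weights as (classes, C)

# Path 1: 1 x 1 conv (a per-pixel linear map), then global average pooling.
per_pixel_scores = np.einsum('kc,chw->khw', weights, features)  # (1000, 13, 13)
scores_conv_then_pool = per_pixel_scores.mean(axis=(1, 2))      # (1000,)

# Path 2: global average pooling first, then the same linear map.
scores_pool_then_conv = weights @ features.mean(axis=(1, 2))    # (1000,)

# Both orders give the same class scores because both operations are linear.
same_scores = np.allclose(scores_conv_then_pool, scores_pool_then_conv)
print(same_scores)

# per_pixel_scores[500] is the CAM for class 500; its mean is that class's score.
cam_500 = per_pixel_scores[500]
mean_matches = np.isclose(cam_500.mean(), scores_pool_then_conv[500])
print(mean_matches)
```

With a flatten followed by fully connected layers, every (channel, position) pair gets its own weight, so there is no single per-channel weight to reuse in this way.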

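Code walkthrough

First prepare the image, the labels, and the model. The code below reads the labels and image from ./cam/ and the weights from ./pretrained/, so place the downloaded files there or adjust the paths.
How to download the class labels:
Install axel first:
sudo apt-get install axel
Then run the download command:
axel -n 5 https://s3.amazonaws.com/outcome-blog/imagenet/labels.json
Image download:
axel -n 5 http://media.mlive.com/news_impact/photo/9933031-large.jpg
Model downloads:
squeezenet1_1: axel -n 5 https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth
resnet18: axel -n 5 https://download.pytorch.org/models/resnet18-5c106cde.pth
densenet161: axel -n 5 https://download.pytorch.org/models/densenet161-8d451a50.pth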

1. Import packages and load the class labels

from PIL import Image
import torch
from torchvision import models, transforms
from torch.autograd import Variable
from torch.nn import functional as F
import numpy as np
import cv2
import json

# Load the ImageNet class labels
json_path = './cam/labels.json'
with open(json_path, 'r') as load_f:
    load_json = json.load(load_f)
classes = {int(key): value for (key, value)
           in load_json.items()}
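
The int(key) conversion matters because JSON object keys are always strings. The snippet below illustrates this on a two-entry sample; the sample string is illustrative and only assumes that labels.json uses the same {"index": "name"} layout.

```python
import json

# Two-entry sample in the assumed {"index": "name"} layout (illustrative only).
sample = '{"0": "tench, Tinca tinca", "1": "goldfish, Carassius auratus"}'
load_json = json.loads(sample)

# JSON object keys are always strings, so convert them to int for indexing.
classes = {int(key): value for key, value in load_json.items()}
print(classes[1])  # goldfish, Carassius auratus
```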

2. Load and preprocess the image

# Load an image belonging to one of the ImageNet classes
img_path = './cam/9933031-large.jpg'
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

# Image preprocessing
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize
])

img_pil = Image.open(img_path).convert('RGB')  # force 3 channels to match the 3-channel Normalize
img_tensor = preprocess(img_pil)
img_variable = Variable(img_tensor.unsqueeze(0))
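
To make the preprocessing concrete, here is a NumPy-only sketch of what ToTensor, Normalize, and unsqueeze(0) do to the pixel values. It is an illustration, not a replacement for the torchvision pipeline: the input is a constant stand-in array that is assumed to be already resized to 224 x 224.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Stand-in for an already resized 224 x 224 RGB image (H, W, C), uint8.
img = np.full((224, 224, 3), 128, dtype=np.uint8)

x = img.astype(np.float32) / 255.0                  # ToTensor: scale to [0, 1]
x = x.transpose(2, 0, 1)                            # ToTensor: HWC -> CHW
x = (x - mean[:, None, None]) / std[:, None, None]  # Normalize, per channel
x = x[None]                                         # unsqueeze(0): add batch dim
print(x.shape)  # (1, 3, 224, 224)
```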

3. Load the pretrained model

# Load the pretrained model
model_id = 1
if model_id == 1:
    net = models.squeezenet1_1(pretrained=False)
    pthfile = r'./pretrained/squeezenet1_1-f364aa15.pth'
    net.load_state_dict(torch.load(pthfile))
    finalconv_name = 'features'  # module whose output is the conv feature map
elif model_id == 2:
    net = models.resnet18(pretrained=False)
    finalconv_name = 'layer4'
elif model_id == 3:
    net = models.densenet161(pretrained=False)
    finalconv_name = 'features'
net.eval()	# switch to evaluation mode (disables dropout)
print(net)

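I only downloaded squeezenet1_1. To use either of the other two models, follow the same pattern: download the weights and load them with load_state_dict.
The printed model: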

SqueezeNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(384, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 192, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AdaptiveAvgPool2d(output_size=(1, 1))
  )
)

As shown, the feature-extraction part lives in (features) and the classification layer in (classifier).

4. Capture the feature map

features_blobs = []     # stores the captured feature maps

def hook_feature(module, input, output):
    features_blobs.append(output.data.cpu().numpy())

# Capture the output of the module named by finalconv_name
net._modules.get(finalconv_name).register_forward_hook(hook_feature)

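register_forward_hook registers a callback that runs after the module's forward pass and receives its output, which is how an intermediate layer's output can be captured. Look it up for further details.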
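
A minimal, self-contained sketch of the hook mechanism follows. The two-layer network is a hypothetical stand-in whose first module is named 'features' to mirror finalconv_name; it is not the SqueezeNet used in this post.

```python
import torch
from torch import nn

captured = []  # filled by the hook on every forward pass

def hook_feature(module, inputs, output):
    # Detach so the stored array does not keep the autograd graph alive.
    captured.append(output.detach().cpu().numpy())

# Tiny stand-in network; 'features' plays the role of finalconv_name.
toy = nn.Sequential()
toy.add_module('features', nn.Conv2d(3, 8, kernel_size=3, padding=1))
toy.add_module('pool', nn.AdaptiveAvgPool2d(1))

handle = toy._modules.get('features').register_forward_hook(hook_feature)
with torch.no_grad():
    toy(torch.zeros(1, 3, 13, 13))
handle.remove()  # unregister so later forward passes are not recorded

print(captured[0].shape)  # (1, 8, 13, 13)
```

Keeping the returned handle and calling remove() is optional here, but it avoids the list growing when the model is run again.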

5. Get the weights

# Get the weights
params = list(net.parameters())
print(len(params))		# 52
weight_softmax = np.squeeze(params[-2].data.numpy())	# shape: (1000, 512)

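params holds all of the model's learnable parameters. How do we index the one we need? Look back at the printed model: the pooling, dropout, and ReLU layers have no parameters, while each of the 26 Conv2d layers contributes two tensors, a weight and a bias, which gives 52 parameter tensors in total. The weights that map the output of the features module to the class scores belong to (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)) in classifier. That is the last layer with parameters (the ReLU and global average pooling after it have none), so params[-1] is its bias and params[-2] is its weight.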
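
The ordering of parameters() can be checked on a small stand-in model. The layer sizes below are arbitrary and chosen only for illustration; the point is that parameter-free layers are skipped and each Conv2d yields its weight followed by its bias.

```python
import torch
from torch import nn

toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=1),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Conv2d(8, 10, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
)

# Only the two Conv2d layers (indices 0 and 3) own parameters.
names = [name for name, _ in toy.named_parameters()]
print(names)  # ['0.weight', '0.bias', '3.weight', '3.bias']

params = list(toy.parameters())
print(tuple(params[-2].shape))  # (10, 8, 1, 1): weight of the last conv
print(tuple(params[-1].shape))  # (10,): bias of the last conv
```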

logit = net(img_variable)				# forward pass: raw class scores for the input image
print(logit.shape)						# torch.Size([1, 1000])
print(params[-2].data.numpy().shape)	# 1000 sets of weights: (1000, 512, 1, 1)
print(features_blobs[0].shape)			# feature map shape: (1, 512, 13, 13)

# There are 1000 classes: sort the probabilities and keep the sort indices
h_x = F.softmax(logit, dim=1).data.squeeze()
print(h_x.shape)						# torch.Size([1000])
probs, idx = h_x.sort(0, True)
probs = probs.numpy()					# probabilities in descending order
idx = idx.numpy()						# class indices, most probable first

# Print the names and probabilities of the top-5 classes
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))
'''
0.678 -> mountain bike, all-terrain bike, off-roader
0.088 -> bicycle-built-for-two, tandem bicycle, tandem
0.042 -> unicycle, monocycle
0.038 -> horse cart, horse-cart
0.019 -> lakeside, lakeshore

'''
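
The softmax-then-sort step can be illustrated without a model. The sketch below uses made-up logits for six classes and NumPy instead of torch; the helper name softmax is defined here and is not part of the code above.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(logits - logits.max())
    return e / e.sum()

logits = np.array([2.0, 0.5, 3.0, -1.0, 0.0, 1.0])  # made-up scores, 6 classes
probs = softmax(logits)
idx = np.argsort(-probs)  # class indices, most probable first
top = idx[:5]
for i in top:
    print('{:.3f} -> class {}'.format(probs[i], i))
```

Sorting the negated probabilities gives descending order, which matches h_x.sort(0, True) in the code above.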

6. Define the function that computes the CAM

# Define the function that computes the CAM
def returnCAM(feature_conv, weight_softmax, class_idx):
    # Upsample the class activation map to 256 x 256
    size_upsample = (256, 256)
    bz, nc, h, w = feature_conv.shape
    output_cam = []
    # Apply the class weights to the feature map: weight_softmax.shape is (1000, 512)
    # 				feature_conv.shape is (1, 512, 13, 13)
    # class_idx is a one-element list, so weight_softmax[class_idx] has shape (1, 512)
    # feature_conv.reshape((nc, h * w)) has shape (512, 169)
    cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h * w)))
    print(cam.shape)		# the matrix product weights every channel and sums them: shape (1, 169)
    cam = cam.reshape(h, w) # a single-channel map
    # Normalize all elements to 0-1
    cam_img = (cam - cam.min()) / (cam.max() - cam.min())
    # Then rescale to 0-255
    cam_img = np.uint8(255 * cam_img)
    output_cam.append(cv2.resize(cam_img, size_upsample))
    return output_cam
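
One edge case in the normalization step: if the map is constant, cam.max() - cam.min() is zero and the division produces NaN. A possible guard is sketched below; the helper name normalize_to_uint8 and the eps threshold are assumptions for illustration, not part of the function above.

```python
import numpy as np

def normalize_to_uint8(cam, eps=1e-8):
    """Min-max scale `cam` to 0-255; a constant map becomes all zeros."""
    cam = cam - cam.min()
    span = cam.max()
    if span < eps:  # flat map: avoid dividing by zero
        return np.zeros(cam.shape, dtype=np.uint8)
    return np.uint8(255 * (cam / span))

scaled = normalize_to_uint8(np.array([[0.0, 0.5], [1.0, 2.0]]))
flat = normalize_to_uint8(np.ones((2, 2)))
print(scaled)
print(flat)
```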

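7. Generate the images

# Generate the class activation map for the most probable class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[0]])
# Blend the class activation map with the original image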
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM0.jpg', result)
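
The blend heatmap * 0.3 + img * 0.7 yields a floating-point array. If an explicit uint8 image is preferred, for example to display it or process it further, the same weighting can be written as below. This is a NumPy-only sketch with constant stand-in arrays; the helper name blend is hypothetical.

```python
import numpy as np

def blend(heatmap, img, alpha=0.3):
    """Overlay `heatmap` on `img` (both uint8, same shape) and return uint8."""
    mixed = heatmap.astype(np.float32) * alpha + img.astype(np.float32) * (1 - alpha)
    return np.clip(mixed, 0, 255).astype(np.uint8)

heatmap = np.full((4, 4, 3), 255, dtype=np.uint8)  # stand-in colored heatmap
img = np.full((4, 4, 3), 100, dtype=np.uint8)      # stand-in photo
result = blend(heatmap, img)
print(result.dtype, result.shape)  # uint8 (4, 4, 3)
```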

What cv2.applyColorMap does was covered in the previous post, so it is not repeated here.
[Figure 2: CAM overlay for the most probable class]


# Generate the class activation map for the fifth most probable class
CAMs = returnCAM(features_blobs[0], weight_softmax, [idx[4]])
# Blend the class activation map with the original image
img = cv2.imread(img_path)
height, width, _ = img.shape
heatmap = cv2.applyColorMap(cv2.resize(CAMs[0], (width, height)), cv2.COLORMAP_JET)
result = heatmap * 0.3 + img * 0.7
cv2.imwrite('CAM1.jpg', result)

[Figure 3: CAM overlay for the fifth most probable class]
The difference between the two maps is clear at a glance.

