[Paper Notes] Attention for Convolutional Neural Networks -- CBAM: Convolutional Block Attention Module

Abstract

The paper proposes the Convolutional Block Attention Module (CBAM), a simple yet effective attention module designed for convolutional neural networks. Given a feature map produced by a CNN, CBAM computes attention maps along two separate dimensions, channel and spatial, and then multiplies the attention maps with the input feature map for adaptive feature refinement. CBAM is a lightweight, general-purpose module that can be integrated into any CNN architecture and trained end to end.

Main Idea

Given an intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$, CBAM sequentially infers a 1D channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$. The overall process is:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

where $\otimes$ denotes element-wise multiplication. The channel attention map is first multiplied with the input feature map to obtain $F'$; the spatial attention map of $F'$ is then computed and multiplied with it to produce the final output $F''$. The figure below illustrates CBAM:

(Figure: the structure of CBAM)
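As a minimal NumPy sketch of this two-step refinement (the attention maps below are random stand-ins for the outputs of the two sub-modules; array names are illustrative, not from the paper):

import numpy as np

# NHWC layout: batch of 1, an 8x8 spatial map with C=4 channels
F = np.random.rand(1, 8, 8, 4)

# stand-ins for the outputs of the channel and spatial attention sub-modules
Mc = np.random.rand(1, 1, 1, 4)   # channel attention map, broadcast over H and W
Ms = np.random.rand(1, 8, 8, 1)   # spatial attention map, broadcast over channels

F_prime = Mc * F                  # F' = Mc(F) * F (element-wise)
F_double_prime = Ms * F_prime     # F'' = Ms(F') * F' (element-wise)
print(F_double_prime.shape)       # (1, 8, 8, 4)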

Channel attention module

Each channel of a feature map can be regarded as a feature detector, so channel attention focuses on what is meaningful in the input image. To compute channel attention efficiently, the paper squeezes the feature map along the spatial dimensions with both max pooling and average pooling, producing two different spatial context descriptors, $F^c_{avg}$ and $F^c_{max}$. Both descriptors are fed through a shared network consisting of an MLP to produce the channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$. The computation is:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$$

where $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the shared MLP weights, with a ReLU activation after $W_0$.
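The following is a minimal NumPy sketch of this computation (random weights, biases omitted, reduction ratio r=2; all names are illustrative, not from the paper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

F = np.random.rand(1, 4, 4, 8)        # NHWC feature map with C=8 channels
C, r = 8, 2
W0 = np.random.randn(C, C // r)       # shared MLP weights
W1 = np.random.randn(C // r, C)

F_avg = F.mean(axis=(1, 2))           # (1, C): average pooling over H and W
F_max = F.max(axis=(1, 2))            # (1, C): max pooling over H and W

def shared_mlp(x):                    # same W0/W1 for both descriptors, ReLU after W0
    return np.maximum(x @ W0, 0) @ W1

Mc = sigmoid(shared_mlp(F_avg) + shared_mlp(F_max))   # (1, C) channel attention map
F_refined = F * Mc.reshape(1, 1, 1, C)                # broadcast multiply with the input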

Spatial attention module

Unlike channel attention, spatial attention focuses on where the informative parts are located. To compute spatial attention, the paper first applies max pooling and average pooling along the channel dimension to obtain two feature descriptors, $F^s_{avg}$ and $F^s_{max}$, concatenates them, and applies a convolution to generate the spatial attention map $M_s(F) \in \mathbb{R}^{H \times W}$. The computation is:

$$M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7 \times 7}([F^s_{avg}; F^s_{max}])\big)$$

where $f^{7 \times 7}$ denotes a convolutional layer with a 7×7 kernel.
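A corresponding NumPy sketch (a single feature map, random 7×7 kernels, no bias; scipy.signal.convolve2d is used purely for illustration, so the kernel flip it performs does not matter with random weights):

import numpy as np
from scipy.signal import convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

F = np.random.rand(8, 8, 16)          # a single HxWxC feature map (already channel-refined)

F_avg = F.mean(axis=2)                # (H, W): average pooling along the channel axis
F_max = F.max(axis=2)                 # (H, W): max pooling along the channel axis

# a 2-channel 7x7 convolution to one output channel = per-channel convolutions, summed
k_avg = np.random.randn(7, 7)
k_max = np.random.randn(7, 7)

Ms = sigmoid(convolve2d(F_avg, k_avg, mode='same') +
             convolve2d(F_max, k_max, mode='same'))   # (H, W) spatial attention map
F_refined = F * Ms[:, :, None]        # broadcast multiply over the channel axis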

The figure below illustrates the two modules:

(top) channel attention module; (bottom) spatial attention module

Code

Environment: TensorFlow 1.9

"""

@Time : 2018/10/19

@Author : Li YongHong

@Email : [email protected]

@File : test.py

"""

import tensorflow as tf

import numpy as np

slim = tf.contrib.slim

def combined_static_and_dynamic_shape(tensor):

"""Returns a list containing static and dynamic values for the dimensions.

Returns a list of static and dynamic values for shape dimensions. This is

useful to preserve static shapes when available in reshape operation.

Args:

tensor: A tensor of any type.

Returns:

A list of size tensor.shape.ndims containing integers or a scalar tensor.

"""

static_tensor_shape = tensor.shape.as_list()

dynamic_tensor_shape = tf.shape(tensor)

combined_shape = []

for index, dim in enumerate(static_tensor_shape):

if dim is not None:

combined_shape.append(dim)

else:

combined_shape.append(dynamic_tensor_shape[index])

return combined_shape

def convolutional_block_attention_module(feature_map, index, inner_units_ratio=0.5):

"""

CBAM: convolution block attention module, which is described in "CBAM: Convolutional Block Attention Module"

Architecture : "https://arxiv.org/pdf/1807.06521.pdf"

If you want to use this module, just plug this module into your network

:param feature_map : input feature map

:param index : the index of convolution block attention module

:param inner_units_ratio: output units number of fully connected layer: inner_units_ratio*feature_map_channel

:return:feature map with channel and spatial attention

"""

with tf.variable_scope("cbam_%s" % (index)):

feature_map_shape = combined_static_and_dynamic_shape(feature_map)

# channel attention

channel_avg_weights = tf.nn.avg_pool(

value=feature_map,

ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],

strides=[1, 1, 1, 1],

padding='VALID'

)

channel_max_weights = tf.nn.max_pool(

value=feature_map,

ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],

strides=[1, 1, 1, 1],

padding='VALID'

)

channel_avg_reshape = tf.reshape(channel_avg_weights,

[feature_map_shape[0], 1, feature_map_shape[3]])

channel_max_reshape = tf.reshape(channel_max_weights,

[feature_map_shape[0], 1, feature_map_shape[3]])

channel_w_reshape = tf.concat([channel_avg_reshape, channel_max_reshape], axis=1)

fc_1 = tf.layers.dense(

inputs=channel_w_reshape,

units=feature_map_shape[3] * inner_units_ratio,

name="fc_1",

activation=tf.nn.relu

)

fc_2 = tf.layers.dense(

inputs=fc_1,

units=feature_map_shape[3],

name="fc_2",

activation=None

)

channel_attention = tf.reduce_sum(fc_2, axis=1, name="channel_attention_sum")

channel_attention = tf.nn.sigmoid(channel_attention, name="channel_attention_sum_sigmoid")

channel_attention = tf.reshape(channel_attention, shape=[feature_map_shape[0], 1, 1, feature_map_shape[3]])

feature_map_with_channel_attention = tf.multiply(feature_map, channel_attention)

# spatial attention

channel_wise_avg_pooling = tf.reduce_mean(feature_map_with_channel_attention, axis=3)

channel_wise_max_pooling = tf.reduce_max(feature_map_with_channel_attention, axis=3)

channel_wise_avg_pooling = tf.reshape(channel_wise_avg_pooling,

shape=[feature_map_shape[0], feature_map_shape[1], feature_map_shape[2],

1])

channel_wise_max_pooling = tf.reshape(channel_wise_max_pooling,

shape=[feature_map_shape[0], feature_map_shape[1], feature_map_shape[2],

1])

channel_wise_pooling = tf.concat([channel_wise_avg_pooling, channel_wise_max_pooling], axis=3)

spatial_attention = slim.conv2d(

channel_wise_pooling,

1,

[7, 7],

padding='SAME',

activation_fn=tf.nn.sigmoid,

scope="spatial_attention_conv"

)

feature_map_with_attention = tf.multiply(feature_map_with_channel_attention, spatial_attention)

return feature_map_with_attention

#example

feature_map = tf.constant(np.random.rand(2,8,8,32), dtype=tf.float16)

feature_map_with_attention = convolutional_block_attention_module(feature_map, 1)

with tf.Session() as sess:

init = tf.global_variables_initializer()

sess.run(init)

result = sess.run(feature_map_with_attention)

print(result.shape)
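A hypothetical sketch of how the module would be plugged into a network definition (the placeholder, the conv layer, and the index below are illustrative and not part of the original script; slim and the CBAM function come from the listing above):

inputs = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])  # hypothetical input batch
net = slim.conv2d(inputs, 64, [3, 3], scope="conv1")          # some convolutional block
net = convolutional_block_attention_module(net, index=2)      # refine its output with CBAM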
