TensorFlow Deep Learning, Part 26: atrous convolution

1. Atrous convolution (the text below is quoted from the TensorFlow documentation)
  Computes a 2-D atrous convolution, also known as convolution with holes or dilated convolution, given 4-D `value` and `filters` tensors. If the `rate` parameter is equal to one, it performs regular 2-D convolution. If the `rate` parameter is greater than one, it performs convolution with holes, sampling the input values every `rate` pixels in the `height` and `width` dimensions. This is equivalent to convolving the input with a set of upsampled filters, produced by inserting `rate - 1` zeros between two consecutive values of the filters along the `height` and `width` dimensions, hence the name atrous convolution or convolution with holes (the French word trous means holes in English).

  Atrous convolution allows us to explicitly control how densely to compute feature responses in fully convolutional networks. Used in conjunction with bilinear interpolation, it offers an alternative to `conv2d_transpose` in dense prediction tasks such as semantic image segmentation, optical flow computation, or depth estimation. It also allows us to effectively enlarge the field of view of filters without increasing the number of parameters or the amount of computation.

  For a description of atrous convolution and how it can be used for dense feature extraction, please see: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (http://arxiv.org/abs/1412.7062). The same operation is investigated further in Multi-Scale Context Aggregation by Dilated Convolutions (http://arxiv.org/abs/1511.07122). Previous works that effectively use atrous convolution in different ways are, among others, OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (http://arxiv.org/abs/1312.6229) and Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks (http://arxiv.org/abs/1302.1700). Atrous convolution is also closely related to the so-called noble identities in multi-rate signal processing.

2. My understanding of how atrous convolution works
[Figure 1: Standard convolution with a 3 x 3 kernel (and padding)]
[Figure 2: Dilated convolution with a 3 x 3 kernel and dilation rate 2]

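  As described above, atrous convolution enlarges the receptive field of the kernel by inserting a specific number of zeros into it. The size of the enlarged kernel is: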

$$\text{new\_height} = \text{rate} \times (\text{height} - 1) + 1$$
$$\text{new\_width} = \text{rate} \times (\text{width} - 1) + 1$$

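  In the formulas above, height is the height of the original kernel and new_height is its height after the holes are inserted. Likewise, width and new_width are the width of the original kernel and its width after insertion. rate is the dilation rate: rate - 1 holes (zeros) are inserted between adjacent elements, so rate=1 inserts nothing and rate=2 inserts exactly one zero between adjacent elements.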
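
The size formula can be checked with a few lines of NumPy. This is only a minimal sketch: `dilate_kernel` is a hypothetical helper name, not a TensorFlow function, and it assumes a 2-D kernel.

```python
import numpy as np


def dilate_kernel(kernel, rate):
    """Insert rate - 1 zeros between adjacent elements of a 2-D kernel.

    Hypothetical helper for illustration only.
    """
    kernel = np.asarray(kernel, dtype=np.float32)
    h, w = kernel.shape
    # The enlarged kernel has size rate * (size - 1) + 1 in each direction.
    out = np.zeros((rate * (h - 1) + 1, rate * (w - 1) + 1), dtype=np.float32)
    # Original elements land on every rate-th row and column.
    out[::rate, ::rate] = kernel
    return out


k = np.arange(1, 10).reshape(3, 3)
for rate in (1, 2, 3):
    d = dilate_kernel(k, rate)
    # The shape grows with rate, while the number of non-zero weights stays at 9.
    print(rate, d.shape, int(np.count_nonzero(d)))
```

The number of non-zero weights does not change with rate, which is why the field of view grows without adding parameters.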

  When the padding parameter is VALID, the input tensor can be convolved with the dilated kernel without any modification, which is equivalent to the following:

conv2d(input, atrous_kernel, [1, 1, 1, 1], 'VALID')
# Notes:
# input is the tensor or feature map to be convolved,
# atrous_kernel is the dilated kernel described above, with the holes inserted
# strides must be [1, 1, 1, 1] here
# padding must also be 'VALID' here
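
To see why convolving with the dilated kernel matches "sampling the input every rate pixels", the two views can be compared in plain NumPy. This is a rough sketch under stated assumptions: single-channel 2-D inputs, stride 1, and no kernel flip (cross-correlation, as `tf.nn.conv2d` computes). The function names `conv2d_valid` and `atrous_valid_by_sampling` are hypothetical.

```python
import numpy as np


def conv2d_valid(x, k):
    """Naive stride-1 'VALID' cross-correlation of 2-D x with 2-D k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def atrous_valid_by_sampling(x, k, rate):
    """Same operation, but sampling x every `rate` pixels with the original k."""
    kh, kw = k.shape
    eh, ew = rate * (kh - 1) + 1, rate * (kw - 1) + 1  # effective kernel size
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + eh:rate, j:j + ew:rate] * k)
    return out


rate = 2
x = np.arange(49, dtype=np.float64).reshape(7, 7)
k = np.arange(1, 10, dtype=np.float64).reshape(3, 3)

# Build the dilated kernel: zeros everywhere except every rate-th position.
dilated = np.zeros((rate * 2 + 1, rate * 2 + 1), dtype=np.float64)
dilated[::rate, ::rate] = k

a = conv2d_valid(x, dilated)
b = atrous_valid_by_sampling(x, k, rate)
print(a.shape)               # output height is 7 - (2 * (3 - 1) + 1) + 1
print(np.array_equal(a, b))  # both views give the same values
```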

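  When the padding parameter is SAME, we want an output tensor of the same size as the input (same height and width), so we need to add a suitable number of zeros around the input tensor.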

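  Take the height direction as an example. Suppose the input tensor has height H and the original kernel has height h; after the holes are inserted, the kernel height becomes rate*(h-1)+1. Because strides is always 1 in atrous convolution, for the output to have the same height as the input the following must hold (x here is the total number of rows of zeros to add along the height direction):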

$$H + x - (\text{rate} \times (h - 1) + 1) + 1 = H$$

which gives:
$$x = \text{rate} \times (h - 1)$$

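  $x$ is the total number of rows of zeros to add above and below along the height direction. When $\text{rate} \times (h - 1)$ is even, the same number of rows is added above and below; when $\text{rate} \times (h - 1)$ is odd, one fewer row is added above than below.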

  The above uses the height direction as an example; the same reasoning applies to the width direction.
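
The split of the padding between the two sides can be written as a tiny helper. This is a sketch of the rule described above, not TensorFlow code; `same_padding` is a hypothetical name.

```python
def same_padding(kernel_size, rate):
    """Return (before, after) zero padding for one spatial direction.

    Hypothetical helper: total padding is rate * (kernel_size - 1), and
    when the total is odd the 'before' side gets one fewer than 'after'.
    """
    total = rate * (kernel_size - 1)
    before = total // 2
    after = total - before
    return before, after


print(same_padding(3, 3))  # even total: equal padding on both sides
print(same_padding(2, 3))  # odd total: one fewer before than after
```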

  When the padding parameter is SAME, the operation is equivalent to the following:

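conv2d(input_after_padding_zero, atrous_kernel, [1, 1, 1, 1], 'VALID')
# Notes:
# input_after_padding_zero is the tensor or feature map after the zeros described above have been added.
# atrous_kernel is the dilated kernel described above, with the holes inserted
# strides must be [1, 1, 1, 1] here
# padding must also be 'VALID' here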
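
The SAME case can also be sketched in plain NumPy: pad with zeros, then run the stride-1 VALID operation. This is an illustration under assumptions (single-channel 2-D input, no kernel flip), and `atrous_same` is a hypothetical name. It uses `np.pad` for the zero padding and samples the padded input every rate pixels instead of materialising the dilated kernel.

```python
import numpy as np


def atrous_same(x, k, rate):
    """Atrous convolution of 2-D x with 2-D k, output the same size as x."""
    kh, kw = k.shape
    ph, pw = rate * (kh - 1), rate * (kw - 1)  # total padding per direction
    # Rows use the kernel height, columns use the kernel width.
    padded = np.pad(
        x,
        ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)),
        mode="constant",
    )
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Take every rate-th pixel inside the effective window.
            window = padded[i:i + ph + 1:rate, j:j + pw + 1:rate]
            out[i, j] = np.sum(window * k)
    return out


x = np.ones((5, 5))
k = np.ones((3, 3))
y = atrous_same(x, k, rate=2)
print(y.shape)   # same height and width as x
print(y[2, 2])   # centre: all 9 taps fall inside the input
print(y[0, 0])   # corner: only 4 taps fall inside, the rest hit padding
```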

3. Parameters

def atrous_conv2d(value, filters, rate, padding, name=None)
| Parameter | Description |
| --- | --- |
| value | A 4-D Tensor of type float. It needs to be in the default "NHWC" format. Its shape is [batch, in_height, in_width, in_channels]. This is the input feature map. |
| filters | A 4-D Tensor with the same type as value and shape [filter_height, filter_width, in_channels, out_channels]. The in_channels dimension of filters must match that of value. Atrous convolution is equivalent to standard convolution with upsampled filters with effective height filter_height + (filter_height - 1) * (rate - 1) and effective width filter_width + (filter_width - 1) * (rate - 1), produced by inserting rate - 1 zeros along consecutive elements across the filters' spatial dimensions. This is the convolution kernel. |
| rate | A positive int32. The stride with which we sample input values across the height and width dimensions. Equivalently, the rate by which we upsample the filter values by inserting zeros across the height and width dimensions. In the literature, the same parameter is sometimes called input stride or dilation. rate - 1 zeros are inserted between adjacent kernel elements. |
| padding | A string, either 'VALID' or 'SAME'. The padding algorithm. |
| name | Optional name for the returned tensor. |

Note: there is no strides parameter! The other parameters have the same meaning as in tf.nn.conv2d.

4. Implementing atrous convolution by hand
  The code is as follows:

import tensorflow as tf
import numpy as np


'''
Note:
tensor and kernel here are both 2-D matrices, i.e.:
len(tensor.shape) == 2
len(kernel.shape) == 2
'''

def my_atrous_conv2d(tensor, kernel, rate, padding):
    tensor = np.array(tensor, dtype=np.float32)
    kernel = np.array(kernel, dtype=np.float32)

    shape = kernel.shape

    # Allocate a matrix to hold the kernel after the holes are inserted
    atrous_kernel = np.zeros(shape=[rate * (shape[0] - 1) + 1, rate * (shape[1] - 1) + 1], dtype=np.float32)

    # Copy each element of the original kernel into its position
    for i, line in enumerate(kernel):
        for j, number in enumerate(line):
            atrous_kernel[i * rate, j * rate] = number

    # When padding='valid':
    if padding.lower() == 'valid':
        tensor = np.expand_dims(tensor, axis=-1)
        tensor = np.expand_dims(tensor, axis=0)

        atrous_kernel = np.expand_dims(atrous_kernel, axis=-1)
        atrous_kernel = np.expand_dims(atrous_kernel, axis=-1)

        # Apply the convolution directly
        conv = tf.nn.conv2d(tensor, atrous_kernel, [1, 1, 1, 1], 'VALID')
        return conv

    # When padding='SAME':
    else:
        atrous_kernel = np.expand_dims(atrous_kernel, axis=-1)
        atrous_kernel = np.expand_dims(atrous_kernel, axis=-1)

        # Pad with zeros on the top, bottom, left and right.
        # Rows are padded according to the kernel height (kernel.shape[0]).
        pad_h = rate * (kernel.shape[0] - 1)
        up = np.zeros(shape=[pad_h // 2, tensor.shape[1]], dtype=np.float32)
        bottom = np.zeros(shape=[pad_h - pad_h // 2, tensor.shape[1]], dtype=np.float32)

        tensor = np.concatenate((up, tensor, bottom), axis=0)

        # Columns are padded according to the kernel width (kernel.shape[1]).
        pad_w = rate * (kernel.shape[1] - 1)
        left = np.zeros(shape=[tensor.shape[0], pad_w // 2], dtype=np.float32)
        right = np.zeros(shape=[tensor.shape[0], pad_w - pad_w // 2], dtype=np.float32)

        tensor = np.concatenate((left, tensor, right), axis=1)

        tensor = np.expand_dims(tensor, axis=-1)
        tensor = np.expand_dims(tensor, axis=0)

        # Finally, return the convolution result
        return tf.nn.conv2d(tensor, atrous_kernel, [1, 1, 1, 1], 'VALID')


def tf_atrous_conv2d(tensor, kernel, rate, padding):
    tensor = np.array(tensor, dtype=np.float32)
    kernel = np.array(kernel, dtype=np.float32)

    tensor = np.expand_dims(tensor, 0)
    tensor = np.expand_dims(tensor, -1)

    kernel = np.expand_dims(kernel, -1)
    kernel = np.expand_dims(kernel, -1)

    return tf.nn.atrous_conv2d(tensor, kernel, rate, padding)


if __name__ == '__main__':
    a = np.arange(81)
    np.random.shuffle(a)
    a = np.reshape(a, [9, 9])
    k = np.arange(99, 99+9)
    np.random.shuffle(k)
    k = np.reshape(k, [3, 3])

    atrous_conv1 = my_atrous_conv2d(a, k, 3, 'SAME')
    atrous_conv2 = tf_atrous_conv2d(a, k, 3, 'SAME')

    with tf.Session() as sess1:
        tf.global_variables_initializer().run()
        a1 = atrous_conv1.eval().reshape([9, 9])
        a2 = atrous_conv2.eval().reshape([9, 9])
        print('my_atrous_conv2d:')
        print(a1)
        print('tf_atrous_conv2d:')
        print(a2)
        print('equal:')
        print(a1 == a2)

    print('='*32)

    a = np.random.random([9, 9]).astype(np.float32)
    k = np.random.random([3, 3]).astype(np.float32)

    atrous_conv1 = my_atrous_conv2d(a, k, 3, 'SAME')
    atrous_conv2 = tf_atrous_conv2d(a, k, 3, 'SAME')

    with tf.Session() as sess2:
        tf.global_variables_initializer().run()
        a1 = atrous_conv1.eval().reshape([9, 9])
        a2 = atrous_conv2.eval().reshape([9, 9])
        print('my_atrous_conv2d:')
        print(a1)
        print('tf_atrous_conv2d:')
        print(a2)
        print('equal:')
        print(np.abs(a1 - a2) < 1e-6)

  The results are as follows:

my_atrous_conv2d:
[[16774. 15349. 17182. 18997. 22688. 25712. 12038. 16006. 15595.]
 [10372. 23398. 22395. 14290. 32495. 24145.  9781. 19842. 16505.]
 [15025. 23846.  5930. 26586. 25956. 15106. 14836. 16754. 11041.]
 [24527. 30590. 28034. 33654. 44940. 37008. 20302. 30919. 20154.]
 [17262. 29459. 30172. 26122. 44556. 35347. 18303. 30932. 27821.]
 [22972. 36041. 16298. 38027. 41745. 31230. 22169. 28649. 22251.]
 [13632. 18299. 16718. 22239. 28881. 24208. 14944. 19558. 12480.]
 [12129. 17634. 17297. 18678. 26119. 22487. 12831. 17893. 19671.]
 [13000. 25086. 15062. 22432. 29108. 28309. 14388. 19006. 20159.]]
tf_atrous_conv2d:
[[16774. 15349. 17182. 18997. 22688. 25712. 12038. 16006. 15595.]
 [10372. 23398. 22395. 14290. 32495. 24145.  9781. 19842. 16505.]
 [15025. 23846.  5930. 26586. 25956. 15106. 14836. 16754. 11041.]
 [24527. 30590. 28034. 33654. 44940. 37008. 20302. 30919. 20154.]
 [17262. 29459. 30172. 26122. 44556. 35347. 18303. 30932. 27821.]
 [22972. 36041. 16298. 38027. 41745. 31230. 22169. 28649. 22251.]
 [13632. 18299. 16718. 22239. 28881. 24208. 14944. 19558. 12480.]
 [12129. 17634. 17297. 18678. 26119. 22487. 12831. 17893. 19671.]
 [13000. 25086. 15062. 22432. 29108. 28309. 14388. 19006. 20159.]]
equal:
[[ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]]
================================
my_atrous_conv2d:
[[0.85756415 0.97809374 0.6221388  0.47452182 1.1971059  1.4956675
  0.6211113  0.6013928  1.2532485 ]
 [1.3485698  0.9727119  1.4426544  1.2280183  1.2285496  1.2395221
  0.69691694 0.9009933  0.9230705 ]
 [1.0359509  0.3736195  1.0340105  0.7417851  0.95010257 0.73275506
  0.8190989  0.3943194  0.7381625 ]
 [1.4999393  1.8689545  2.3845592  1.604199   2.3503845  2.3006964
  0.33571613 1.4319618  1.351671  ]
 [2.6279428  2.2722342  1.7000405  1.9155477  1.7234559  2.4140751
  0.7227813  0.7500137  1.5047629 ]
 [1.5855988  1.7331333  1.0298394  1.8303206  1.6153543  1.8850565
  1.0260942  1.2247385  0.9588959 ]
 [0.5739771  1.4931052  0.47283304 0.4867544  1.1949276  1.5942652
  0.550773   0.6214407  1.1143594 ]
 [0.9355327  1.0806943  1.419316   1.126513   1.2977234  1.4669281
  0.5325237  0.8319793  0.85846186]
 [0.86645645 0.5750332  0.7152909  1.0724365  0.5485303  0.91686004
  0.77844644 0.42520177 0.64212894]]
tf_atrous_conv2d:
[[0.85756415 0.97809374 0.6221388  0.4745218  1.1971059  1.4956675
  0.6211113  0.6013928  1.2532485 ]
 [1.3485698  0.9727119  1.4426544  1.2280183  1.2285496  1.2395222
  0.69691694 0.9009933  0.9230705 ]
 [1.0359509  0.3736195  1.0340106  0.7417851  0.9501025  0.73275506
  0.8190989  0.3943194  0.7381625 ]
 [1.4999394  1.8689545  2.3845594  1.604199   2.3503845  2.3006964
  0.33571613 1.4319618  1.351671  ]
 [2.6279428  2.2722342  1.7000405  1.9155477  1.7234559  2.4140751
  0.7227813  0.7500137  1.5047629 ]
 [1.585599   1.7331333  1.0298394  1.8303206  1.6153543  1.8850565
  1.0260942  1.2247385  0.9588959 ]
 [0.5739771  1.4931052  0.47283304 0.48675442 1.1949276  1.5942653
  0.550773   0.6214407  1.1143594 ]
 [0.9355327  1.0806943  1.419316   1.126513   1.2977233  1.4669281
  0.53252375 0.8319793  0.85846186]
 [0.86645645 0.5750332  0.7152909  1.0724365  0.54853034 0.9168601
  0.7784464  0.42520177 0.64212894]]
equal:
[[ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True]]
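
A note on comparing float32 results elementwise: a signed difference such as `a1 - a2 < 1e-6` also reports True wherever a1 is much smaller than a2, so an absolute difference or `np.allclose` is the safer check. The arrays below are made-up values for illustration only, not output of the script above.

```python
import numpy as np

# Made-up values: the first pair differs by float32 rounding only,
# the last pair differs by 2.
a1 = np.array([1.0000000, 2.0, 3.0], dtype=np.float32)
a2 = np.array([1.0000001, 2.0, 5.0], dtype=np.float32)

signed_ok = a1 - a2 < 1e-6             # misses the large negative difference
absolute_ok = np.abs(a1 - a2) < 1e-6   # flags the last element
close = np.allclose(a1, a2, rtol=0, atol=1e-6)

print(signed_ok)
print(absolute_ok)
print(close)
```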

  There are many ways to implement atrous convolution; the one above is only the most basic and simplest.
