How global average pooling (GAP) works in DeepLab

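I used to be puzzled by one detail of ASPP (Atrous Spatial Pyramid Pooling) in DeepLab. ASPP runs several atrous convolutions with different rates in parallel with a GAP branch to extract multi-scale semantic information. GAP reduces each channel to a single value (a 1×1 map), whereas an atrous convolution produces a full 2D feature map, so it was not clear to me how the two kinds of output are combined in parallel.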
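To make the shape mismatch concrete, here is a minimal sketch that runs one atrous convolution and one GAP on the same input. The tensor sizes, channel counts, and the dilation rate of 6 are arbitrary choices for illustration, not values taken from DeepLab.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)  # (N, C, H, W); sizes are arbitrary

# Atrous (dilated) 3x3 convolution; padding == dilation keeps H and W unchanged.
atrous = nn.Conv2d(8, 4, kernel_size=3, padding=6, dilation=6, bias=False)
# Global average pooling collapses each channel's H x W map to a single value.
gap = nn.AdaptiveAvgPool2d((1, 1))

atrous_out = atrous(x)
gap_out = gap(x)
print(atrous_out.shape)  # torch.Size([1, 4, 16, 16])
print(gap_out.shape)     # torch.Size([1, 8, 1, 1])
```

The two outputs cannot be concatenated along the channel dimension as they are, because their spatial sizes differ (16×16 versus 1×1).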

So I looked up some references and read the source code.

References

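https://sthalles.github.io/deep_segmentation_network/ explains:
To add global context, ASPP first applies GAP to the feature map, then applies a 1×1 convolution with 256 filters to the pooled result, and finally uses bilinear interpolation to upsample it to the required spatial size.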
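The first two steps can be checked numerically. The sketch below (with arbitrary small sizes instead of 256 filters) verifies that GAP is just the mean over the spatial dimensions, and that a 1×1 convolution applied to a 1×1 map behaves like a plain linear projection across channels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 5, 7)  # (N, C, H, W); sizes are arbitrary

# Step 1: GAP equals the per-channel mean over H and W.
pooled = nn.AdaptiveAvgPool2d((1, 1))(x)           # (2, 8, 1, 1)
manual = x.mean(dim=(2, 3), keepdim=True)          # (2, 8, 1, 1)
same_as_mean = torch.allclose(pooled, manual, atol=1e-6)

# Step 2: a 1x1 convolution on a 1x1 map mixes channels only.
conv = nn.Conv2d(8, 4, kernel_size=1, bias=False)  # weight shape (4, 8, 1, 1)
projected = conv(pooled)                           # (2, 4, 1, 1)
linear = pooled.flatten(1) @ conv.weight.flatten(1).t()  # (2, 4)
same_as_linear = torch.allclose(projected.flatten(1), linear, atol=1e-6)

print(same_as_mean, same_as_linear)
```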

A small experiment

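import torch
import torch.nn as nn
import torch.nn.functional as F

# A tiny feature map: batch 1, 2 channels, 3x3 spatial
x = torch.randn(1, 2, 3, 3)
print(x)

# GAP reduces each channel to 1x1; the 1x1 convolution then maps 2 channels to 4
global_avg_pool = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                nn.Conv2d(2, 4, 1, stride=1, bias=False))

out = global_avg_pool(x)
print(out)  # shape (1, 4, 1, 1)

# Bilinear interpolation stretches each 1x1 map to 5x5
x5 = F.interpolate(out, size=(5, 5), mode='bilinear', align_corners=True)
print(x5)  # shape (1, 4, 5, 5)

Output (the input, the GAP + 1×1 convolution result, and the upsampled result):
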
tensor([[[[-0.8494,  0.7672, -1.2737],
          [ 0.4866,  0.9821,  1.9463],
          [ 0.8208, -0.3884,  0.9766]],

         [[ 0.8611, -0.8998, -0.1001],
          [-0.8969, -0.7486, -0.7516],
          [-1.3810, -0.2459,  0.3907]]]])
          
tensor([[[[ 0.3816]],

         [[ 0.1271]],

         [[ 0.1947]],

         [[-0.0331]]]], grad_fn=<...>)
         
tensor([[[[ 0.3816,  0.3816,  0.3816,  0.3816,  0.3816],
          [ 0.3816,  0.3816,  0.3816,  0.3816,  0.3816],
          [ 0.3816,  0.3816,  0.3816,  0.3816,  0.3816],
          [ 0.3816,  0.3816,  0.3816,  0.3816,  0.3816],
          [ 0.3816,  0.3816,  0.3816,  0.3816,  0.3816]],

         [[ 0.1271,  0.1271,  0.1271,  0.1271,  0.1271],
          [ 0.1271,  0.1271,  0.1271,  0.1271,  0.1271],
          [ 0.1271,  0.1271,  0.1271,  0.1271,  0.1271],
          [ 0.1271,  0.1271,  0.1271,  0.1271,  0.1271],
          [ 0.1271,  0.1271,  0.1271,  0.1271,  0.1271]],

         [[ 0.1947,  0.1947,  0.1947,  0.1947,  0.1947],
          [ 0.1947,  0.1947,  0.1947,  0.1947,  0.1947],
          [ 0.1947,  0.1947,  0.1947,  0.1947,  0.1947],
          [ 0.1947,  0.1947,  0.1947,  0.1947,  0.1947],
          [ 0.1947,  0.1947,  0.1947,  0.1947,  0.1947]],

         [[-0.0331, -0.0331, -0.0331, -0.0331, -0.0331],
          [-0.0331, -0.0331, -0.0331, -0.0331, -0.0331],
          [-0.0331, -0.0331, -0.0331, -0.0331, -0.0331],
          [-0.0331, -0.0331, -0.0331, -0.0331, -0.0331],
          [-0.0331, -0.0331, -0.0331, -0.0331, -0.0331]]]],
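       grad_fn=<...>)

Each channel's single value is simply copied to every position of the 5×5 output. After upsampling, the GAP branch therefore has the same spatial size as the atrous-convolution branches and can be concatenated with them along the channel dimension.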
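In the printed result every channel holds one repeated value. A short check, reusing the four values printed above as a hand-built tensor, suggests that bilinear interpolation of a 1×1 map amounts to broadcasting that value over the target size:

```python
import torch
import torch.nn.functional as F

# The four per-channel values from the experiment, shaped as (N, C, 1, 1).
pooled = torch.tensor([0.3816, 0.1271, 0.1947, -0.0331]).view(1, 4, 1, 1)

upsampled = F.interpolate(pooled, size=(5, 5), mode='bilinear', align_corners=True)
broadcast = pooled.expand(-1, -1, 5, 5)  # copy each value to all 5x5 positions

matches = torch.allclose(upsampled, broadcast)
print(upsampled.shape, matches)
```

With only one source pixel there is nothing to interpolate between, so the interpolation mode mainly matters for keeping the code uniform with the other branches.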

Source code

DeepLab code

ASPP module

inplanes = 2048
self.aspp1 = _ASPPModule(inplanes, 256, 1, padding=0, dilation=dilations[0], BatchNorm=BatchNorm)
self.aspp2 = _ASPPModule(inplanes, 256, 3, padding=dilations[1], dilation=dilations[1], BatchNorm=BatchNorm)
self.aspp3 = _ASPPModule(inplanes, 256, 3, padding=dilations[2], dilation=dilations[2], BatchNorm=BatchNorm)
self.aspp4 = _ASPPModule(inplanes, 256, 3, padding=dilations[3], dilation=dilations[3], BatchNorm=BatchNorm)
self.global_avg_pool = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
                                     nn.Conv2d(inplanes, 256, 1, stride=1, bias=False),
                                     BatchNorm(256),
                                     nn.ReLU())

forward

x1 = self.aspp1(x)
x2 = self.aspp2(x)
x3 = self.aspp3(x)
x4 = self.aspp4(x)
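# Image-level branch: GAP + 1x1 conv gives a (N, 256, 1, 1) tensor
x5 = self.global_avg_pool(x)
# Upsample the 1x1 map to the spatial size of the atrous branches
x5 = F.interpolate(x5, size=x4.size()[2:], mode='bilinear', align_corners=True)
# Concatenate the five branches along the channel dimension (5 * 256 channels)
x = torch.cat((x1, x2, x3, x4, x5), dim=1)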
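The two fragments above can be assembled into a small runnable module. This is only a sketch under several assumptions: `ASPPBranch` is a hypothetical stand-in for `_ASPPModule` (assumed here to be convolution, BatchNorm, ReLU), `TinyASPP` is a made-up name, the dilation rates (1, 6, 12, 18) are assumed, and the channel counts are shrunk from 2048/256 to 32/8 so it runs quickly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPPBranch(nn.Module):
    """Hypothetical stand-in for _ASPPModule: atrous conv -> BatchNorm -> ReLU."""

    def __init__(self, inplanes, planes, kernel_size, padding, dilation):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size, padding=padding,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)


class TinyASPP(nn.Module):
    """Scaled-down ASPP: four atrous branches plus one image-level (GAP) branch."""

    def __init__(self, inplanes=32, planes=8, dilations=(1, 6, 12, 18)):
        super().__init__()
        d = dilations
        self.aspp1 = ASPPBranch(inplanes, planes, 1, padding=0, dilation=d[0])
        self.aspp2 = ASPPBranch(inplanes, planes, 3, padding=d[1], dilation=d[1])
        self.aspp3 = ASPPBranch(inplanes, planes, 3, padding=d[2], dilation=d[2])
        self.aspp4 = ASPPBranch(inplanes, planes, 3, padding=d[3], dilation=d[3])
        self.global_avg_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Conv2d(inplanes, planes, 1, stride=1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(),
        )

    def forward(self, x):
        x1 = self.aspp1(x)
        x2 = self.aspp2(x)
        x3 = self.aspp3(x)
        x4 = self.aspp4(x)
        x5 = self.global_avg_pool(x)  # (N, planes, 1, 1)
        # Stretch the 1x1 map to the spatial size of the atrous branches.
        x5 = F.interpolate(x5, size=x4.size()[2:], mode='bilinear', align_corners=True)
        return torch.cat((x1, x2, x3, x4, x5), dim=1)


# eval() makes BatchNorm use running statistics, which avoids problems with
# normalizing the 1x1 pooled map on very small batches.
model = TinyASPP().eval()
with torch.no_grad():
    out = model(torch.randn(2, 32, 33, 33))
print(out.shape)  # torch.Size([2, 40, 33, 33])
```

The output has 5 × 8 = 40 channels and keeps the 33×33 input resolution, which mirrors the 5 × 256 channels produced by the concatenation in the excerpt.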
