Paper Summary: GCN for Semantic Segmentation (Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network)

Large Kernel Matters——Improve Semantic Segmentation by Global Convolutional Network


2. Proposes a Boundary Refinement (BR) block that improves the localization accuracy of foreground object boundaries in semantic segmentation.

One of recent trends in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.

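However, the classification and localization tasks are naturally contradictory: classification needs invariance to transformations (such as translation and rotation), while localization needs to be sensitive to those same transformations.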
In the paper's overview figure, the Feature layer denotes the extracted feature maps and Classifier denotes the classifier. The classification network places a single classifier after the feature maps, the segmentation network uses an aligned classifier, and Our Work combines the two.
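Typical examples are the classic classification backbones from the ImageNet competition, such as AlexNet, VGG Net, GoogleNet, and ResNet, all of which have a cone-shaped structure. The hidden layers extract features, and the classifier at the tail is densely connected to those features through fully connected layers or global pooling. This makes the features robust to local disturbances and lets the classifier handle different types of input transformations.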
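To make the conflict between the two tasks concrete, here is a minimal sketch on a toy 1-D feature map. The helper names and the numbers are made up for illustration and do not come from the paper: a globally pooled score stays the same when the input is shifted, while the position of the strongest response moves with it.

```python
# Toy 1-D "feature map": one strong activation marks where the object is.
def shift_right(signal, steps):
    """Translate the signal to the right, padding with zeros on the left."""
    return [0.0] * steps + signal[:len(signal) - steps]

def global_max_pool(signal):
    """Classification-style readout: one score for the whole input."""
    return max(signal)

def argmax_position(signal):
    """Localization-style readout: where the strongest response sits."""
    return signal.index(max(signal))

features = [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.0, 0.0]
shifted = shift_right(features, 3)

# The pooled score ignores the shift, the position does not.
print(global_max_pool(features), global_max_pool(shifted))   # 0.9 0.9
print(argmax_position(features), argmax_position(shifted))   # 2 5
```

A segmentation network needs both behaviors at once: the class score should not depend on where the object is, but the per-pixel output has to follow it.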


C. Our Work: the structure proposed in the paper
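2) From the classification perspective: use large kernels. Here a large kernel is assembled from symmetric, separable convolutions, which cuts the parameter count substantially (kxk -> kx1 + 1xk), and no nonlinearity is applied after the convolutions.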
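The parameter saving can be checked with a quick count. This is only a sketch: the channel sizes are hypothetical, biases are ignored, and the branch layout follows the two-branch module shown later in this post (kx1 then 1xk, plus 1xk then kx1).

```python
def dense_kxk_params(k, c_in, c_out):
    """Weights of a single k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def gcn_params(k, c_in, c_out):
    """Weights of the two GCN branches (biases ignored).

    Each branch is a k x 1 conv (c_in -> c_out) followed by a 1 x k conv
    (c_out -> c_out), or the same pair in the opposite order.
    """
    one_branch = k * c_in * c_out + k * c_out * c_out
    return 2 * one_branch

# Hypothetical sizes: 2048 input channels, 21 output channels (classes).
for k in (3, 7, 15):
    print(k, dense_kxk_params(k, 2048, 21), gcn_params(k, 2048, 21))
```

The dense kernel grows with k squared while the separable pair grows linearly in k, so the gap widens as the kernel gets larger.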
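Note also that the paper says "the kernel size of the convolutional structure should be as large as possible." Why should the kernel be as large as possible? My reading is that the fully connected structure was dropped for the sake of localization, yet classification performance still depends on the dense connectivity a fully connected layer provides, so a kernel that is as large as possible is used in its place. Indeed, a convolution whose kernel has the same width and height as the feature map (with no padding) is exactly a fully connected computation.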
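The last claim is easy to verify by hand. Below is a small pure-Python sketch with a made-up 3x3 feature map and a kernel of the same size: the unpadded convolution yields a single value, and that value equals one fully connected unit applied to the flattened map.

```python
def conv2d_valid(feature_map, kernel):
    """Plain 2-D cross-correlation with stride 1 and no padding."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += feature_map[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def fully_connected(inputs, weights):
    """One output unit of a fully connected layer (no bias)."""
    return sum(x * w for x, w in zip(inputs, weights))

feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # same 3 x 3 size as the map

conv_out = conv2d_valid(feature_map, kernel)
flat_x = [v for row in feature_map for v in row]
flat_w = [v for row in kernel for v in row]
fc_out = fully_connected(flat_x, flat_w)
print(conv_out, fc_out)   # [[-8]] -8
```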


Through ablation experiments, the paper compares how the kernel size affects the results, and it also compares four structures for implementing a kxk kernel.

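Since large kernels have been shown to matter, why not use GCN for the convolutions inside the backbone as well? The authors therefore also propose the ResNet-GCN structure.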

The paper also reports an experimental comparison with pretraining on ImageNet 2015 and fine-tuning on PASCAL VOC 2012.

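Comparing the different stages, the ranking is GCN+BR > GCN > Baseline, and adding a CRF improves the results further, because the boundaries are indeed still handled imperfectly.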



# many are borrowed from https://github.com/ycszen/pytorch-ss/blob/master/gcn.py
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class _GlobalConvModule(nn.Module):
    def __init__(self, in_dim, out_dim, kernel_size):
        super(_GlobalConvModule, self).__init__()
        # integer division: nn.Conv2d requires integer padding
        pad0 = (kernel_size[0] - 1) // 2
        pad1 = (kernel_size[1] - 1) // 2
        # kernel size had better be odd number so as to avoid alignment error
        self.conv_l1 = nn.Conv2d(in_dim, out_dim, kernel_size=(kernel_size[0], 1),
                                 padding=(pad0, 0))  # left branch: kx1 conv
        self.conv_l2 = nn.Conv2d(out_dim, out_dim, kernel_size=(1, kernel_size[1]),
                                 padding=(0, pad1))  # left branch: 1xk conv
        self.conv_r1 = nn.Conv2d(in_dim, out_dim, kernel_size=(1, kernel_size[1]),
                                 padding=(0, pad1))  # right branch: 1xk conv
        self.conv_r2 = nn.Conv2d(out_dim, out_dim, kernel_size=(kernel_size[0], 1),
                                 padding=(pad0, 0))  # right branch: kx1 conv

    def forward(self, x):
        x_l = self.conv_l1(x)
        x_l = self.conv_l2(x_l)
        x_r = self.conv_r1(x)
        x_r = self.conv_r2(x_r)
        x = x_l + x_r   # element-wise sum of the two branches
        return x
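
The comment about odd kernel sizes can be checked with the usual convolution output-size formula. The sketch below uses hypothetical helper names and the integer form of the module's padding, (kernel - 1) // 2. With an odd kernel the output keeps the input size, so the two branches and the skip connections line up. With an even kernel the output shrinks by one.

```python
def conv_out_size(size, kernel, padding, stride=1):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Integer form of the padding used by the module: (kernel - 1) // 2."""
    return (kernel - 1) // 2

for k in (7, 8):
    print(k, conv_out_size(16, k, same_padding(k)))   # 7 -> 16, 8 -> 15
```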


class _BoundaryRefineModule(nn.Module):
    def __init__(self, dim):
        super(_BoundaryRefineModule, self).__init__()
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # residual branch: first 3x3 conv
        self.conv2 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # residual branch: second 3x3 conv

    def forward(self, x):
        residual = self.conv1(x)
        residual = self.relu(residual)  # Conv + ReLU
        residual = self.conv2(residual) # Conv
        out = x + residual  # add the refinement back onto the input
        return out


class GCN(nn.Module):
    def __init__(self, num_classes, input_size, pretrained=True):
        super(GCN, self).__init__()
        self.input_size = input_size
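        # NOTE: the body of the original `if pretrained:` is missing in the source.
        # Loading torchvision's ImageNet weights (torchvision >= 0.13) is an assumption.
        weights = models.ResNet152_Weights.DEFAULT if pretrained else None
        resnet = models.resnet152(weights=weights)
        self.layer0 = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)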
        self.layer1 = nn.Sequential(resnet.maxpool, resnet.layer1)
        self.layer2 = resnet.layer2
        self.layer3 = resnet.layer3
        self.layer4 = resnet.layer4

        # GCN modules, one per backbone stage
        self.gcm1 = _GlobalConvModule(2048, num_classes, (7, 7))
        self.gcm2 = _GlobalConvModule(1024, num_classes, (7, 7))
        self.gcm3 = _GlobalConvModule(512, num_classes, (7, 7))
        self.gcm4 = _GlobalConvModule(256, num_classes, (7, 7))

        # BR modules
        self.brm1 = _BoundaryRefineModule(num_classes)
        self.brm2 = _BoundaryRefineModule(num_classes)
        self.brm3 = _BoundaryRefineModule(num_classes)
        self.brm4 = _BoundaryRefineModule(num_classes)
        self.brm5 = _BoundaryRefineModule(num_classes)
        self.brm6 = _BoundaryRefineModule(num_classes)
        self.brm7 = _BoundaryRefineModule(num_classes)
        self.brm8 = _BoundaryRefineModule(num_classes)
        self.brm9 = _BoundaryRefineModule(num_classes)

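        # NOTE: initialize_weights is not defined in this post; it is assumed to be
        # the weight-initialization helper from the repository credited above.
        initialize_weights(self.gcm1, self.gcm2, self.gcm3, self.gcm4, self.brm1, self.brm2, self.brm3,
                           self.brm4, self.brm5, self.brm6, self.brm7, self.brm8, self.brm9)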

    def forward(self, x):
        # if x: 512
        fm0 = self.layer0(x)  # 256
        fm1 = self.layer1(fm0)  # 128
        fm2 = self.layer2(fm1)  # 64
        fm3 = self.layer3(fm2)  # 32
        fm4 = self.layer4(fm3)  # 16

        gcfm1 = self.brm1(self.gcm1(fm4))  # 16
        gcfm2 = self.brm2(self.gcm2(fm3))  # 32
        gcfm3 = self.brm3(self.gcm3(fm2))  # 64
        gcfm4 = self.brm4(self.gcm4(fm1))  # 128

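        # Upsample, fuse with the next finer scale, and refine.
        # F.interpolate with align_corners=True matches the deprecated F.upsample_bilinear.
        fs1 = self.brm5(F.interpolate(gcfm1, size=fm3.size()[2:], mode='bilinear', align_corners=True) + gcfm2)  # 32
        fs2 = self.brm6(F.interpolate(fs1, size=fm2.size()[2:], mode='bilinear', align_corners=True) + gcfm3)  # 64
        fs3 = self.brm7(F.interpolate(fs2, size=fm1.size()[2:], mode='bilinear', align_corners=True) + gcfm4)  # 128
        fs4 = self.brm8(F.interpolate(fs3, size=fm0.size()[2:], mode='bilinear', align_corners=True))  # 256
        out = self.brm9(F.interpolate(fs4, size=self.input_size, mode='bilinear', align_corners=True))  # 512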

        return out
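
The size comments in forward can be reproduced without running the network. The sketch below is plain arithmetic rather than part of the original code, and it assumes torchvision's ResNet layout: a 7x7 stride-2 stem convolution, a 3x3 stride-2 max-pool, and one stride-2 3x3 convolution at the start of each of layer2, layer3, and layer4.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def backbone_sizes(input_size):
    """Side lengths of fm0..fm4 for a square input.

    Assumes torchvision's ResNet layout: 7x7/2 stem conv (pad 3),
    3x3/2 max-pool (pad 1), then one 3x3/2 conv (pad 1) at the start of
    layer2, layer3 and layer4. Stride-1 layers keep the size unchanged.
    """
    fm0 = conv_out_size(input_size, 7, 2, 3)   # layer0: stem conv
    fm1 = conv_out_size(fm0, 3, 2, 1)          # layer1: max-pool (+ stride-1 blocks)
    sizes = [fm0, fm1]
    for _ in range(3):                         # layer2, layer3, layer4
        sizes.append(conv_out_size(sizes[-1], 3, 2, 1))
    return sizes

sizes = backbone_sizes(512)
print(sizes)     # [256, 128, 64, 32, 16]
# Decoder targets: fs1 -> fm3, fs2 -> fm2, fs3 -> fm1, fs4 -> fm0, out -> input.
decoder = [sizes[3], sizes[2], sizes[1], sizes[0], 512]
print(decoder)   # [32, 64, 128, 256, 512]
```

Note that input_size is passed straight to the final upsampling call, so it should be a (height, width) pair that matches the images fed to the network.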
