Notes on the DeepLab v2 paper


The authors' three contributions:

  1. Atrous convolution, which enlarges the receptive field without increasing computation.
First, we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation.
  2. Atrous spatial pyramid pooling (ASPP), which robustly segments objects at multiple scales.
Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales.
  3. A fully connected conditional random field, which improves localization accuracy at object boundaries.
Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance.

Background: three challenges DCNNs face in image segmentation

  1. Reduced feature resolution, caused by downsampling (stride ≥ 2) or pooling.

  2. Existence of objects at multiple scales.

  3. Reduced localization accuracy due to DCNN invariance.

**On the first challenge** (feature resolution is reduced by downsampling (stride ≥ 2) or pooling)

The authors replace the original stride-2 pooling + standard convolution with stride-1 pooling + rate-2 atrous convolution, which keeps the resolution constant while still enlarging the receptive field.
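A tiny 1-D sketch of this substitution (plain Python written for these notes; `conv1d` is a hypothetical helper, not from the paper or any library):

```python
def conv1d(x, w, stride=1, rate=1):
    """Valid 1-D convolution with optional stride and dilation rate."""
    k = len(w)
    span = (k - 1) * rate + 1  # samples covered by one filter application
    return [
        sum(w[j] * x[i + j * rate] for j in range(k))
        for i in range(0, len(x) - span + 1, stride)
    ]

x = list(range(16))
w = [1, 1, 1]

# Original scheme: stride-2 convolution halves the output resolution.
low_res = conv1d(x, w, stride=2)
# Replacement: stride 1 with rate 2 keeps a dense output while each output
# value still covers a 5-sample span, i.e. the same enlarged receptive field.
dense = conv1d(x, w, stride=1, rate=2)

print(len(low_res), len(dense))  # → 7 12
```

The dense output has roughly twice the resolution of the strided one, with the same number of filter parameters per output value.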

  • Detailed definition of atrous convolution:

    Atrous convolution (a method that allows us to arbitrarily enlarge the field of view without pooling or going deeper) with rate r introduces r − 1 zeros between consecutive filter values, effectively enlarging the kernel size of a k×k filter to k_new = k + (k − 1)(r − 1) without increasing the number of parameters or the amount of computation.
  • Two ways to implement atrous convolution

    • Upsampling: insert holes (zeros) into the feature map, then convolve

      • Commonly used upsampling methods
        • bilinear interpolation
        • shift-and-stitch
        • record the padded positions, fill the remaining positions with zeros, then convolve
        • the upsampling scheme used in FCN (I consulted some references but did not fully understand it)
    • Downsampling: pad the convolution kernel with zeros
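The kernel-side view of the definition (insert r − 1 zeros between filter taps, giving effective size k + (k − 1)(r − 1)) can be checked in a few lines; `atrous_kernel` is a made-up name for this sketch:

```python
def atrous_kernel(w, rate):
    """Insert (rate - 1) zeros between consecutive filter values."""
    out = []
    for tap in w[:-1]:
        out.extend([tap] + [0] * (rate - 1))
    out.append(w[-1])
    return out

w = [1, 2, 3]  # k = 3 taps (the trainable parameters)
for r in (1, 2, 3):
    kw = atrous_kernel(w, r)
    # Effective kernel size grows; the parameter count (nonzero taps) does not.
    assert len(kw) == len(w) + (len(w) - 1) * (r - 1)

print(atrous_kernel([1, 2, 3], 2))  # → [1, 0, 2, 0, 3]
```

At rate 2 a 3-tap filter behaves like a 5-tap filter whose middle positions are zero, which is exactly why computation and parameters stay constant.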

**On the second challenge** (segmentation targets exist at multiple scales)

  • The traditional approach (rescaling, i.e., building an image pyramid):

    The same image is rescaled to several resolutions, each fed into its own branch of the network; the branches share parameters. At the last layer, each branch's output is interpolated back to the original image resolution, and the branches are fused by taking the maximum value at each pixel.

    The first approach amounts to standard multiscale processing [17],[18]. We extract DCNN score maps from multiple (three in our experiments) rescaled versions of the original image using parallel DCNN branches that share the same parameters. To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them, by taking at each position the maximum response across the different scales.
  • Atrous spatial pyramid pooling (ASPP)

    Its multiple scales are obtained not by rescaling the image but by atrous convolutions with different rates.

    We have implemented a variant of their scheme which uses multiple parallel atrous convolutional layers with different sampling rates. The features extracted for each sampling rate are further processed in separate branches and fused to generate the final result.
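A toy 1-D version of the ASPP idea (hypothetical helpers written for these notes; the paper uses 3×3 filters on 2-D feature maps at rates such as 6, 12, 18, 24, and fuses the branch score maps):

```python
def dilated_conv1d_same(x, w, rate):
    """'Same' 1-D dilated convolution: zero-pad so output length equals input length."""
    k = len(w)
    pad = (k - 1) * rate // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w[j] * xp[i + j * rate] for j in range(k)) for i in range(len(x))]

def aspp(x, weights, rates):
    """Toy ASPP: parallel dilated convolutions at different rates, fused by summation."""
    branches = [dilated_conv1d_same(x, w, r) for w, r in zip(weights, rates)]
    return [sum(vals) for vals in zip(*branches)]

x = [0.0] * 8 + [1.0] + [0.0] * 8       # an impulse "object" at position 8
weights = [[1, 1, 1]] * 4               # one 3-tap filter per branch
out = aspp(x, weights, rates=(1, 2, 3, 4))
assert len(out) == len(x)               # every branch preserves resolution
assert out[8] == 4.0                    # all four branches respond at the object
```

Each branch sees the same feature map at a different effective field of view, so small and large contexts are probed in parallel rather than by rescaling the input.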

(Figure: ASPP illustration)

**On the third challenge** (localization accuracy is reduced by DCNN invariance)

  • Cascaded DCNNs: propagating the coarse results to another DCNN.

  • Upsampling plus skip connections: upsample and concatenate the scores from intermediate feature maps.

  • DCNN + probabilistic graphical model (e.g., a CRF): integrating the densely connected CRF on top of the DCNN.

    • One option is to apply the CRF as post-processing (the approach taken in this paper).

    • The other is to embed the CRF-like model into the network and train end-to-end.
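As a rough illustration of the post-processing option (this is not the paper's fully connected CRF with mean-field inference; it is a naive neighbor-smoothing stand-in on a 1-D "image", with names invented for this note):

```python
def argmax_labels(scores):
    """Per-pixel label = index of the highest class score."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

def smooth(scores, w=0.6):
    """Toy pairwise term: mix each pixel's scores with its neighbors' scores."""
    n = len(scores)
    return [
        [s + w * (l + r)
         for s, l, r in zip(scores[i], scores[max(i - 1, 0)], scores[min(i + 1, n - 1)])]
        for i in range(n)
    ]

# Per-pixel [background, object] scores from the "DCNN", with one noisy pixel.
unary = [[2, 0], [2, 0], [0, 2], [0, 2], [2, 0], [0, 2], [0, 2], [2, 0], [2, 0]]
print(argmax_labels(unary))          # → [0, 0, 1, 1, 0, 1, 1, 0, 0]
print(argmax_labels(smooth(unary)))  # → [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

The isolated mislabeled pixel in the middle flips to agree with its neighbors; the real dense CRF does this with pairwise potentials over all pixel pairs, weighted by position and color similarity.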
