Globally and Locally Consistent Image Completion: Paper Reading Notes

The role of dilated convolution:

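  1. Enlarging the receptive field
    Deep networks usually downsample (with pooling or a stride-2 convolution) to enlarge the receptive field while cutting computation. This does enlarge the receptive field, but it also lowers the spatial resolution. Dilated convolution enlarges the receptive field without losing resolution, which is very useful for detection and segmentation tasks: a larger receptive field helps detect and segment large objects, while a higher resolution allows objects to be localized precisely.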

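  2. Capturing multi-scale context
    A dilated convolution has a dilation rate parameter: (dilation rate - 1) zeros are inserted between adjacent kernel elements, so a kernel with k taps per side covers k + (k - 1)(dilation rate - 1) pixels per side. Different dilation rates therefore give different receptive fields, which means multi-scale information is captured. Multi-scale information is very important in vision tasks.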

  3. Its role in the paper
    By using dilated convolutions at lower resolutions, the model can effectively “see” a larger area of the input image when computing each output pixel than with standard convolutional layers. The resulting network model computes each output pixel under the influence of a 307×307-pixel region of the input image. Without using dilated convolutions, it would only use a 99×99-pixel region, not allowing the completion of holes larger than 99×99 pixels, as depicted in Fig. 3.
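
The effect of the dilation rate on the receptive field can be checked with a few lines of arithmetic. The sketch below is only illustrative: the helper names `effective_kernel_size` and `receptive_field` and the four-layer toy stack are assumptions made for this note, not the paper's architecture, so it does not reproduce the 307×307 figure.

```python
def effective_kernel_size(kernel, dilation):
    """Side length covered by a kernel with (dilation - 1) gaps between taps."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of convolution layers.

    Each layer is a (kernel, stride, dilation) tuple.
    """
    field = 1  # a single output pixel initially sees one input pixel
    jump = 1   # distance, in input pixels, between neighbouring features
    for kernel, stride, dilation in layers:
        field += (effective_kernel_size(kernel, dilation) - 1) * jump
        jump *= stride
    return field


# Hypothetical toy stack, NOT the paper's architecture.
dilated_stack = [(3, 1, 1), (3, 2, 1), (3, 1, 2), (3, 1, 4)]
# Same stack with every dilation rate reset to 1 (standard convolutions).
plain_stack = [(k, s, 1) for k, s, _ in dilated_stack]

print(effective_kernel_size(3, 2))     # 5
print(receptive_field(dilated_stack))  # 29
print(receptive_field(plain_stack))    # 13
```

With the same number of layers and the same output resolution, the dilated toy stack sees a 29-pixel-wide window instead of 13, which is the same mechanism that widens the paper's network from 99×99 to 307×307.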


The role of the pooling layer:


  1. Invariance: pooling gives a degree of invariance to small translations, rotations, and scale changes

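  2. It keeps the dominant features while reducing dimensionality (an effect similar to PCA), which cuts the parameters of later layers and the computation, helps prevent overfitting, and improves the model's generalization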
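
Both points can be seen in a small example. This is a minimal sketch in plain Python; the function `max_pool_2x2` and the two 4×4 inputs are made up for illustration, and the invariance shown holds only because the shifted activations stay inside the same pooling windows.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 over a list-of-lists image.

    Assumes the height and width are both even.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]
# Same activations, each moved by one pixel within its 2x2 window.
shifted = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original)                    # [[9, 0], [0, 5]]
print(pooled_original == pooled_shifted)  # True
```

The 4×4 input shrinks to 2×2 (a quarter of the values), and the small shift leaves the pooled output unchanged. A shift that crosses a window boundary would change the output, so the invariance is only approximate.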


The local context discriminator follows the same pattern, except that the input is a 128×128-pixel image patch centered around the completed region. (Note that, at the training time, there is always a single completed region. The trained completion network can, however, fill in any number of holes at the same time.) In the case the image is not a completed image, a random patch of the image is selected, as there is no completed region to center it on.
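
The patch selection rule can be sketched as follows. The function name `local_patch_box`, the `(top, left, bottom, right)` box convention, and the clamping of the patch to the image border are all assumptions made for this note; the paper does not say how patches near the border are handled.

```python
import random


def local_patch_box(image_h, image_w, hole_box=None, patch=128, rng=random):
    """Return (top, left, bottom, right) of the local discriminator input.

    hole_box is the completed region as (top, left, bottom, right),
    or None for a real image that has no completed region.
    """
    if patch > image_h or patch > image_w:
        raise ValueError("patch does not fit inside the image")
    if hole_box is None:
        # Real image: no region to center on, so pick a random patch.
        top = rng.randint(0, image_h - patch)
        left = rng.randint(0, image_w - patch)
    else:
        hole_top, hole_left, hole_bottom, hole_right = hole_box
        center_y = (hole_top + hole_bottom) // 2
        center_x = (hole_left + hole_right) // 2
        # Center on the hole, then clamp so the patch stays in the image
        # (the clamping is an assumption, not taken from the paper).
        top = min(max(center_y - patch // 2, 0), image_h - patch)
        left = min(max(center_x - patch // 2, 0), image_w - patch)
    return top, left, top + patch, left + patch


print(local_patch_box(256, 256, hole_box=(100, 100, 164, 164)))
# (68, 68, 196, 196)
print(local_patch_box(256, 256, hole_box=(0, 0, 40, 40)))
# (0, 0, 128, 128)
```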


Finally, the outputs of the global and the local discriminators are concatenated together into a single 2048-dimensional vector, which is then processed by a single fully-connected layer, to output a continuous value. A sigmoid transfer function is used so that this value is in the [0, 1] range and represents the probability that the image is real, rather than completed.
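
The fusion step amounts to a concatenation, one dot product, and a sigmoid. The sketch below uses plain Python lists; the function name `fuse_discriminators`, the even split into two 1024-dimensional halves, and the random weights are assumptions made for illustration, not trained values or the authors' code.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_discriminators(global_feat, local_feat, weights, bias):
    """Concatenate both feature vectors and apply one fully-connected
    layer with a single output unit, followed by a sigmoid."""
    fused = list(global_feat) + list(local_feat)
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated length")
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return sigmoid(logit)


rng = random.Random(0)
# Assumed split: 1024 features from each discriminator, 2048 in total.
global_feat = [rng.gauss(0.0, 1.0) for _ in range(1024)]
local_feat = [rng.gauss(0.0, 1.0) for _ in range(1024)]
# Hypothetical untrained weights, small enough to keep the logit moderate.
weights = [rng.gauss(0.0, 0.01) for _ in range(2048)]

prob_real = fuse_discriminators(global_feat, local_feat, weights, bias=0.0)
print(0.0 < prob_real < 1.0)  # True
```

With all-zero weights and zero bias the output is exactly 0.5, that is, the discriminator has no opinion on whether the image is real or completed.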

