Globally and Locally Consistent Image Completion: Paper Reading Notes

What dilated convolution is for:

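  1. Enlarging the receptive field
    Deep networks usually downsample (pooling or stride-2 convolution) to enlarge the receptive field and cut computation, but this also lowers the spatial resolution. Dilated convolution enlarges the receptive field without giving up resolution. This is very useful in detection and segmentation: a larger receptive field helps with large objects, and a higher resolution helps localize them precisely.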

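  2. Capturing multi-scale context
    Dilated convolution has a dilation rate parameter: (dilation rate - 1) zeros are inserted between adjacent kernel elements. Different dilation rates therefore give different receptive fields, which means information is gathered at multiple scales. Multi-scale information matters a great deal in vision tasks.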

  3. Role in the paper
    By using dilated convolutions at lower resolutions, the model can effectively “see” a larger area of the input image when computing each output pixel than with standard convolutional layers. The resulting network model computes each output pixel under the influence of a 307×307-pixel region of the input image. Without using dilated convolutions, it would only use a 99×99-pixel region, not allowing the completion of holes larger than 99 × 99 pixels, as depicted in Fig. 3.
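
The arithmetic behind the three points above can be sketched in a few lines. This is only an illustration: the function names are my own, and the layer stack (two stride-2 convolutions followed by four 3×3 convolutions) is a made-up example, not the paper's actual architecture.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of a dilated kernel: (dilation - 1) zeros sit between adjacent taps."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride, dilation)."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring outputs
    for kernel_size, stride, dilation in layers:
        k_eff = effective_kernel_size(kernel_size, dilation)
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: downsample to 1/4 resolution, then four 3x3 convolutions.
downsample = [(3, 2, 1), (3, 2, 1)]
dilated = downsample + [(3, 1, d) for d in (2, 4, 8, 16)]
standard = downsample + [(3, 1, 1)] * 4

print(receptive_field(dilated))   # 247
print(receptive_field(standard))  # 39
```

Both stacks keep the same resolution and the same number of weights; only the dilation rates differ. The gap between the two results is 208 pixels, the same as 307 - 99, which would be consistent with the paper using dilation rates like these at 1/4 resolution, though that is an assumption here rather than something these notes verify.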


What a pooling layer is for:

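A downsampling layer is also called a pooling layer. It slides a window over the input much like a convolutional layer does, except that the "kernel" simply takes the maximum or the average of the values in the window (max pooling, average pooling) and has no weights for backpropagation to update.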

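  1. Invariance: the output becomes less sensitive to small translations, rotations, and changes of scale in the input

  2. Keeps the dominant features while shrinking the feature maps (dimensionality reduction, loosely similar in effect to PCA), which reduces the parameters and computation of later layers, helps against overfitting, and improves generalization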
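
As a small illustration of the description above, here is a minimal NumPy sketch of non-overlapping max and average pooling. The function name `pool2d` and the sample feature map are my own, chosen only for the example.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map (stride equals window size)."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")


feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]], dtype=float)

print(pool2d(feature_map, mode="max"))  # [[4. 2.] [2. 8.]]
print(pool2d(feature_map, mode="avg"))  # [[2.5 1.] [1.25 6.5]]
```

The function has no weights at all, so there is nothing for training to update, and the 4×4 map shrinks to 2×2. Moving the 4 in the top-left window to any other position inside that window leaves the max-pooled output unchanged, which is the small-shift invariance mentioned in point 1.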

Local discriminator

The local context discriminator follows the same pattern, except that the input is a 128 × 128-pixel image patch centered around the completed region. (Note that, at the training time, there is always a single completed region. The trained completion network can, however, fill-in any number of holes at the same time.) In the case the image is not a completed image, a random patch of the image is selected, as there is no completed region to center it on.
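In other words, the local discriminator is built like the global one, but its input is a 128x128-pixel patch centered on the completed region. During training there is always exactly one completed region, although the trained completion network can fill any number of holes at once. When the image is a real one rather than a completed one, a random patch is taken instead (also 128x128, of course), because there is no completed region to center it on.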
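
The patch selection described above could look roughly like the following NumPy sketch. The helper names, the 256×256 test image, and the way the patch is shifted back inside the image near the borders are all my own assumptions; the quoted passage does not say how borders are handled.

```python
import numpy as np

PATCH = 128  # patch side length given in the quoted passage


def crop_centered(image, center_y, center_x, patch=PATCH):
    """Patch centered on the completed region, shifted to stay inside the image."""
    h, w = image.shape[:2]
    top = int(np.clip(center_y - patch // 2, 0, h - patch))
    left = int(np.clip(center_x - patch // 2, 0, w - patch))
    return image[top:top + patch, left:left + patch]


def crop_random(image, rng, patch=PATCH):
    """Random patch for a real image, which has no completed region to center on."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    return image[top:top + patch, left:left + patch]


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))  # hypothetical H x W x C image

fake_patch = crop_centered(image, center_y=200, center_x=40)
real_patch = crop_random(image, rng)
print(fake_patch.shape, real_patch.shape)  # (128, 128, 3) (128, 128, 3)
```

Either way the local discriminator always receives a 128×128 input, so the same network can score both completed and real images.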

Concatenation layer

Finally, the outputs of the global and the local discriminators are concatenated together into a single 2048-dimensional vector, which is then processed by a single fully-connected layer, to output a continuous value. A sigmoid transfer function is used so that this value is in the [0, 1] range and represents the probability that the image is real, rather than completed.
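So the outputs of the global and local discriminators are joined into one 2048-dimensional vector, which a single fully-connected layer turns into one continuous value. A sigmoid activation maps that value into [0, 1], where it is read as the probability that the image is real rather than completed by the network.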
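
The fusion step can be sketched in NumPy as follows. The split into two 1024-dimensional inputs is an assumption (the passage only gives the 2048-dimensional total), the weights are random placeholders rather than trained values, and the function names are my own.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fuse(global_feat, local_feat, weights, bias):
    """Concatenate both discriminator outputs, apply one FC layer, then sigmoid."""
    joint = np.concatenate([global_feat, local_feat])  # 2048-dimensional vector
    return float(sigmoid(joint @ weights + bias))


rng = np.random.default_rng(0)
global_feat = rng.standard_normal(1024)  # assumed size of the global output
local_feat = rng.standard_normal(1024)   # assumed size of the local output
weights = rng.standard_normal(2048) * 0.01  # placeholder FC weights
bias = 0.0

p_real = fuse(global_feat, local_feat, weights, bias)
print(0.0 < p_real < 1.0)  # True
```

With all-zero weights and bias the output is exactly 0.5, meaning the discriminator has no opinion; training pushes the value toward 1 for real images and toward 0 for completed ones.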

Keras implementation:

https://github.com/neka-nat/image_completion_tf2
