The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers.
For free-form image inpainting, vanilla conv is ineffective
First, consider the formula for vanilla conv:
$$O_{y,x}=\sum_{i=-k_{h}^{'}}^{k_{h}^{'}} \sum_{j=-k_{w}^{'}}^{k_{w}^{'}} W_{k_{h}^{'}+i,\, k_{w}^{'}+j}\cdot I_{y+i,\, x+j}$$
where $k_{h}^{'}=(k_{h}-1)/2$ and $k_{w}^{'}=(k_{w}-1)/2$ for a $k_{h}\times k_{w}$ kernel. At every spatial location $(y, x)$ of every channel of the input image $I$, the same set of filters is applied. The filters share one shape (the kernel size), while each filter has its own coefficients. This makes sense for classification and object detection, because every input pixel is valid when local features are extracted with a sliding window. In other words, by sliding same-shaped filters with different coefficients over all input pixels, vanilla conv extracts the local features of an image effectively.
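To make the double sum concrete, here is a minimal sketch in plain Python (nested lists, no TensorFlow). It assumes a single channel, an odd kernel size, and zero padding at the borders. The names `vanilla_conv2d` and `box` are only illustrative and do not come from the paper.

```python
def vanilla_conv2d(image, kernel):
    """Single-channel 'same' convolution with zero padding.

    image:  H x W list of lists
    kernel: k_h x k_w list of lists (both sizes assumed odd)
    """
    k_h, k_w = len(kernel), len(kernel[0])
    kh2, kw2 = (k_h - 1) // 2, (k_w - 1) // 2   # k_h' and k_w' in the formula
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(-kh2, kh2 + 1):
                for j in range(-kw2, kw2 + 1):
                    yy, xx = y + i, x + j
                    # Pixels outside the image count as zero (zero padding).
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[kh2 + i][kw2 + j] * image[yy][xx]
            out[y][x] = acc
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
box = [[1.0] * 3 for _ in range(3)]   # all-ones 3x3 kernel
result = vanilla_conv2d(image, box)
print(result[1][1])  # sum of all nine pixels: 45.0
```

Note that the same `kernel` is used at every `(y, x)`, and every pixel inside the window contributes, whether or not it is a hole. That is exactly the property that becomes a problem for inpainting.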
In image inpainting, however, the input features consist of valid pixels outside the holes, together with either invalid pixels in the masked regions (these usually exist only in shallow layers, because invalid pixels gradually become valid as the network gets deeper) or synthesized pixels in the masked regions (in deep layers). This causes ambiguity during training and leads to visual artifacts at test time, such as color discrepancy, blurriness, and obvious edge responses.
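The mask-update of partial conv presumably works per sliding window, with an update rule that resembles a morphological erosion: as the layers get deeper, the black stripes in the mask should become thinner and thinner, which the illustration in Figure 3 supports. Further details require reading the source code.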
To make the convolution depend only on valid pixels, partial conv introduces a masking and re-normalization strategy. The formula for partial conv is:
$$O_{y,x}=\begin{cases} \sum\sum W\cdot \left( I \odot \dfrac{M}{\mathrm{sum}(M)} \right), & \text{if } \mathrm{sum}(M) > 0\\ 0, & \text{otherwise} \end{cases}$$
where $M$ is the corresponding binary mask. After each partial conv, the mask is updated by the following rule:
$$m_{y,x}^{'}=\begin{cases} 1, & \text{if } \mathrm{sum}(M) > 0\\ 0, & \text{otherwise} \end{cases}$$
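The two rules above can be sketched in a few lines of plain Python. This is only an illustration of the formulas as written in this post (single channel, zero padding, binary mask with 1 for valid pixels), not the official partial conv implementation. The function name `partial_conv2d` and the toy data are assumptions.

```python
def partial_conv2d(image, mask, kernel):
    """Single-channel partial convolution with zero padding.

    mask holds 1 for valid pixels and 0 for holes.
    Returns (output, updated_mask).
    """
    k_h, k_w = len(kernel), len(kernel[0])
    kh2, kw2 = (k_h - 1) // 2, (k_w - 1) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    new_mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc, valid = 0.0, 0
            for i in range(-kh2, kh2 + 1):
                for j in range(-kw2, kw2 + 1):
                    yy, xx = y + i, x + j
                    # Only valid (mask == 1) in-bounds pixels contribute.
                    if 0 <= yy < height and 0 <= xx < width and mask[yy][xx]:
                        acc += kernel[kh2 + i][kw2 + j] * image[yy][xx]
                        valid += 1
            if valid > 0:
                out[y][x] = acc / valid   # W . (I * M / sum(M))
                new_mask[y][x] = 1        # the window saw a valid pixel
    return out, new_mask


# 5x5 image: valid pixels are 2.0, a 3x3 hole in the centre holds junk (99.0).
mask = [[0 if 1 <= r <= 3 and 1 <= c <= 3 else 1 for c in range(5)]
        for r in range(5)]
image = [[2.0 if mask[r][c] else 99.0 for c in range(5)] for r in range(5)]
box = [[1.0] * 3 for _ in range(3)]

out1, mask1 = partial_conv2d(image, mask, box)
out2, mask2 = partial_conv2d(out1, mask1, box)
hole_sizes = [25 - sum(map(sum, m)) for m in (mask, mask1, mask2)]
print(hole_sizes)  # [9, 1, 0]
```

The hole shrinks from 9 pixels to 1 and then disappears, which matches the erosion-like behaviour described earlier. The junk value 99.0 never leaks into the output because masked pixels are skipped.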
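Partial conv does improve inpainting results on irregular masks, but it still has shortcomings. In particular, partial conv can be regarded as a hard-gating, single-channel, un-learnable layer whose mask is multiplied element-wise with the input feature map.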
Gated conv discards the hard mask that is updated by fixed rules and instead learns a soft mask automatically from data. The formulas are:
$$Gating_{y,x} = \sum \sum W_{g} \cdot I$$
$$Feature_{y,x} = \sum \sum W_{f} \cdot I$$
$$O_{y,x} = \phi \left( Feature_{y,x} \right) \odot \sigma \left( Gating_{y,x} \right)$$
where $\sigma$ is the sigmoid function, so the output gating values lie between 0 and 1, and $\phi$ can be any activation function (for example ReLU or LeakyReLU).
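Gated conv lets the network learn a dynamic feature selection mechanism for each channel and each spatial location. Interestingly, the visualization of intermediate gating values in row 3 of Figure 3 shows that the network selects feature maps not only according to the background, mask, and sketch, but also according to semantic segmentation in some channels. Even in deeper layers, gated conv can highlight the masked regions in different channels and can also sketch the necessary information, which leads to better inpainting results.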
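As a rough illustration of the gating formulas, the sketch below computes a feature branch and a gating branch from the same input and multiplies them element-wise. It is plain Python with a single input and output channel. The helper names (`conv2d_same`, `gated_conv2d`) and the toy kernels are assumptions made for this example, and LeakyReLU with slope 0.2 is chosen for $\phi$.

```python
import math


def conv2d_same(image, kernel):
    """Single-channel 'same' convolution with zero padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    kh2, kw2 = (k_h - 1) // 2, (k_w - 1) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for i in range(-kh2, kh2 + 1):
                for j in range(-kw2, kw2 + 1):
                    yy, xx = y + i, x + j
                    if 0 <= yy < height and 0 <= xx < width:
                        out[y][x] += kernel[kh2 + i][kw2 + j] * image[yy][xx]
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def leaky_relu(v, alpha=0.2):
    return v if v > 0 else alpha * v


def gated_conv2d(image, w_feature, w_gating):
    """O = phi(Feature) * sigma(Gating), both branches read the same input."""
    feature = conv2d_same(image, w_feature)
    gating = conv2d_same(image, w_gating)
    gate = [[sigmoid(v) for v in row] for row in gating]
    out = [[leaky_relu(f) * g for f, g in zip(f_row, g_row)]
           for f_row, g_row in zip(feature, gate)]
    return out, gate


image = [[1.0, -1.0, 2.0],
         [0.0, 4.0, -2.0],
         [3.0, 1.0, 0.0]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zeros = [[0.0] * 3 for _ in range(3)]
large = [[0.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]

# A zero gating kernel gives sigmoid(0) = 0.5 at every location.
out_half, gate_half = gated_conv2d(image, identity, zeros)
# A gating kernel that depends on the input opens or closes the gate per pixel.
out_sel, gate_sel = gated_conv2d(image, identity, large)
print(gate_half[0][0], out_half[1][1])
```

Unlike the hard 0/1 mask of partial conv, the gate here is a continuous value that depends on learnable weights (`w_gating`), so in a real network it is trained together with the feature branch.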
Objective metrics: mean L1 loss, mean L2 loss, mean TV loss
Subjective results
Implementation reference: SC-FEGAN
"""Local Response Normalization.
The 4-D `input` tensor is treated as a 3-D array of 1-D vectors (along the last
dimension), and each vector is normalized independently. Within a given vector,
each component is divided by the weighted, squared sum of inputs within
`depth_radius`. In detail,
    sqr_sum[a, b, c, d] =
        sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
    output = input / (bias + alpha * sqr_sum) ** beta
"""
Here a indexes the batch and d indexes the channel.
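The normalization in that docstring acts on each channel vector independently, so it can be sketched on a single 1-D list. The function name `lrn_vector` is made up for this example. The default parameter values are chosen to mirror what I believe are the TensorFlow defaults, which should be checked against the official documentation.

```python
def lrn_vector(vec, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Local response normalization of one channel vector (last dimension)."""
    n = len(vec)
    out = []
    for d in range(n):
        # Window of neighbouring channels, clipped at both ends of the vector.
        lo = max(0, d - depth_radius)
        hi = min(n, d + depth_radius + 1)
        sqr_sum = sum(v * v for v in vec[lo:hi])
        out.append(vec[d] / (bias + alpha * sqr_sum) ** beta)
    return out


# With bias = 0 and beta = 0.5 each value is divided by the L2 norm of its window.
normalized = lrn_vector([3.0, 4.0, 0.0, 12.0], depth_radius=1, bias=0.0)
print(normalized)  # approximately [0.6, 0.8, 0.0, 1.0]
```

With a tiny bias such as the `0.00005` used in `gate_conv`, the operation behaves almost like an L2 normalization over neighbouring channels. The bias mainly guards against division by zero when all values in a window are zero.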
The formula is:
$$y(\mathbf{X}) = (\mathbf{X} * \mathbf{W} + \mathbf{b}) \otimes \sigma(\mathbf{X} * \mathbf{V} + \mathbf{c})$$
where $\mathbf{W}$ and $\mathbf{V}$ are two different convolution kernels.
The TensorFlow implementation is as follows:
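import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.variable_scope)


def gate_conv(x_in, cnum, ksize, stride=1, rate=1, name='conv',
              padding='SAME', activation='leaky_relu', use_lrn=True, training=True):
    """Gated convolution: returns (gated features, gate values)."""
    assert padding in ['SYMMETRIC', 'SAME', 'REFLECT']
    if padding == 'SYMMETRIC' or padding == 'REFLECT':
        # Pad explicitly, then run both convolutions with VALID padding.
        p = int(rate * (ksize - 1) / 2)
        x_in = tf.pad(x_in, [[0, 0], [p, p], [p, p], [0, 0]], mode=padding)
        padding = 'VALID'
    # Feature branch: X * W + b, followed by optional LRN and activation.
    x = tf.layers.conv2d(
        x_in, cnum, ksize, stride, dilation_rate=rate,
        activation=None, padding=padding, name=name)
    if use_lrn:
        x = tf.nn.lrn(x, bias=0.00005)
    if activation == 'leaky_relu':
        x = tf.nn.leaky_relu(x)
    # Gating branch: sigmoid(X * V + c), with its own kernel.
    g = tf.layers.conv2d(
        x_in, cnum, ksize, stride, dilation_rate=rate,
        activation=tf.nn.sigmoid, padding=padding, name=name + '_g')
    # Element-wise product of features and gate.
    x = tf.multiply(x, g)
    return x, g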
def gate_deconv(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
                name="deconv", training=True):
    """Gated transposed convolution: returns (gated features, gate values)."""
    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        filter_shape = [k_h, k_w, output_shape[-1], input_.get_shape()[-1]]
        # Feature branch: kernel W and bias b.
        w = tf.get_variable('w', filter_shape,
                            initializer=tf.random_normal_initializer(stddev=stddev))
        deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                        strides=[1, d_h, d_w, 1])
        biases = tf.get_variable('biases1', [output_shape[-1]],
                                 initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        deconv = tf.nn.leaky_relu(deconv)
        # Gating branch: separate kernel V and bias c, as in y(X).
        v = tf.get_variable('v', filter_shape,
                            initializer=tf.random_normal_initializer(stddev=stddev))
        g = tf.nn.conv2d_transpose(input_, v, output_shape=output_shape,
                                   strides=[1, d_h, d_w, 1])
        b = tf.get_variable('biases2', [output_shape[-1]],
                            initializer=tf.constant_initializer(0.0))
        g = tf.reshape(tf.nn.bias_add(g, b), deconv.get_shape())
        g = tf.nn.sigmoid(g)  # gate comes from the gating branch, not from `deconv`
        deconv = tf.multiply(g, deconv)
        return deconv, g