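Paper: https://arxiv.org/abs/1805.10180
This is a pyramid attention model for semantic segmentation, recently published jointly by Face++, Beijing Institute of Technology, and Peking University.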
The model is designed as a 2D network and relies on global pooling. Keras provides GlobalAveragePooling and GlobalMaxPooling layers for this (in 1D, 2D, and 3D variants). Of the two, the authors found that GlobalAveragePooling works better.
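To make the two pooling options concrete, here is a minimal NumPy sketch of global average and global max pooling on channels-last arrays. The function names are made up for illustration and are not the Keras API. The reduction is written over all spatial axes, so the same code handles 2D and 3D feature maps.

```python
import numpy as np

def global_average_pool(x):
    """Average over every spatial axis of a (batch, *spatial, channels) array."""
    spatial_axes = tuple(range(1, x.ndim - 1))
    return x.mean(axis=spatial_axes)

def global_max_pool(x):
    """Maximum over every spatial axis of a (batch, *spatial, channels) array."""
    spatial_axes = tuple(range(1, x.ndim - 1))
    return x.max(axis=spatial_axes)

# A 2D feature map: batch of 2, 4x4 spatial grid, 3 channels.
x2d = np.arange(2 * 4 * 4 * 3, dtype=np.float64).reshape(2, 4, 4, 3)
# A 3D feature map: batch of 2, 4x4x4 volume, 3 channels.
x3d = np.ones((2, 4, 4, 4, 3))

print(global_average_pool(x2d).shape)  # (2, 3)
print(global_max_pool(x2d).shape)      # (2, 3)
print(global_average_pool(x3d).shape)  # (2, 3)
```

Either way, each channel collapses to a single number per sample, which is what the GAU module later uses as a channel-wise weight.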
The model consists of two main parts: Feature Pyramid Attention (FPA) and Global Attention Upsample (GAU).
FPA closely resembles the (atrous) Spatial Pyramid Pooling used in DeepLab.
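The Global Attention Upsample (GAU) module applies a 3×3 convolution to the low-level features to reduce the number of channels of the CNN feature map. The global context generated from the high-level features passes through a 1×1 convolution, batch normalization, and a nonlinearity, in that order, and is then multiplied with the low-level features. Finally, the high-level features are added to the weighted low-level features, and the result is upsampled step by step.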
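The GAU data flow described above can be sketched in plain NumPy for a single image. This is only an illustration under several assumptions: the weights are random, batch normalization is left out, ReLU stands in for the nonlinearity, upsampling is nearest-neighbour, and the high-level map already has the target channel count. The names `conv3x3_same` and `global_attention_upsample` are hypothetical helpers, not code from the paper.

```python
import numpy as np

def conv3x3_same(x, w):
    """Zero-padded 3x3 convolution. x: (H, W, Cin), w: (3, 3, Cin, Cout)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Each kernel tap is a per-pixel matrix product over channels.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def global_attention_upsample(low, high, w3, w1, scale):
    """low: (H, W, Clow), high: (H/scale, W/scale, C); returns (H, W, C)."""
    low_reduced = conv3x3_same(low, w3)         # 3x3 conv reduces channels to C
    context = high.mean(axis=(0, 1))            # global average pooling -> (C,)
    attention = np.maximum(context @ w1, 0.0)   # 1x1 conv + ReLU (batch norm omitted)
    weighted_low = low_reduced * attention      # channel-wise reweighting
    # Nearest-neighbour upsampling of the high-level map (an assumption).
    high_up = high.repeat(scale, axis=0).repeat(scale, axis=1)
    return high_up + weighted_low

rng = np.random.default_rng(0)
low = rng.normal(size=(8, 8, 16))    # low-level features, 16 channels
high = rng.normal(size=(4, 4, 4))    # high-level features, 4 channels
w3 = rng.normal(size=(3, 3, 16, 4))
w1 = rng.normal(size=(4, 4))

out = global_attention_upsample(low, high, w3, w1, scale=2)
print(out.shape)  # (8, 8, 4)
```

Because the pooled context is a 1×1 map, the 1×1 convolution reduces to a matrix product on a vector, which is why `context @ w1` is enough here.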
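The overall architecture combines the Feature Pyramid Attention (FPA) module and the Global Attention Upsample (GAU) module.
The authors summarize the roles of the two modules as follows: the FPA module provides pixel-level attention and enlarges the receptive field through its pyramid structure, while the GAU module uses high-level feature maps to guide the low-level features in recovering pixel localization.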
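On the receptive-field point: the code later in this post builds its pyramid from dilated 3×3 convolutions rather than larger kernels. A dilated kernel of size k with rate d covers k + (k - 1)(d - 1) pixels per side. The small helper below (a hypothetical name, for illustration only) evaluates that formula for the dilation rates used in the code.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Side length covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

# Dilation rates used by Inception_dilation in the code below.
inception_rates = [1, 2, 4, 6]
print([effective_kernel_size(3, r) for r in inception_rates])  # [3, 5, 9, 13]

# Dilation rates used by FeaturePyramidAttention in the code below.
fpa_rates = [4, 3, 2]
print([effective_kernel_size(3, r) for r in fpa_rates])  # [9, 7, 5]
```

So the variable names conv5, conv7, and conv9 in the code are only loosely related to the area each branch actually covers.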
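The experimental results show that the proposed method achieved state-of-the-art performance on the PASCAL VOC 2012 semantic segmentation task at the time of publication.
Code implementation: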
def Inception_dilation(self, inputs, f):
    # Four parallel 3x3 convolutions with dilation rates 1, 2, 4 and 6,
    # concatenated along the channel axis (output has 4 * f channels).
    conv3 = Conv2D(f, (3, 3), padding='same', activation='selu', kernel_initializer='he_normal')(inputs)
    conv5 = Conv2D(f, (3, 3), padding='same', dilation_rate=(2, 2), activation='selu', kernel_initializer='he_normal')(inputs)
    conv7 = Conv2D(f, (3, 3), padding='same', dilation_rate=(4, 4), activation='selu', kernel_initializer='he_normal')(inputs)
    conv9 = Conv2D(f, (3, 3), padding='same', dilation_rate=(6, 6), activation='selu', kernel_initializer='he_normal')(inputs)
    merge2 = merge([conv3, conv5, conv7, conv9], mode='concat', concat_axis=3)
    return merge2
def FeaturePyramidAttention(self, inputs, f):
    # f: number of channels
    conv1 = Conv2D(f, (1, 1), padding='same', activation='selu', kernel_initializer='he_normal')(inputs)
    # Downsampling path: dilated 3x3 convolutions, each followed by 4x4 max pooling.
    conv7 = Conv2D(f, (3, 3), padding='same', dilation_rate=(4, 4), activation='selu', kernel_initializer='he_normal')(inputs)
    pool1 = MaxPooling2D(pool_size=(4, 4))(conv7)
    # conv7 = Conv2D(f, (3, 3), padding='same', dilation_rate = (4, 4), activation= 'selu', kernel_initializer = 'he_normal')(conv7)
    conv5 = Conv2D(f, (3, 3), padding='same', dilation_rate=(3, 3), activation='selu', kernel_initializer='he_normal')(pool1)
    pool2 = MaxPooling2D(pool_size=(4, 4))(conv5)
    # conv5 = Conv2D(f, (3, 3), padding='same', dilation_rate = (3, 3), activation= 'selu', kernel_initializer = 'he_normal')(conv5)
    conv3 = Conv2D(f, (3, 3), padding='same', dilation_rate=(2, 2), activation='selu', kernel_initializer='he_normal')(pool2)
    pool3 = MaxPooling2D(pool_size=(4, 4))(conv3)
    conv2 = Conv2D(f, (3, 3), padding='same', activation='selu', kernel_initializer='he_normal')(pool3)
    # Upsampling path: 4x upsampling, 1x1 convolution, then concatenation with the matching scale.
    up1 = UpSampling2D(size=(4, 4))(conv2)
    up1 = Conv2D(f, (1, 1), padding='same', activation='selu', kernel_initializer='he_normal')(up1)
    up1 = merge([up1, conv3], mode='concat', concat_axis=3)
    up2 = UpSampling2D(size=(4, 4))(up1)
    up2 = Conv2D(f, (1, 1), padding='same', activation='selu', kernel_initializer='he_normal')(up2)
    up2 = merge([up2, conv5], mode='concat', concat_axis=3)
    up3 = UpSampling2D(size=(4, 4))(up2)
    up3 = Conv2D(f, (1, 1), padding='same', activation='selu', kernel_initializer='he_normal')(up3)
    up3 = merge([up3, conv7], mode='concat', concat_axis=3)
    out = merge([up3, conv1], mode='concat', concat_axis=3)
    return out
def GlobalAttentionUpsample(self, inputs_low, inputs_high, f):
    # inputs_low: low-level feature input
    # inputs_high: high-level feature input
    print('inputs_high.shape---------', inputs_high.shape)
    conv3 = Conv2D(f*3, (3, 3), padding='same', activation='selu', kernel_initializer='he_normal')(inputs_low)
    # Channel-wise weights from the high-level features; inputs_high must have f*3 channels.
    gap = GlobalAveragePooling2D()(inputs_high)
    print('gap.shape------------', gap.shape)
    # conv1 = Conv2D(f*4, (1, 1), padding='same', activation= 'selu', kernel_initializer = 'he_normal')(gap)
    conv1conv3 = Multiply()([gap, conv3])
    out = merge([conv1conv3, inputs_high], mode='concat', concat_axis=3)
    return out
def PAN(self):
    inputs = Input((self.img_rows, self.img_cols, 1))
    conv1 = self.Inception_dilation(inputs, 4)
    res1 = merge([inputs, conv1], mode='concat', concat_axis=3)
    conv2 = self.Inception_dilation(res1, 4)
    conv2 = self.Inception_dilation(conv2, 4)
    res2 = merge([res1, conv2], mode='concat', concat_axis=3)
    conv3 = self.Inception_dilation(res2, 4)
    conv3 = self.Inception_dilation(conv3, 4)
    res3 = merge([res2, conv3], mode='concat', concat_axis=3)
    conv4 = self.Inception_dilation(res3, 4)
    conv4 = self.Inception_dilation(conv4, 4)
    #res4 = merge([res3, conv4], mode='concat', concat_axis=3)
    FPA = self.FeaturePyramidAttention(conv4, 4)
    print('FPA.shape', FPA.shape)
    print('conv3.shape', conv3.shape)
    GAU1 = self.GlobalAttentionUpsample(conv3, FPA, 4)
    GF1 = merge([FPA, GAU1], mode='concat', concat_axis=3)
    GAU2 = self.GlobalAttentionUpsample(conv2, GF1, 12)
    GF2 = merge([GF1, GAU2], mode='concat', concat_axis=3)
    GAU3 = self.GlobalAttentionUpsample(conv1, GF2, 36)
    GF3 = merge([GF2, GAU3], mode='concat', concat_axis=3)
    conv8 = Conv2D(4, (1, 1), padding='same', activation='selu', kernel_initializer='he_normal')(GF3)
    # conv9 = Conv2D(2, (3, 3, 3), padding='same', activation= 'selu', kernel_initializer = 'he_normal')(conv8)
    print("conv8 shape:", conv8.shape)
    conv9 = Conv2D(1, 1, activation='sigmoid')(conv8)
    print("conv9 shape:", conv9.shape)
    model = Model(inputs=inputs, outputs=conv9)
    # plot_model(model, to_file = 'model_3dxception.png', show_shapes = True)
    parallel_model = multi_gpu_model(model, gpus=2)
    parallel_model.compile(optimizer=Adam(lr=0.001), loss=self.dice_coef_loss, metrics=['accuracy'])
    with open('seg_liver2D_pan.json', 'w') as files:
        files.write(model.to_json())
    return parallel_model
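The channel arguments 4, 12, and 36 passed to GlobalAttentionUpsample in PAN are not arbitrary: the Multiply layer only works when the pooled high-level vector has the same number of channels as the 3×3 convolution output (f*3). The sketch below traces the channel counts by hand, and also checks which input sizes survive the three 4×4 poolings and 4× upsamplings in FeaturePyramidAttention. The helper names are hypothetical and the arithmetic is my own reading of the code above, not something stated by the paper.

```python
def fpa_channels(f):
    """FPA output: concat(up3: f, conv7: f), then concat with conv1: f."""
    return 3 * f

def gau_channels(high_channels, f):
    """GAU output: concat(weighted low-level: 3 * f, high-level input)."""
    low_channels = 3 * f
    if low_channels != high_channels:
        raise ValueError("Multiply needs matching channels: %d vs %d"
                         % (high_channels, low_channels))
    return low_channels + high_channels

def fpa_spatial_sizes_match(size, pool=4, levels=3):
    """True if pooling `levels` times and upsampling back restores every scale."""
    sizes = [size]
    for _ in range(levels):
        sizes.append(sizes[-1] // pool)  # MaxPooling2D floors by default
    return all(sizes[i + 1] * pool == sizes[i] for i in range(levels))

fpa = fpa_channels(4)          # 12
gau1 = gau_channels(fpa, 4)    # 24
gf1 = fpa + gau1               # 36
gau2 = gau_channels(gf1, 12)   # 72
gf2 = gf1 + gau2               # 108
gau3 = gau_channels(gf2, 36)   # 216
gf3 = gf2 + gau3               # 324
print(gf3)  # 324

print(fpa_spatial_sizes_match(512))  # True
print(fpa_spatial_sizes_match(500))  # False
```

In other words, the image height and width need to be multiples of 64 for the concatenations inside FeaturePyramidAttention to line up.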
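Note: with recent versions of Keras, the merge calls in the code should be replaced with concatenate (from keras.layers import concatenate).
For example:
up2 = merge([up2, conv5], mode='concat', concat_axis=3)
# becomes
up2 = concatenate([up2, conv5], axis=3)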