mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                  input_image_meta,
                                  config.MASK_POOL_SIZE,
                                  config.NUM_CLASSES,
                                  train_bn=config.TRAIN_BN)
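Input parameters
rois: (1,200,4). The 200 here is made up of three parts: positive samples, negative samples, and zero padding;
feature_maps: mrcnn_feature_maps = [P2, P3, P4, P5];
image_meta: see https://blog.csdn.net/u013066730/article/details/102501128. It mainly records the parameters of the transformations applied to get from the original image to the image fed into the network;
pool_size: 14;
num_classes: 81;
train_bn: False.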
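As a rough illustration of how a (1,200,4) rois tensor can be composed of positive samples, negative samples, and zero padding, here is a minimal NumPy sketch. The helper name `pad_rois` and the counts of 30 positive and 90 negative boxes are hypothetical values chosen for the example; they are not taken from the original code.

```python
import numpy as np

def pad_rois(positive, negative, total=200):
    """Stack positive and negative ROIs, then zero-pad to `total` rows."""
    rois = np.concatenate([positive, negative], axis=0)
    pad = total - rois.shape[0]
    if pad < 0:
        raise ValueError("more ROIs than available slots")
    # np.pad defaults to constant mode with zeros.
    return np.pad(rois, [(0, pad), (0, 0)])

# Hypothetical counts: 30 positive and 90 negative boxes (y1, x1, y2, x2).
positive = np.random.rand(30, 4).astype(np.float32)
negative = np.random.rand(90, 4).astype(np.float32)

rois = pad_rois(positive, negative)[np.newaxis]  # add the batch dimension
print(rois.shape)
```

The last 80 rows in this example are all zeros, which is what the "zero padding" part of the 200 refers to.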
def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    """Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers

    Returns: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
    """
    # ROI Pooling
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Conv layers
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn3')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn4')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x
The function relies on PyramidROIAlign and TimeDistributed. For details, see https://blog.csdn.net/u013066730/article/details/102664978#PyramidROIAlign
and https://blog.csdn.net/u013066730/article/details/102664978#TimeDistributed
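To give an intuition for what TimeDistributed does in this head, the NumPy sketch below applies one shared layer to every ROI by folding the num_rois axis into the batch axis and unfolding it afterwards. This is only a conceptual model, not the Keras implementation. The helpers `pointwise_conv` and `time_distributed` are hypothetical, and the channel counts (8 in, 3 out) are shrunk from 256 and 81 to keep the example light.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution on [N, H, W, C_in] with weights [C_in, C_out]."""
    return x @ w

def time_distributed(layer, x):
    """Apply `layer` to each ROI by folding num_rois into the batch axis."""
    batch, num_rois = x.shape[:2]
    folded = x.reshape((batch * num_rois,) + x.shape[2:])
    out = layer(folded)
    return out.reshape((batch, num_rois) + out.shape[1:])

rng = np.random.default_rng(0)
# Reduced channel counts (assumption for the demo): 8 in, 3 out.
x = rng.standard_normal((1, 200, 28, 28, 8)).astype(np.float32)
w = rng.standard_normal((8, 3)).astype(np.float32)

y = time_distributed(lambda t: pointwise_conv(t, w), x)
print(y.shape)
```

Because the same weights `w` are used for all 200 ROIs, the result for a single ROI matches what the layer would produce if that ROI were passed through on its own.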
layer | input_shape | output_shape |
PyramidROIAlign | [rois, image_meta] + feature_maps | (1,200,14,14,256) |
TimeDistributed-Conv2D(k=3,same,i=256,o=256) | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-BN | (1,200,14,14,256) | (1,200,14,14,256) |
Relu | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-Conv2D(k=3,same,i=256,o=256) | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-BN | (1,200,14,14,256) | (1,200,14,14,256) |
Relu | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-Conv2D(k=3,same,i=256,o=256) | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-BN | (1,200,14,14,256) | (1,200,14,14,256) |
Relu | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-Conv2D(k=3,same,i=256,o=256) | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-BN | (1,200,14,14,256) | (1,200,14,14,256) |
Relu | (1,200,14,14,256) | (1,200,14,14,256) |
TimeDistributed-Conv2DTranspose(k=2,s=2,i=256,o=256,relu) | (1,200,14,14,256) | (1,200,28,28,256) |
TimeDistributed-Conv2D(k=1,s=1,i=256,o=81,sigmoid) | (1,200,28,28,256) | (1,200,28,28,81) |
The mrcnn_mask that is finally returned has shape (1,200,28,28,81).
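The spatial sizes in the table can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names and assumes the usual output-size rules for "same" and "valid" padding (the Conv2DTranspose and the final 1x1 Conv2D do not set padding, so Keras's default "valid" is assumed). It traces the 14x14 ROI-aligned feature map through the head.

```python
def conv_same(size, stride=1):
    # "same" padding: output = ceil(input / stride)
    return -(-size // stride)

def conv_valid(size, kernel, stride=1):
    # "valid" padding: output = floor((input - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def deconv_valid(size, kernel, stride):
    # Transposed conv, "valid" padding:
    # output = input * stride + max(kernel - stride, 0)
    return size * stride + max(kernel - stride, 0)

size = 14                                     # MASK_POOL_SIZE after PyramidROIAlign
for _ in range(4):                            # four 3x3 "same" convolutions
    size = conv_same(size)
size = deconv_valid(size, kernel=2, stride=2) # 2x2 transposed conv, stride 2
size = conv_valid(size, kernel=1)             # final 1x1 convolution

mask_shape = (1, 200, size, size, 81)
print(mask_shape)
```

The four "same" convolutions keep the map at 14x14, the stride-2 transposed convolution doubles it to 28x28, and the 1x1 convolution only changes the channel depth to num_classes.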