Mask R-CNN Detailed Code Walkthrough (Part 3)

Previous posts in this series:
Mask R-CNN Detailed Code Walkthrough (Part 1)
Mask R-CNN Detailed Code Walkthrough (Part 2)

Table of Contents

  • 1 How the Code Modules Fit Together
  • 2 Continuing the Code Walkthrough
    • 2.1 Feature Pyramid Network Heads
    • 2.2 MaskRCNN Class

1 How the Code Modules Fit Together

Part 1 covered the Resnet Graph, RPN, and Proposal Layer code; Part 2 covered the ROIAlign Layer and the Detection Target Layer. Next up are the Feature Pyramid Network Heads and the MaskRCNN class. Here is how these modules relate to each other:
[Figure 1: relationship diagram of the Mask R-CNN modules]

2 Continuing the Code Walkthrough

2.1 Feature Pyramid Network Heads

In a convolutional network, the prediction branches are called "heads". The Feature Pyramid Network Heads here are the branches that output the predicted class_id, the bbox deltas, and the mask. They comprise two functions:

  • def fpn_classifier_graph: outputs the classification results and the bbox regression deltas
  • def build_fpn_mask_graph: outputs the predicted masks

Inputs:

  • rois: [batch, num_rois, (y1, x1, y2, x2)] proposal boxes in normalized coordinates
  • feature_maps: the feature pyramid [P2, P3, P4, P5]
  • image_meta: image metadata such as the original shape (see the layout sketch after this list)
  • pool_size: side length of the square feature map produced by ROI pooling, typically 7x7; see Part 2 for details
  • num_classes: number of classes, which determines the depth of the results
  • train_bn: Boolean; train or freeze the Batch Norm layers
  • fc_layers_size: size of the two fully connected layers
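
The image_meta vector deserves a quick note: compose_image_meta() in model.py packs the image id, shapes, window, scale, and per-dataset active class IDs into one flat array. Below is a sketch of that layout with hypothetical numbers (a 480x640 image molded to 1024x1024, 81 COCO classes):

import numpy as np

# [image_id (1), original_image_shape (3), image_shape (3),
#  window (y1, x1, y2, x2) (4), scale (1), active_class_ids (NUM_CLASSES)]
meta = np.concatenate([
    [7],                   # image_id
    [480, 640, 3],         # original shape
    [1024, 1024, 3],       # molded (resized + padded) shape
    [128, 0, 896, 1024],   # window of real pixels inside the molded image
    [1.6],                 # resize scale
    np.ones(81),           # active class IDs for the source dataset
])
print(meta.shape)          # (93,) == config.IMAGE_META_SIZE for 81 classes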

Outputs:

  • logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
  • probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
  • bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to proposal boxes
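
These (dy, dx, log(dh), log(dw)) values use the same box-delta encoding as the rest of the repo (see apply_box_deltas_graph in model.py). As a minimal NumPy sketch of how one delta refines one box, with hypothetical numbers in normalized (y1, x1, y2, x2) coordinates:

import numpy as np

def apply_delta(box, delta):
    """Sketch of the (dy, dx, log(dh), log(dw)) box refinement."""
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1                  # box height / width
    cy, cx = y1 + 0.5 * h, x1 + 0.5 * w      # box center
    dy, dx, dlh, dlw = delta
    cy, cx = cy + dy * h, cx + dx * w        # shift center by a fraction of the size
    h, w = h * np.exp(dlh), w * np.exp(dlw)  # scale height / width
    return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])

print(apply_delta(np.array([0.2, 0.2, 0.6, 0.6]), np.array([0.1, 0.0, np.log(1.5), 0.0])))
# -> [0.14 0.2  0.74 0.6 ]: center moved down by 10% of the height, height scaled by 1.5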

Let's look at fpn_classifier_graph first.

Network architecture diagram:
[Figure 2: fpn_classifier_graph network architecture]
Annotated source code:

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    """Builds the computation graph of the feature pyramid network classifier
    and regressor heads.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers
    fc_layers_size: Size of the 2 FC layers

    Returns:
        logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
        probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
        bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes
    """
    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    # The pooled feature list: x holds, for every ROI, the fixed-size features extracted from the appropriate pyramid level
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Two 1024 FC layers (implemented with Conv2D for consistency)
    # Feed the pooled feature list through two fc_layers_size-channel conv layers, each followed by BN and ReLU
    # Every ROI gets its feature from one level of P2-P5, and all of them pass through the same
    # classification/regression network: the head parameters are shared across FPN levels.
    # TimeDistributed splits the input along axis 1 (counting from 0) and applies identical copies of the
    # wrapped layer independently to each slice, then merges the results. Concretely, the next line applies
    # the same conv to num_rois inputs of shape [batch, POOL_SIZE, POOL_SIZE, channels].
    # At this point x has rank 5, which a plain Conv2D would reject, hence the TimeDistributed wrapper.
    # The first conv uses a kernel as large as the feature map, shrinking it to 1x1 so the spatial dims can
    # be squeezed away later into `shared`, which then feeds the Dense classification and regression heads.
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)  # [batch, num_rois, 1, 1, 1024]
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    # Squeeze out the two pool_size dimensions
    # (it is an error to squeeze a dimension that is not 1)
    # `shared` now has shape [batch, num_rois, channels]
    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)

    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    # regression layer for the per-class box offsets
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
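
A quick aside on TimeDistributed before moving on. The standalone sketch below (assuming tensorflow.keras; the repo targets standalone Keras 2.x, but the semantics are identical) shows a rank-5 ROI-feature tensor being collapsed to 1x1 spatially by a pool_size-sized convolution, which is exactly what mrcnn_class_conv1 does:

from tensorflow import keras
KL = keras.layers

# Hypothetical shapes mirroring the classifier head: 7x7 ROI features with 256 channels.
rois_feat = KL.Input(shape=[None, 7, 7, 256])   # [batch, num_rois, 7, 7, 256], rank 5
x = KL.TimeDistributed(KL.Conv2D(1024, (7, 7), padding="valid"))(rois_feat)
demo = keras.Model(rois_feat, x)
print(demo.output_shape)                        # (None, None, 1, 1, 1024)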

Now the other function, which outputs the masks. Network architecture diagram:
[Figure 3: build_fpn_mask_graph network architecture]
Annotated source code:

def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    """Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers

    Returns: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
    """
    # ROI Pooling
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Conv layers
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn3')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(BatchNorm(),
                           name='mrcnn_mask_bn4')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    # After the four convs the feature maps keep the same spatial size as the input ROI features

    # ReLU on the deconv, then sigmoid on the 1x1 conv, keeping every pixel value between 0 and 1
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x  # at this point the inference network's final output, the raw instance-level masks, has been computed
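
Tracing shapes through the tail of this head is instructive: with the repo default MASK_POOL_SIZE = 14, the stride-2 Conv2DTranspose doubles the spatial size, so each ROI gets a 28x28 mask per class. A hedged sketch (tensorflow.keras assumed; 81 = 1 background + 80 COCO classes is illustrative):

from tensorflow import keras
KL = keras.layers

x = KL.Input(shape=[None, 14, 14, 256])  # ROI features after the four 3x3 convs
y = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"))(x)
y = KL.TimeDistributed(KL.Conv2D(81, (1, 1), strides=1, activation="sigmoid"))(y)
print(keras.Model(x, y).output_shape)    # (None, None, 28, 28, 81)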

2.2 MaskRCNN Class

Yay! This is the last chunk of the network-architecture code: the MaskRCNN class, which ties together all the structures analyzed so far. Here is the relationship diagram of all the pieces once more:
[Figure 4: relationship diagram of the Mask R-CNN modules]
The MaskRCNN class is written exactly along this flow.

First, note that the MaskRCNN class is a wrapper class! Unlike the Proposal Layer and the other classes that inherit from keras.engine.Layer, it inherits from nothing: it wraps a keras_model and implements the methods that let that model train and predict (a usage sketch follows the list), including:

  • def load_weights: load pretrained weights
  • def compile: compile the model
  • def set_trainable: set which layers are trainable
  • def train: start training
  • def detect: run inference
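
A typical inference run through this wrapper looks roughly like the sketch below. This is hedged: InferenceConfig and the weights path are illustrative, while MaskRCNN, load_weights, and detect are the real entry points listed above:

import numpy as np
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):      # illustrative subclass; values depend on your dataset
    NAME = "demo"
    NUM_CLASSES = 1 + 80            # background + 80 COCO classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="./logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True)    # weights path is an assumption

image = np.zeros([512, 512, 3], dtype=np.uint8)          # stand-in for a real image
r = model.detect([image], verbose=0)[0]                  # dict with 'rois', 'class_ids', 'scores', 'masks'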

The __init__ method of the MaskRCNN class:

class MaskRCNN():
    """Encapsulates the Mask RCNN model functionality.

    The actual Keras model is in the keras_model property.
    """
    '''
    The first thing the MaskRCNN class does after initialization is build the network. The code below shows
    the class-internal preamble that precedes the actual feature-extraction network in "inference" mode.
    In inference mode the whole graph takes only three external inputs. Note that Keras conventions differ
    from a placeholder: the shapes in the code exclude the batch dimension, so the actual shapes are:
        input_image: the input image, [batch, None, None, config.IMAGE_SHAPE[2]]
        input_image_meta: image info (shape, preprocessing details, etc., introduced later), [batch, config.IMAGE_META_SIZE]
        input_anchors: anchor boxes, [batch, None, 4]
    '''

    def __init__(self, mode, config, model_dir):
        """
        mode: Either "training" or "inference"
        config: A Sub-class of the Config class
        model_dir: Directory to save training logs and trained weights
        """
        assert mode in ['training', 'inference']
        self.mode = mode
        self.config = config
        self.model_dir = model_dir
        self.set_log_dir()
        self.keras_model = self.build(mode=mode, config=config)

To train a model you first create a MaskRCNN object, and during __init__ it calls:

self.keras_model = self.build(mode=mode, config=config)

to obtain a keras_model. This build() method is the one to focus on.

Work through it alongside the relationship diagram above and the logic diagram of build below, and this part of the code becomes very clear.

[Figure 5: logic diagram of the build() method]
The annotated source code follows:

    def build(self, mode, config):
        """Build Mask R-CNN architecture.
            input_shape: The shape of the input image.
            mode: Either "training" or "inference". The inputs and
                outputs of the model differ accordingly.
        """
        assert mode in ['training', 'inference']

        # Image size must be divisible by 2 multiple times
        # The image size is forced to be divisible by 2^6 so that six rounds of downsampling never produce fractional sizes
        h, w = config.IMAGE_SHAPE[:2]
        if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
            raise Exception("Image size must be divisible by 2 at least 6 times "
                            "to avoid fractions when downscaling and upscaling. "
                            "For example, use 256, 320, 384, 448, 512, ... etc.")

        # Inputs
        input_image = KL.Input(
            shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
        input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],
                                    name="input_image_meta")
        if mode == "training":
            # RPN GT
            input_rpn_match = KL.Input(
                shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
            input_rpn_bbox = KL.Input(
                shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

            # Detection GT (class IDs, bounding boxes, and masks)
            # 1. GT Class IDs (zero padded)
            input_gt_class_ids = KL.Input(
                shape=[None], name="input_gt_class_ids", dtype=tf.int32)
            # 2. GT Boxes in pixels (zero padded)
            # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
            input_gt_boxes = KL.Input(
                shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
            # Normalize coordinates
            gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
                x, K.shape(input_image)[1:3]))(input_gt_boxes)
            # 3. GT Masks (zero padded)
            # [batch, height, width, MAX_GT_INSTANCES]
            if config.USE_MINI_MASK:
                input_gt_masks = KL.Input(
                    shape=[config.MINI_MASK_SHAPE[0],
                           config.MINI_MASK_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
            else:
                input_gt_masks = KL.Input(
                    shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
                    name="input_gt_masks", dtype=bool)
        elif mode == "inference":
            # Anchors in normalized coordinates
            input_anchors = KL.Input(shape=[None, 4], name="input_anchors")

        # Continuing build: the branch below (in inference mode config.BACKBONE is the string "resnet101",
        # so the else branch runs) builds the ResNet graph
        # Build the shared convolutional layers.
        # Bottom-up Layers
        # Returns a list of the last layers of each stage, 5 in total.
        # stage5=True, so all five stage outputs are returned; the first (C1) is discarded.
        if callable(config.BACKBONE):
            _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                                train_bn=config.TRAIN_BN)
        else:
            _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                             stage5=True, train_bn=config.TRAIN_BN)
        # FPN top-down pathway; compared against the diagram, the code is a straightforward transcription
        # Top-down Layers
        # TODO: add assert to verify feature map sizes match what's in config
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
        P4 = KL.Add(name="fpn_p4add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
        P3 = KL.Add(name="fpn_p3add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
        P2 = KL.Add(name="fpn_p2add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
        # Attach 3x3 conv to all P layers to get the final feature maps.
        P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
        P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
        P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
        # P6 is used for the 5th anchor scale in RPN. Generated by
        # subsampling from P5 with stride of 2.
        P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

        # Note that P6 is used in RPN, but not in the classifier heads.
        # The final feature sets:
        rpn_feature_maps = [P2, P3, P4, P5, P6]  # the solid-line outputs in the diagram, fed to the RPN to classify anchors as foreground/background and regress them
        mrcnn_feature_maps = [P2, P3, P4, P5]  # the maps that ROIAlign will crop from later

        # Anchors
        if mode == "training":
            anchors = self.get_anchors(config.IMAGE_SHAPE)
            # Duplicate across the batch dimension because Keras requires it
            # TODO: can this be optimized to avoid duplicating the anchors?
            anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
            # A hack to get around Keras's bad support for constants
            anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
        else:
            anchors = input_anchors
    # =============== RPN anchor predictions ================
        # RPN Model; build_rpn_model returns a Keras Model object, and note that this object is callable
        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1, 3, 256
                              len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
        # Regroup and concatenate the per-level anchor outputs, which come back as a list
        # Loop through pyramid layers
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))  # collect the RPN output for each pyramid level
        # Concatenate layer outputs
        # Convert from list of lists of level outputs to list of lists
        # of outputs across levels.
        # e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
        output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
        outputs = list(zip(*layer_outputs))  # the raw value is [(logits2, class2, bbox2), (logits3, class3, bbox3), ...];
        # first convert it to [[logits2, ..., logits6], [class2, ..., class6], [bbox2, ..., bbox6]]
        outputs = [KL.Concatenate(axis=1, name=n)(list(o))  # then concatenate the tensors in each sub-list along axis 1 (the anchors axis), giving three tensors
                   for o, n in zip(outputs, output_names)]

        # Each tensor covers all anchors across the 5 pyramid levels: [batch, anchors, 2-way classification or (dy, dx, log(dh), log(dw))]
        rpn_class_logits, rpn_class, rpn_bbox = outputs

    # =============== Proposal generation ================
        # Generate proposals
        # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.
        # proposal_count is an integer fixing how many proposals are produced; when there are too few,
        # empty boxes [0, 0, 0, 0] are padded in
        proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
            else config.POST_NMS_ROIS_INFERENCE  # POST_NMS_ROIS_TRAINING = 2000, POST_NMS_ROIS_INFERENCE = 1000
        # [IMAGES_PER_GPU, num_rois, (y1, x1, y2, x2)]
        # IMAGES_PER_GPU stands in for batch; whenever "batch" appears below it means IMAGES_PER_GPU
        rpn_rois = ProposalLayer(
            proposal_count=proposal_count,
            nms_threshold=config.RPN_NMS_THRESHOLD,
            name="ROI",
            config=config)([rpn_class, rpn_bbox, anchors])

        if mode == "training":
            # Class ID mask to mark class IDs supported by the dataset the image
            # came from.
            active_class_ids = KL.Lambda(
                lambda x: parse_image_meta_graph(x)["active_class_ids"]
                )(input_image_meta)

            if not config.USE_RPN_ROIS:
                # Ignore predicted ROIs and use ROIs provided as an input.
                input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                                      name="input_roi", dtype=np.int32)
                # Normalize coordinates
                target_rois = KL.Lambda(lambda x: norm_boxes_graph(
                    x, K.shape(input_image)[1:3]))(input_rois)
            else:
                target_rois = rpn_rois

            # Generate detection targets
            # Subsamples proposals and generates target outputs for training
            # Note that proposal class IDs, gt_boxes, and gt_masks are zero
            # padded. Equally, returned rois and targets are zero padded.
            rois, target_class_ids, target_bbox, target_mask =\
                DetectionTargetLayer(config, name="proposal_targets")([
                    target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

            # Network Heads
            # TODO: verify that this handles zero padded ROIs
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            # TODO: clean up (use tf.identify if necessary)
            output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)

            # Losses
            rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
                [input_rpn_match, rpn_class_logits])
            rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
                [input_rpn_bbox, input_rpn_match, rpn_bbox])
            class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
                [target_class_ids, mrcnn_class_logits, active_class_ids])
            bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
                [target_bbox, target_class_ids, mrcnn_bbox])
            mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
                [target_mask, target_class_ids, mrcnn_mask])

            # Model
            inputs = [input_image, input_image_meta,
                      input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
            if not config.USE_RPN_ROIS:
                inputs.append(input_rois)
            outputs = [rpn_class_logits, rpn_class, rpn_bbox,
                       mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
                       rpn_rois, output_rois,
                       rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
            model = KM.Model(inputs, outputs, name='mask_rcnn')
        else:
            # Network Heads
            # Proposal classifier and BBox regressor heads
            # output shapes:
            #     mrcnn_class_logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
            #     mrcnn_class: [batch, num_rois, NUM_CLASSES] classifier probabilities
            #     mrcnn_bbox(deltas): [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
            mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
                fpn_classifier_graph(rpn_rois, mrcnn_feature_maps, input_image_meta,
                                     config.POOL_SIZE, config.NUM_CLASSES,
                                     train_bn=config.TRAIN_BN,
                                     fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

            # Detections
            # output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in
            # normalized coordinates
            detections = DetectionLayer(config, name="mrcnn_detection")(
                [rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])  # input_image_meta records each image's original info, from which the input height/width are extracted (the same info drives PyramidROIAlign)

            # Create masks for detections
            detection_boxes = KL.Lambda(lambda x: x[..., :4])(detections)
            mrcnn_mask = build_fpn_mask_graph(detection_boxes, mrcnn_feature_maps,
                                              input_image_meta,
                                              config.MASK_POOL_SIZE,
                                              config.NUM_CLASSES,
                                              train_bn=config.TRAIN_BN)

            model = KM.Model([input_image, input_image_meta, input_anchors],
                             [detections, mrcnn_class, mrcnn_bbox,
                                 mrcnn_mask, rpn_rois, rpn_class, rpn_bbox],
                             name='mask_rcnn')

        # Add multi-GPU support.
        if config.GPU_COUNT > 1:
            from mrcnn.parallel_model import ParallelModel
            model = ParallelModel(model, config.GPU_COUNT)

        return model
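
For reference, this is roughly how detect() feeds the inference graph built above, matching the three inputs [input_image, input_image_meta, input_anchors]. A hedged sketch (model is the MaskRCNN wrapper; mold_inputs and get_anchors are its own methods, and image is assumed to be a loaded HxWx3 array):

import numpy as np

molded_images, image_metas, windows = model.mold_inputs([image])
anchors = model.get_anchors(molded_images[0].shape)         # [N, 4], normalized coords
anchors = np.broadcast_to(anchors, (1,) + anchors.shape)    # duplicate across the batch
detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, rois, rpn_class, rpn_bbox = \
    model.keras_model.predict([molded_images, image_metas, anchors], verbose=0)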

The remaining methods are fairly formulaic and read much like any other model's, so I won't go over them.

Phew! With that, the structural code of the training network has been fully analyzed. Another chunk done!
