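Mask-RCNN Source Code Reading Notes

I read this blog post: https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1

That post walks through several of the ipynb notebooks, but it does not analyze the other Python files (including the coco source).

Over the past few days I have been studying that source code. There may be mistakes; corrections and criticism from the experts are welcome.

This had been sitting in my drafts folder without being published...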

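######################### Separator ###########################################

Notes on reading coco.py:

COCO provides images, plus the (possibly multiple) annotations for each image; the coco source abbreviates an annotation as ann.

Each image corresponds to one img_id;

one img_id can have multiple annotations, which the code abbreviates as anns;

each ann has its own category;

multiple categories are called cats in the source;

so one image, or equivalently one img_id, corresponds to multiple anns and multiple cats; one image therefore corresponds to masks of several classes.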
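To make the img_id / anns / cats relationship concrete, here is a minimal sketch in plain Python. The tiny `dataset` dictionary and the helper `index_annotations` are hypothetical and only mimic the layout of a COCO annotation file; this is not the pycocotools API.

```python
from collections import defaultdict

# Hypothetical miniature dataset, shaped like a COCO annotation file.
dataset = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 10, "name": "person"}, {"id": 20, "name": "dog"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 10},
        {"id": 101, "image_id": 1, "category_id": 20},
        {"id": 102, "image_id": 2, "category_id": 10},
    ],
}


def index_annotations(dataset):
    """Build img_id -> anns and img_id -> category-name lookups."""
    cat_names = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    img_to_anns = defaultdict(list)
    img_to_cats = defaultdict(set)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
        img_to_cats[ann["image_id"]].add(cat_names[ann["category_id"]])
    return img_to_anns, img_to_cats


img_to_anns, img_to_cats = index_annotations(dataset)
print(len(img_to_anns[1]), sorted(img_to_cats[1]))
```

Image 1 ends up with two anns and two cats, image 2 with one of each, which is the one-to-many relationship described above.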

bbox (one box per instance mask): [instance_number, (y1,x1,y2,x2)]

anchors: [anchor_count, (y1,x1,y2,x2)]

So the generated result is:

[Image 1]

[Image 2]

Screenshots taken from the blog post https://blog.csdn.net/u011974639/article/details/78483779?locationNum=9&fps=1
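The [instance_number, (y1,x1,y2,x2)] box layout noted above can be derived from a binary instance mask. The following is a rough sketch in plain Python; `extract_bbox` is a hypothetical helper, and treating y2/x2 as exclusive is a convention chosen for this example.

```python
def extract_bbox(mask):
    """Return (y1, x1, y2, x2) for a binary mask given as a list of rows.

    y2 and x2 are exclusive (one past the last foreground pixel).
    An empty mask yields (0, 0, 0, 0).
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return (0, 0, 0, 0)
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bbox = extract_bbox(mask)
print(bbox)
```

Running this once per instance mask gives one (y1, x1, y2, x2) row per instance.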

RPN_ANCHOR_SCALES = (32,64,128,256,512)

1.Create model in training mode:

Creating the model is like setting up a skeleton: at this point no actual data has been fed into it yet.

model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

Analysis of class MaskRCNN():

1.a Inputs

(1) Use keras.layers.Input() to obtain input_image and input_image_meta, creating the skeleton of the input layer

(2) RPN GT

    Use keras.layers.Input() to obtain input_rpn_match [None,1] and input_rpn_bbox [None,4]

(3) Detection GT (class IDs, bounding boxes, and masks)

    Use keras.layers.Input() to obtain

    # 1. GT Class IDs (zero padded): input_gt_class_ids [None]

    # 2. GT Boxes in pixels (zero padded)
    # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates: input_gt_boxes

    # Normalize coordinates

    # 3. GT Masks (zero padded), also obtained with keras.layers.Input()

    # [batch, height, width, MAX_GT_INSTANCES]

1.b Build the shared convolutional layers

The FPN structure finally yields

rpn_feature_maps = [P2, P3, P4, P5, P6]

mrcnn_feature_maps = [P2, P3, P4, P5]   (each with depth 256)

# Generate Anchors 

"""Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """

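   The anchors are generated by iterating over each scale in RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512) (length of the square anchor side in pixels), as set in config.

This finally yields all the anchors for every feature-map pixel (3 anchors per pixel). They line up one-to-one with the outputs the RPN produces later, so the indices selected inside ProposalLayer can be used to pick out the anchors at those same indices (this becomes clear together with the later sections). This is how the anchors get filtered.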
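The anchor generation described above can be sketched roughly as follows, in plain Python without NumPy. The function names mirror the idea but are written for illustration; the feature strides, the width/height interpretation of the ratios, and the tiny feature-map sizes are assumptions made for this example, not values taken from the notes.

```python
import math


def generate_anchors(scale, ratios, feature_shape, feature_stride, anchor_stride=1):
    """Anchors for one pyramid level as (y1, x1, y2, x2) tuples in pixels.

    ratios are treated as width/height ratios (an assumption of this sketch).
    """
    anchors = []
    for fy in range(0, feature_shape[0], anchor_stride):
        for fx in range(0, feature_shape[1], anchor_stride):
            # Map the feature-map cell back to image coordinates.
            cy, cx = fy * feature_stride, fx * feature_stride
            for ratio in ratios:
                h = scale / math.sqrt(ratio)
                w = scale * math.sqrt(ratio)
                anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides):
    """Concatenate the anchors of all levels: scale[0] first, then scale[1], ..."""
    anchors = []
    for scale, shape, stride in zip(scales, feature_shapes, feature_strides):
        anchors.extend(generate_anchors(scale, ratios, shape, stride))
    return anchors


anchors = generate_pyramid_anchors(
    scales=(32, 64),
    ratios=(0.5, 1, 2),
    feature_shapes=[(2, 2), (1, 1)],
    feature_strides=[4, 8],
)
print(len(anchors), anchors[1])
```

With 3 ratios, a 2x2 level contributes 12 anchors and a 1x1 level contributes 3, so the flat list has 15 entries, ordered level by level as the docstring above describes.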

1.c RPN Model

rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), 256)

# Run the shared RPN model on every pyramid level
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))

"""Builds a Keras model of the Region Proposal Network.
    It wraps the RPN graph so it can be used multiple times with shared
    weights.

    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    depth: Depth of the backbone feature map.

    Returns a Keras Model object. The model outputs, when called, are:
    rpn_logits: [batch, H, W, 2] Anchor classifier logits (before softmax)
    rpn_probs: [batch, H, W, 2] Anchor classifier probabilities.               (rpn_probs is the score of each anchor; the 2 stands for background and foreground)
    rpn_bbox: [batch, H, W, (dy, dx, log(dh), log(dw))] Deltas to be
                applied to anchors.

    """

depth is the depth of the feature maps obtained in 1.b, which is 256 for all of them. The rpn_probs (anchor scores) that is finally returned actually has shape [batch, anchors, 2], obtained by reshaping [batch, height, width, anchors per location * 2].

rpn_bbox (Bounding box refinement.) [batch, H, W, anchors per location, depth]

    # where depth is [x, y, log(w), log(h)]    , Reshape to [batch, anchors, 4]

This now matches the shape of the anchors.

Each feature map from 1.b is fed into the RPN model, giving the rpn_logits, rpn_probs and rpn_bbox of that level; the rpn_logits of all levels are then concatenated together, and likewise for the other two.
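As a quick sanity check of the reshape-then-concatenate bookkeeping, the sketch below counts how many rows each level contributes after flattening. `flattened_anchor_count` is a hypothetical helper; the five feature-map shapes are the BACKBONE_SHAPES values listed in the config.py notes of this post, and 3 anchors per location is the figure mentioned earlier.

```python
def flattened_anchor_count(feature_shapes, anchors_per_location):
    """Rows per level after reshaping [H, W, A * k] to [H * W * A, k], and their sum."""
    per_level = [h * w * anchors_per_location for h, w in feature_shapes]
    return per_level, sum(per_level)


shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
per_level, total = flattened_anchor_count(shapes, anchors_per_location=3)
print(per_level, total)
```

The total is the size of the `anchors` axis in [batch, anchors, 2] and [batch, anchors, 4], and it has to equal the number of generated anchors for the index-based filtering to work.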

1.d Generate proposals
       # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.

Call:
          rpn_rois = ProposalLayer(proposal_count=proposal_count,
                                 nms_threshold=config.RPN_NMS_THRESHOLD,
                                 name="ROI",
                                 anchors=self.anchors,
                                 config=config)([rpn_class, rpn_bbox])

ProposalLayer

"""Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

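anchors are all the anchors from the last five stages of the network;

1.d.1 The call function in the ProposalLayer class

(1) call receives inputs, i.e. the rpn_probs and rpn_bbox above. It takes the foreground probability in rpn_probs as scores, picks the indices of the anchors with the top K scores, and uses these indices to select the corresponding scores, deltas and anchors.

(2) # Apply deltas to anchors to get refined anchors. The anchors are adjusted according to the deltas (dy, dx, log(dh), log(dw))

        # Returns the adjusted boxes: [batch, N, (y1, x1, y2, x2)]. These are effectively new anchors; because they are closer to the GT (ground truth) they are named boxes

(3) # Clip to image boundaries. [batch, N, (y1, x1, y2, x2)]  This restricts the box borders to the image boundaries

(4) Filter out small boxes

    4.a Normalize dimensions to range of 0 to 1.   ---> normalized_boxes

    4.b Non-max suppression: pass normalized_boxes and scores to NMS to get indices, then use these indices to select the corresponding normalized_boxes, which gives the proposals (rpn_rois).     # Pad if needed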
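The four steps above can be sketched for a single image in plain Python. This is an illustration under assumptions, not the layer's actual code: `propose` and its helpers are hypothetical names, the greedy NMS is a simple stand-in for whatever NMS routine the real layer calls, and zero padding is left out.

```python
import math


def apply_box_deltas(box, delta):
    """Shift and resize one (y1, x1, y2, x2) box by (dy, dx, log(dh), log(dw))."""
    y1, x1, y2, x2 = box
    dy, dx, log_dh, log_dw = delta
    h, w = y2 - y1, x2 - x1
    cy, cx = y1 + 0.5 * h + dy * h, x1 + 0.5 * w + dx * w
    h, w = h * math.exp(log_dh), w * math.exp(log_dw)
    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)


def clip_box(box, height, width):
    """Clamp a box to the image window (0, 0, height, width)."""
    y1, x1, y2, x2 = box
    return (min(max(y1, 0.0), float(height)), min(max(x1, 0.0), float(width)),
            min(max(y2, 0.0), float(height)), min(max(x2, 0.0), float(width)))


def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Greedy NMS: keep a box unless it overlaps an already kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


def propose(fg_scores, deltas, anchors, image_shape, top_k, nms_threshold):
    height, width = image_shape
    # (1) indices of the top-K foreground scores
    top = sorted(range(len(fg_scores)), key=lambda i: fg_scores[i], reverse=True)[:top_k]
    scores = [fg_scores[i] for i in top]
    # (2) refine the selected anchors, (3) clip them to the image
    boxes = [clip_box(apply_box_deltas(anchors[i], deltas[i]), height, width) for i in top]
    # (4.a) normalize to the range 0..1
    boxes = [(y1 / height, x1 / width, y2 / height, x2 / width) for y1, x1, y2, x2 in boxes]
    # (4.b) non-max suppression
    return [boxes[i] for i in nms(boxes, scores, nms_threshold)]


anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 40, 40), (-5, -5, 5, 5)]
fg_scores = [0.9, 0.8, 0.7, 0.1]
deltas = [(0.0, 0.0, 0.0, 0.0)] * 4
proposals = propose(fg_scores, deltas, anchors, image_shape=(32, 32),
                    top_k=3, nms_threshold=0.5)
print(proposals)
```

In this toy run the lowest-scoring anchor is dropped by the top-K step, the third box is clipped to the 32x32 image, and the second box is suppressed because it overlaps the best one by more than the threshold.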

1.e Generate detection targets

# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero

# padded. Equally, returned rois and targets are zero padded.

class DetectionTargetLayer(KE.Layer):

"""Subsamples proposals and generates target box refinement, class_ids,
    and masks for each.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might
               be zero padded if there are not enough proposals.
    gt_class_ids: [batch, MAX_GT_INSTANCES] Integer class IDs.
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized
              coordinates.
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] of boolean type

    Returns: Target ROIs and corresponding class IDs, bounding box shifts,
    and masks.
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized
          coordinates
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, NUM_CLASSES,
                    (dy, dx, log(dh), log(dw), class_id)]
                   Class-specific bbox refinements.
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                 Masks cropped to bbox boundaries and resized to neural
                 network output size.

    Note: Returned arrays might be zero padded if not enough target ROIs.

    """

# Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)    # IoU of each proposal against all MAX_GT_INSTANCES GT boxes

    tf.reduce_max(overlaps, axis=1)    # maximum IoU for each proposal (the maximum of each row)

The returned rois and targets contain the positive proposals, the negative proposals, and zero padding.
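A small plain-Python sketch of the overlaps matrix and the row-wise maximum follows. `split_proposals` is a hypothetical helper, and the 0.5 threshold used to separate positive from negative proposals is an assumption for illustration; the subsampling and target computation of the real layer are not shown.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def split_proposals(proposals, gt_boxes, positive_threshold=0.5):
    """Return the [proposals, gt_boxes] IoU matrix plus positive/negative indices."""
    overlaps = [[iou(p, g) for g in gt_boxes] for p in proposals]
    # Row-wise maximum, the role tf.reduce_max(overlaps, axis=1) plays above.
    roi_iou_max = [max(row, default=0.0) for row in overlaps]
    positive = [i for i, v in enumerate(roi_iou_max) if v >= positive_threshold]
    negative = [i for i, v in enumerate(roi_iou_max) if v < positive_threshold]
    return overlaps, positive, negative


proposals = [(0, 0, 10, 10), (10, 10, 20, 20), (0, 0, 8, 10)]
gt_boxes = [(0, 0, 10, 10)]
overlaps, positive, negative = split_proposals(proposals, gt_boxes)
print(overlaps, positive, negative)
```

Each row of `overlaps` belongs to one proposal and each column to one GT box, so taking the maximum per row answers "how well does this proposal match its best GT box".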

1.f Network Heads

1.f.1 fpn_classifier_graph()

Performs classification and bbox regression

"""Builds the computation graph of the feature pyramid network classifier
    and regressor heads.


    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns:
        logits: [N, NUM_CLASSES] classifier logits (before softmax)
        probs: [N, NUM_CLASSES] classifier probabilities
        bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes

    """

1.f.1.1 PyramidROIAlign

"""Implements ROI Pooling on multiple levels of the feature pyramid.

    Params:
    - pool_shape: [height, width] of the output pooled regions. Usually [7, 7]
    - image_shape: [height, width, channels]. Shape of input image in pixels


    Inputs:
    - boxes: [batch, num_boxes, (y1, x1, y2, x2)] in normalized
             coordinates. Possibly padded with zeros if not enough
             boxes to fill the array.
    - Feature maps: List of feature maps from different levels of the pyramid.
                    Each is [batch, height, width, channels]


    Output:
    Pooled regions in the shape: [batch, num_boxes, height, width, channels].
    The width and height are those specific in the pool_shape in the layer
    constructor.
    """

How ROIAlign works: http://blog.leanote.com/post/[email protected]/b5f4f526490b
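One piece of "pooling on multiple levels" is deciding which pyramid level each box is pooled from. The sketch below follows the level-assignment rule from the FPN paper as I understand it; the constant 224, the level range 2 to 5, and the function name `roi_level` are assumptions for this example and are not stated in these notes.

```python
import math


def roi_level(box, image_shape, min_level=2, max_level=5):
    """Pick a pyramid level for a normalized (y1, x1, y2, x2) box.

    A box of roughly 224x224 pixels maps to level 4; each halving of the
    side length moves one level down (finer), each doubling one level up.
    """
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    if h <= 0 or w <= 0:
        return min_level  # zero-padded box, level is irrelevant
    image_area = image_shape[0] * image_shape[1]
    level = 4 + round(math.log2(math.sqrt(h * w) * math.sqrt(image_area) / 224.0))
    return max(min_level, min(max_level, level))


print(roi_level((0.0, 0.0, 0.21875, 0.21875), (1024, 1024)))
```

Small boxes are therefore pooled from the high-resolution maps (P2) and large boxes from the coarse ones (P5), after which every box is resized to the same pool_shape.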

1.f.2 build_fpn_mask_graph()

"""Builds the computation graph of the mask head of Feature Pyramid Network.

    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    image_shape: [height, width, depth]
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers


    Returns: Masks [batch, roi_count, height, width, num_classes]
    """
Finally, the model is returned.

Call model.train to train the model (this is where the dataset is passed in)


In utils.py:

    np.meshgrid() expands 1-D arrays into 2-D coordinate matrices https://www.cnblogs.com/sunshinewang/p/6897966.html
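To show what the resulting 2-D matrices look like without needing NumPy installed, here is a pure-Python stand-in that mimics `np.meshgrid(xs, ys)` with its default 'xy' indexing for two 1-D inputs. The function below is written for illustration only; in real code one would call NumPy directly.

```python
def meshgrid(xs, ys):
    """Stand-in for np.meshgrid(xs, ys): both outputs have len(ys) rows and len(xs) columns."""
    grid_x = [list(xs) for _ in ys]       # every row repeats xs
    grid_y = [[y] * len(xs) for y in ys]  # every column repeats ys
    return grid_x, grid_y


grid_x, grid_y = meshgrid([0, 1, 2], [10, 20])
# Pairing the two grids element by element enumerates every (y, x) position.
centers = [(y, x) for row_y, row_x in zip(grid_y, grid_x) for y, x in zip(row_y, row_x)]
print(grid_x, grid_y, centers)
```

This is why meshgrid is handy for anchors: two short 1-D lists of shifts expand into the full grid of anchor centers.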

In config.py:

    BACKBONE_SHAPES is the feature map shape at each level: [256 256] [128 128] [64 64] [32 32] [16 16]...
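The listed shapes are consistent with dividing the input size by one stride per level. The sketch below assumes a 1024x1024 input and strides of 4, 8, 16, 32 and 64; both are assumptions chosen because they reproduce the numbers above, and `compute_backbone_shapes` is a hypothetical helper name.

```python
import math


def compute_backbone_shapes(image_shape, strides):
    """Feature-map [height, width] at each level for the given strides."""
    return [[math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s)]
            for s in strides]


shapes = compute_backbone_shapes((1024, 1024), strides=[4, 8, 16, 32, 64])
print(shapes)
```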


    Select the top K anchors by score, refine the anchors that remain, clip the boxes to the image boundaries, normalize, and then run NMS to obtain the final proposals.


The whole pipeline:

1.RPN

1.a RPN Targets

The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with ground truth object. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but < 0.7) are considered neutral and excluded from training.

To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.
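The positive / negative / neutral rule in the paragraph above can be sketched in plain Python as follows. `rpn_match` is a hypothetical helper using the 0.7 and 0.3 thresholds quoted in the text; anything else the real target builder does (for example balancing or subsampling anchors) is not shown.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def rpn_match(anchors, gt_boxes, positive_iou=0.7, negative_iou=0.3):
    """Label each anchor: 1 = positive, -1 = negative, 0 = neutral."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= positive_iou:
            labels.append(1)
        elif best < negative_iou:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30), (0, 0, 10, 8)]
match = rpn_match(anchors, gt_boxes)
print(match)
```

The four example anchors have IoU 1.0, 0.5, 0.0 and 0.8 with the single GT box, so they come out positive, neutral, negative and positive.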

# Generate RPN training targets
# target_rpn_match is 1 for positive anchors, -1 for negative anchors
# and 0 for neutral anchors.
target_rpn_match, target_rpn_bbox = modellib.build_rpn_targets(
    image.shape, model.anchors, gt_class_id, gt_bbox, model.config)

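#### The maximum size of target_rpn_bbox is specified in config; the number of rows that actually hold data equals the number of positive_anchors, and what is stored is (dy, dx, log(dh), log(dw)) [computed from the positive_anchors and their matching GT bboxes].

Then utils.apply_box_deltas() is called to refine the positive_anchors according to dy, dx, log(dh), log(dw)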

[Image 3]

                    Solid lines are after refinement; dashed lines are before refinement
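How a (dy, dx, log(dh), log(dw)) target could be computed from one positive anchor and its GT box is sketched below. `box_refinement` is written here for illustration; any additional normalization the real code may apply to the deltas is left out, which is an assumption of this sketch.

```python
import math


def box_refinement(box, gt_box):
    """(dy, dx, log(dh), log(dw)) that moves box onto gt_box; both are (y1, x1, y2, x2)."""
    h, w = box[2] - box[0], box[3] - box[1]
    cy, cx = box[0] + 0.5 * h, box[1] + 0.5 * w
    gt_h, gt_w = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    gt_cy, gt_cx = gt_box[0] + 0.5 * gt_h, gt_box[1] + 0.5 * gt_w
    # Center shifts are relative to the anchor size; sizes are log-ratios.
    return ((gt_cy - cy) / h, (gt_cx - cx) / w,
            math.log(gt_h / h), math.log(gt_w / w))


anchor = (10.0, 10.0, 30.0, 30.0)   # 20x20 box centered at (20, 20)
gt_box = (15.0, 10.0, 55.0, 30.0)   # 40x20 box centered at (35, 20)
dy, dx, log_dh, log_dw = box_refinement(anchor, gt_box)
print(dy, dx, log_dh, log_dw)
```

Here the GT box is twice as tall as the anchor and its center sits 15 pixels lower, so the deltas say "move down by 0.75 anchor heights and double the height", while x and width stay unchanged.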



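A quick review of Faster-RCNN

https://blog.csdn.net/u013832707/article/details/53641055/

A window (proposal) is described by its center point and its width and height, P = (px, py, pw, ph); strictly speaking, the real P is the CNN feature corresponding to this window

Bounding-box regression: (1) translate first; (2) then scale. So four quantities need to be learned: dx(P), dy(P), dw(P), dh(P);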
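Written out, the translate-then-scale step takes the following form. This is the usual R-CNN style parameterization, which I am assuming the referenced post follows; the hatted symbols for the predicted box and the unhatted G for the ground-truth box are notation introduced here.

```latex
% (1) translation of the center, relative to the window size
\hat{G}_x = p_w \, d_x(P) + p_x, \qquad \hat{G}_y = p_h \, d_y(P) + p_y
% (2) scaling of width and height in log space
\hat{G}_w = p_w \exp\bigl(d_w(P)\bigr), \qquad \hat{G}_h = p_h \exp\bigl(d_h(P)\bigr)
% regression targets for a ground-truth box G = (G_x, G_y, G_w, G_h)
t_x = \frac{G_x - p_x}{p_w}, \quad t_y = \frac{G_y - p_y}{p_h}, \quad
t_w = \log\frac{G_w}{p_w}, \quad t_h = \log\frac{G_h}{p_h}
```

The log-space form of the size terms is what the (dy, dx, log(dh), log(dw)) layout in the Mask-RCNN docstrings corresponds to, with the y component listed first.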

