Mask R-CNN Source Code Walkthrough

Table of Contents

    • 1. The model
      • 1.1 Input variables created in the build method
      • 1.2 Backbone network
      • 1.3 RPN network
      • 1.4 ProposalLayer
      • 1.5 DetectionTargetLayer
      • 1.6 Feature Pyramid Network Heads (fpn_classifier_graph)
      • 1.7 build_fpn_mask_graph
      • 1.8 Loss functions
      • 1.9 The full model
    • 2 Training the model
    • 3 Model inference

Source code: https://github.com/matterport/Mask_RCNN (Keras-based)
Overall pipeline:
[Figure 1: overall Mask R-CNN pipeline diagram]

1. The model

1.1 Input variables created in the build method

At this point we do not yet know the concrete sizes of the inputs, so Keras Input layers are used as placeholders and the actual sizes are determined by whatever is fed in at training time. Training and inference both go through the same build method, so they share one model; inference simply needs fewer inputs. gt_boxes is normalized so that all box coordinates lie between 0 and 1, which avoids prediction errors caused by large differences in numeric scale.

# Inputs: image and image meta
input_image = KL.Input(shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],name="input_image_meta")
if mode == "training":
    # RPN GT class and box targets
    input_rpn_match = KL.Input(
        shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
    input_rpn_bbox = KL.Input(
        shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

    # Detection GT (class IDs, boxes, masks)
    # 1. GT Class IDs (zero padded)
    input_gt_class_ids = KL.Input(
        shape=[None], name="input_gt_class_ids", dtype=tf.int32)
    # 2. GT Boxes in pixels (zero padded)
    # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
    input_gt_boxes = KL.Input(
        shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
    # Normalize coordinates
    gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
        x, K.shape(input_image)[1:3]))(input_gt_boxes)
    # 3. GT Masks (zero padded)
    # [batch, height, width, MAX_GT_INSTANCES]
    if config.USE_MINI_MASK:
        input_gt_masks = KL.Input(
            shape=[config.MINI_MASK_SHAPE[0],
                   config.MINI_MASK_SHAPE[1], None],
            name="input_gt_masks", dtype=bool)
    else:
        input_gt_masks = KL.Input(
            shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
            name="input_gt_masks", dtype=bool)
elif mode == "inference":
    # Anchors in normalized coordinates
    input_anchors = KL.Input(shape=[None, 4], name="input_anchors")
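The norm_boxes_graph call above is what converts the pixel-space ground-truth boxes to the 0–1 range. A minimal sketch of that conversion, paraphrased from the repo's utilities (it divides by height/width minus one and shifts the bottom-right corner by one pixel, so a full-image box maps exactly to (0, 0, 1, 1)):

import tensorflow as tf

def norm_boxes_sketch(boxes, shape):
    """boxes: [..., (y1, x1, y2, x2)] in pixels, shape: [height, width] of the image."""
    h, w = tf.split(tf.cast(shape, tf.float32), 2)
    scale = tf.concat([h, w, h, w], axis=-1) - tf.constant(1.0)
    shift = tf.constant([0., 0., 1., 1.])
    return tf.divide(boxes - shift, scale)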

1.2 Backbone network

The backbone extracts features from the image. Mask R-CNN builds a Feature Pyramid Network (FPN) on top of a ResNet-101 (the backbone is configurable via config.BACKBONE). The different pyramid levels are then used for different purposes: as the code below shows, P2, P3, P4, P5 and P6 feed the RPN. Each lateral connection applies a 1×1 convolution so that every level has the same number of channels (TOP_DOWN_PYRAMID_SIZE = 256) and can be added to the upsampled map above it, and a final 3×3 convolution on each P level smooths the merged features.

if callable(config.BACKBONE):
	_, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,train_bn=config.TRAIN_BN)
else:
	_, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE, stage5=True, train_bn=config.TRAIN_BN)
	
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)  # TOP_DOWN_PYRAMID_SIZE=256
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. It is generated by subsampling P5.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
# P6 is used by the RPN, but not by the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]

These feature maps are then used to generate the anchors:
Anchors whose scale matches each pyramid level are generated on that level's feature map, with 3 aspect-ratio variants per feature-map pixel. For a 1024×1024 input this gives 261,888 anchors in total, i.e. 1,047,552 coordinate values. The anchor coordinates are expressed on the original 1024×1024 image and finally normalized to the 0–1 range.
[Figure 2: per-level anchor counts]
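A quick sanity check of those numbers, using only the default configuration values (BACKBONE_STRIDES = [4, 8, 16, 32, 64] and 3 anchor ratios per location); this is a standalone illustration, not code from the repo:

# Anchor count for a 1024x1024 input with the default strides and 3 ratios
strides = [4, 8, 16, 32, 64]        # backbone strides for P2..P6
anchors_per_location = 3            # len(RPN_ANCHOR_RATIOS)
image_size = 1024
total = sum((image_size // s) ** 2 * anchors_per_location for s in strides)
print(total, total * 4)             # 261888 anchors, 1047552 coordinate values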

# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors

1.3 RPN network

# RPN Model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
# Each level of the feature pyramid is fed through the RPN to get per-location class scores and anchor offsets
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
# Concatenate layer outputs
# Convert from list of lists of level outputs to list of lists of outputs across levels.
# e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o)) for o, n in zip(outputs, output_names)]

rpn_class_logits, rpn_class, rpn_bbox = outputs

These are the shapes of the RPN outputs: every feature-map location predicts a classification score and a box offset for each of its 3 anchors.

rpn_class_logits:   [batch, H * W * anchors_per_location, 2]  Anchor classifier logits (before softmax)
rpn_probs:          [batch, H * W * anchors_per_location, 2]  Anchor classifier probabilities.
rpn_bbox:           [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))]

Here is the definition of the RPN graph:

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    """Builds the computation graph of Region Proposal Network.

    feature_map:            backbone features [batch, height, width, depth]
    anchors_per_location:   number of anchors per pixel in the feature map. =3
    anchor_stride:          Controls the density of anchors. =1

    Returns:
        rpn_class_logits:   [batch, H * W * anchors_per_location, 2]  
        rpn_probs:          [batch, H * W * anchors_per_location, 2] 
        rpn_bbox:  			[batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))]  
    """
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu', strides=anchor_stride, name='rpn_conv_shared')(feature_map)
    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid', activation='linear', name='rpn_class_raw')(shared)
    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda( lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation( "softmax", name="rpn_class_xxx")(rpn_class_logits)
    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid", activation='linear', name='rpn_bbox_pred')(shared)
    # Reshape to [batch, anchors, 4]: the 4 coordinate deltas of each anchor
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)
    
    return [rpn_class_logits, rpn_probs, rpn_bbox]
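build_rpn_model wraps rpn_graph in a small Keras Model so that the same weights are reused for every pyramid level. A sketch of that wrapper, paraphrased from model.py (it assumes the module's usual aliases KL = keras.layers and KM = keras.models):

def build_rpn_model(anchor_stride, anchors_per_location, depth):
    # One shared RPN applied to every pyramid level
    input_feature_map = KL.Input(shape=[None, None, depth], name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")

Because a single Model instance is called on P2 through P6 in the loop shown earlier, the RPN weights are shared across all pyramid levels.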

1.4 ProposalLayer

The ProposalLayer selects a subset of the incoming anchors, based on their scores, to pass on to the next stage. The filtering relies on the anchor scores and on non-max suppression, and the anchors are refined with the predicted box deltas along the way.
Its main steps:

  1. Keep only the top-scoring anchors according to rpn_probs (PRE_NMS_LIMIT, 6000 by default).
  2. Refine those anchors with the rpn_bbox deltas.
  3. Clip the refined boxes to the image. Since the anchor coordinates are normalized, this simply means clipping them to the 0–1 range.
  4. Apply non-max suppression (threshold 0.7) to obtain the final 2000 proposals.
# Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates and zero padded.
proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training" else config.POST_NMS_ROIS_INFERENCE
rpn_rois = ProposalLayer(
		    proposal_count=proposal_count,
		    nms_threshold=config.RPN_NMS_THRESHOLD,
		    name="ROI",
		    config=config)([rpn_class, rpn_bbox, anchors])
class ProposalLayer(KE.Layer):
    """
    Inputs:
        rpn_probs: [batch, num_anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, num_anchors, (dy, dx, log(dh), log(dw))]
        anchors: [batch, num_anchors, (y1, x1, y2, x2)] anchors in normalized coordinates
    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    def call(self, inputs):
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # Clip to image boundaries. Since we're in normalized coordinates,
        # clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Filter out small boxes
        # According to Xinlei Chen's paper, this reduces detection accuracy
        # for small objects, so we're skipping it.

        # Non-max suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad if needed
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
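The refinement step above converts each anchor to center/size form, shifts the center and scales the size by the predicted deltas, and converts back to corner coordinates. A sketch of apply_box_deltas_graph, paraphrased from model.py:

def apply_box_deltas_graph(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)], deltas: [N, (dy, dx, log(dh), log(dw))]."""
    # Convert to center coordinates plus height/width
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply deltas: shift the center, scale the size
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to corner coordinates
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    return tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")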

1.5 DetectionTargetLayer

Inputs to DetectionTargetLayer:

Inputs:
proposals:      [batch, N, (y1, x1, y2, x2)] in normalized coordinates. Might be zero padded if there are not enough proposals.
gt_class_ids:   [batch, MAX_GT_INSTANCES] Integer class IDs.
gt_boxes:       [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in normalized coordinates.
gt_masks:       [batch, height, width, MAX_GT_INSTANCES] of boolean type

Outputs:

rois:               [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] in normalized coordinates
target_class_ids:   [batch, TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
target_deltas:      [batch, TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw))]
target_mask:        [batch, TRAIN_ROIS_PER_IMAGE, height, width]
                    Masks cropped to bbox boundaries and resized to neural network output size.

First, the IoU of every ROI in proposals with every ground-truth box (gt_boxes) is computed. An ROI whose maximum IoU is at least 0.5 is taken as a positive sample; negatives are ROIs with IoU below 0.5 that also do not overlap crowd boxes much. After picking positives and negatives, the layer also balances them: the positive fraction is controlled by config.ROI_POSITIVE_RATIO (0.33 by default, i.e. roughly one positive for every two negatives). Finally, each positive ROI is matched to its closest ground-truth box, the box deltas between them are computed, and the matched mask is cropped and resized to 28×28. These are the ground-truth targets used later by the classification and mask heads.

# Generate detection targets
# Subsamples proposals and generates target outputs for training
# Note that proposal class IDs, gt_boxes, and gt_masks are zero padded. Equally, returned rois and targets are zero padded.
rois, target_class_ids, target_bbox, target_mask = DetectionTargetLayer(config, name="proposal_targets")([
target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])

In short, this layer takes the proposals produced by the RPN, selects positive and negative samples from them, and computes the deltas to the matched ground-truth boxes as well as the target mask values. These are the ground-truth values that the later heads need when computing their loss functions.
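The matching step relies on an IoU matrix between all proposals and all ground-truth boxes. A minimal sketch of that computation (the repo implements it as overlaps_graph in model.py; this is a broadcasting-based paraphrase, not a verbatim copy):

import tensorflow as tf

def box_iou(boxes1, boxes2):
    """boxes1: [N, (y1, x1, y2, x2)], boxes2: [M, (y1, x1, y2, x2)] -> IoU matrix [N, M]."""
    y11, x11, y12, x12 = tf.split(boxes1[:, None, :], 4, axis=-1)   # each [N, 1, 1]
    y21, x21, y22, x22 = tf.split(boxes2[None, :, :], 4, axis=-1)   # each [1, M, 1]
    inter_h = tf.maximum(tf.minimum(y12, y22) - tf.maximum(y11, y21), 0.0)
    inter_w = tf.maximum(tf.minimum(x12, x22) - tf.maximum(x11, x21), 0.0)
    intersection = inter_h * inter_w
    area1 = (y12 - y11) * (x12 - x11)
    area2 = (y22 - y21) * (x22 - x21)
    union = area1 + area2 - intersection
    return tf.squeeze(intersection / union, axis=-1)                # [N, M]

Each proposal's row maximum decides whether it is a positive (>= 0.5) or a negative sample, and the row argmax gives the ground-truth box it will regress towards.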

1.6 Feature Pyramid Network Heads (fpn_classifier_graph)

This is one of the two final heads of Mask R-CNN; the mask branch runs in parallel with it. We look at the classification head first.
The ROIs produced in 1.5 come in different sizes, but the fully connected layers that follow require a fixed-size input. So, analogous to RoIPooling in Faster R-CNN, a layer is needed that turns every ROI into a feature map of the same size; Mask R-CNN uses PyramidROIAlign. PyramidROIAlign first decides, for each ROI, which pyramid level (P2 to P5) it should be pooled from, using the formula below:
k = ⌊k0 + log2(√(wh) / 224)⌋, with k0 = 4, where w and h are the ROI's width and height on the original image.
It then crops the corresponding region from that level's feature map and pools it with bilinear interpolation, returning ROIs resized to a common size (7×7 for the classifier head).
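A sketch of that level-assignment rule in code (the repo's PyramidROIAlign works on normalized box coordinates and rescales by the image area; the function name here is illustrative):

import tensorflow as tf

def roi_pyramid_level(boxes, image_area):
    """boxes: [num_boxes, (y1, x1, y2, x2)] in normalized coordinates."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    # Equivalent of k = 4 + log2(sqrt(w*h) / 224) for boxes given in normalized coordinates
    roi_level = tf.math.log(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) / tf.math.log(2.0)
    roi_level = 4 + tf.cast(tf.round(roi_level), tf.int32)
    return tf.minimum(5, tf.maximum(2, roi_level))   # clamp to P2..P5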
The pooled feature blocks are then fed into the fpn_classifier_graph network, which produces the classification and box-regression outputs.

def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    """Builds the computation graph of the feature pyramid network classifier
    and regressor heads.
    rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized
          coordinates.
    feature_maps: List of feature maps from different layers of the pyramid,
                  [P2, P3, P4, P5]. Each has a different resolution.
    - image_meta: [batch, (meta data)] Image details. See compose_image_meta()
    pool_size: The width of the square feature map generated from ROI Pooling.
    num_classes: number of classes, which determines the depth of the results
    train_bn: Boolean. Train or freeze Batch Norm layers
    fc_layers_size: Size of the 2 FC layers
    Returns:
        logits: [N, NUM_CLASSES] classifier logits (before softmax)
        probs: [N, NUM_CLASSES] classifier probabilities
        bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to
                     proposal boxes
    """
    # ROI Pooling
    # Shape: [batch, num_boxes, pool_height, pool_width, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Two 1024 FC layers (implemented with Conv2D for consistency)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)
 
    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)
 
    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)
 
    # BBox head
    # [batch, boxes, num_classes * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, boxes, num_classes, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
 
    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

Return values:

Returns:
    logits:		 [N, NUM_CLASSES] classifier logits (before softmax)
    probs: 		 [N, NUM_CLASSES] classifier probabilities
    bbox_deltas: [N, (dy, dx, log(dh), log(dw))] Deltas to apply to proposal boxes

1.7 build_fpn_mask_graph

The mask branch takes the same inputs as the classifier head in 1.6 and also goes through PyramidROIAlign, but here the pooled feature maps are 14×14.
Return value:

Returns: Masks [batch, roi_count, height, width, num_classes]
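For completeness, a sketch of the mask head's structure (paraphrased from build_fpn_mask_graph in model.py: four 3×3 conv blocks, a 2× transposed convolution that upsamples 14×14 to 28×28, and a per-class sigmoid output; it relies on the module's KL, BatchNorm and PyramidROIAlign definitions, and layer names are illustrative):

def fpn_mask_head_sketch(rois, feature_maps, image_meta, pool_size, num_classes, train_bn=True):
    # [batch, num_rois, pool_size, pool_size, channels]; pool_size is 14 here
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)
    for i in range(1, 5):
        x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                               name="mrcnn_mask_conv{}".format(i))(x)
        x = KL.TimeDistributed(BatchNorm(), name="mrcnn_mask_bn{}".format(i))(x, training=train_bn)
        x = KL.Activation('relu')(x)
    # Upsample 14x14 -> 28x28, then one sigmoid-activated mask per class
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x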

1.8 Loss functions

Reference: "Mask RCNN 简单使用"
Mask R-CNN has five loss terms in total: the two RPN losses, the two detection-head (mrcnn) losses, and the mask loss.
Lcls and Lbox: predict each ROI's class and its bounding-box coordinates.
Lmask:
① The mask branch is a small FCN whose output for each ROI has dimension K × m × m (where m is the size of the RoIAlign feature map), i.e. one m × m binary mask for each of the K classes. Preserving the m × m spatial layout means the pixel-to-pixel computation needs the ROI features to stay aligned with the original image; this is exactly why RoIAlign is used, to reduce pixel-level misalignment.

Interpretation of the K × m × m binary masks: the FCN outputs a K-channel mask, one channel per class; each channel (a sigmoid output) is thresholded at 0.5 to produce a foreground/background binary mask.

In this way Lmask lets the network output a mask for every class without competition between classes. The classification branch predicts the class label, which selects the mask to use: for each ROI, only the mask channel of the detected class contributes a cross-entropy term to the loss. (For example, with three classes (cat, dog, person), if an ROI is classified as "person", only the "person" mask channel is used for Lmask.) One mask per class effectively avoids inter-class competition, because the other classes contribute no loss.
② A sigmoid is applied to every pixel, and Lmask is the average binary cross-entropy over all pixels of the ROI.

# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")([input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")([input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")([target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
   			 [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    		[target_mask, target_class_ids, mrcnn_mask])
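To make the "only the ground-truth class's mask contributes" rule concrete, here is a simplified sketch of the mask loss; the repo's mrcnn_mask_loss_graph additionally reshapes the batch dimension and guards against the case of zero positive ROIs, so treat this as a paraphrase rather than the exact implementation:

import tensorflow as tf
from keras import backend as K

def mask_loss_sketch(target_masks, target_class_ids, pred_masks):
    """target_masks: [num_rois, h, w] float32, target_class_ids: [num_rois] int32,
    pred_masks: [num_rois, h, w, num_classes] (sigmoid outputs)."""
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(tf.gather(target_class_ids, positive_ix), tf.int64)
    # For each positive ROI, pick only the mask predicted for its own class
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])       # [rois, classes, h, w]
    y_pred = tf.gather_nd(pred_masks, indices)                # [positives, h, w]
    y_true = tf.gather(target_masks, positive_ix)
    return K.mean(K.binary_crossentropy(target=y_true, output=y_pred))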

1.9 The full model

 # Model
inputs = [input_image, input_image_meta,
          input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
if not config.USE_RPN_ROIS:
    inputs.append(input_rois)
outputs = [rpn_class_logits,
           rpn_class,  # [batch, num_anchors, 2]
           rpn_bbox,   # [batch, num_anchors, 4]
           mrcnn_class_logits,
           mrcnn_class, # [batch, num_rois, NUM_CLASSES]
           mrcnn_bbox,  # [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
           mrcnn_mask,  # [batch, num_detections, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
           rpn_rois,    #  [batch, num_rois, (y1, x1, y2, x2, class_id, score)]
           output_rois,
           rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
model = KM.Model(inputs, outputs, name='mask_rcnn')

2 Training the model

When training, the configuration is read and the data are loaded into a Dataset object. From it, the data generator computes the ground-truth RPN matches/deltas and the ground-truth masks that the RPN stage and the final heads need when computing their losses. The training code also lets you choose which layers to train in each run, down to training only a specific subset of layers. The most important piece of the training path is the data_generator that produces these inputs.
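A typical training call, following the pattern used by the repo's sample scripts; config is assumed to be an instance of a Config subclass, dataset_train / dataset_val prepared Dataset objects, and the weight-file path is a placeholder:

import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
# Start from COCO weights but re-initialize the head layers (class count differs)
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# layers="heads" trains only the head branches; "all" or a layer-name regex is also accepted
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30,
            layers="heads")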

3 Model inference

At inference time we call the model's detect method: it first prepares (molds) the images to be predicted and then calls predict on the underlying Keras model.

def detect(self, images, verbose=0):
    """Runs the detection pipeline.

    images: List of images, potentially of different sizes.

    Returns a list of dicts, one dict per image. The dict contains:
    rois: [N, (y1, x1, y2, x2)] detection bounding boxes
    class_ids: [N] int class IDs
    scores: [N] float probability scores for the class IDs
    masks: [H, W, N] instance binary masks
    """
    assert self.mode == "inference", "Create model in inference mode."
    assert len(
        images) == self.config.BATCH_SIZE, "len(images) must be equal to BATCH_SIZE"

    if verbose:
        log("Processing {} images".format(len(images)))
        for image in images:
            log("image", image)

    # Mold inputs to format expected by the neural network
    molded_images, image_metas, windows = self.mold_inputs(images)

    # Validate image sizes
    # All images in a batch MUST be of the same size
    image_shape = molded_images[0].shape
    for g in molded_images[1:]:
        assert g.shape == image_shape,\
            "After resizing, all images must have the same size. Check IMAGE_RESIZE_MODE and image sizes."

    # Anchors
    anchors = self.get_anchors(image_shape)
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (self.config.BATCH_SIZE,) + anchors.shape)

    if verbose:
        log("molded_images", molded_images)
        log("image_metas", image_metas)
        log("anchors", anchors)
    # Run object detection
    detections, _, _, mrcnn_mask, _, _, _ =\
        self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
    # Process detections
    results = []
    for i, image in enumerate(images):
        final_rois, final_class_ids, final_scores, final_masks =\
            self.unmold_detections(detections[i], mrcnn_mask[i],
                                   image.shape, molded_images[i].shape,
                                   windows[i])
        results.append({
            "rois": final_rois,
            "class_ids": final_class_ids,
            "scores": final_scores,
            "masks": final_masks,
        })
    return results
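A typical way to call it, assuming an inference config with BATCH_SIZE = 1 and a placeholder weight-file path:

import skimage.io
import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="inference", config=config, model_dir="./logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True)

image = skimage.io.imread("some_image.jpg")
results = model.detect([image], verbose=1)   # the list length must equal BATCH_SIZE
r = results[0]
# r["rois"], r["class_ids"], r["scores"], r["masks"] as described in the docstring above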
