Mask R-CNN is a network architecture proposed on top of Faster R-CNN, used mainly for object detection and instance segmentation. The key idea is to extend the Faster R-CNN framework with a mask branch for pixel-level segmentation.
The source code read here is matterport/Mask_RCNN, a complete implementation built with Python 3, Keras, and TensorFlow.
The full code walkthrough is split into 4 parts, in order: Backbone Network code
Region Proposal Network (RPN) code
Network Heads code
Losses code
The code that builds the whole Mask R-CNN model lives in mrcnn/model.py and is worth browsing in detail.
This part covers the second piece, the Region Proposal Network (RPN) code.
1. Region Proposal Network (RPN) code
# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)
    # Duplicate across the batch dimension because Keras requires it
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's poor support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors
#--------------------------------- divider -----------------------------
# RPN Model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
# Loop through pyramid layers
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
# Concatenate layer outputs
# Convert from list of lists of level outputs to list of lists
# of outputs across levels.
# e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs
# Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.
proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
    else config.POST_NMS_ROIS_INFERENCE
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
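For reference, the config values used above come from mrcnn/config.py. The RPN-related defaults are roughly the following (abridged from memory of the repo; double-check against mrcnn/config.py):
# Abridged RPN-related defaults from mrcnn/config.py
RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)        # anchor sizes in pixels
RPN_ANCHOR_RATIOS = [0.5, 1, 2]                    # aspect ratios per location
RPN_ANCHOR_STRIDE = 1                              # anchors at every feature cell
RPN_NMS_THRESHOLD = 0.7                            # NMS IoU threshold for proposals
RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])  # delta normalization factors
POST_NMS_ROIS_TRAINING = 2000                      # proposals kept after NMS (training)
POST_NMS_ROIS_INFERENCE = 1000                     # proposals kept after NMS (inference)
TOP_DOWN_PYRAMID_SIZE = 256                        # channel depth of P2..P6
BACKBONE_STRIDES = [4, 8, 16, 32, 64]              # feature map strides vs. input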
2. Region Proposal Network (RPN) structure
(Figure: RPN structure diagram)
In the diagram, the leftmost anchors block is the anchor-generation flow, the middle build_rpn_model -> rpn_graph is the RPN network-construction flow, and the rightmost ProposalLayer is the proposal-generation flow that filters the ROIs.
Because the rpn_bbox output of the RPN predicts offsets between the anchors and the ground-truth boxes, a ProposalLayer is needed to filter the anchors down in number and refine their positions.
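As a toy illustration of what these offsets do (all numbers here are made up, not from the repo), an anchor box is shifted and rescaled by the predicted deltas before it can serve as a proposal:
import numpy as np

# Hypothetical anchor in normalized (y1, x1, y2, x2) form: a 0.2 x 0.2 box
anchor = np.array([0.4, 0.4, 0.6, 0.6])
# Hypothetical RPN deltas (dy, dx, log(dh), log(dw)): shift the center down
# by 10% of the height, keep x, grow the height 1.5x, keep the width
delta = np.array([0.1, 0.0, np.log(1.5), 0.0])

h, w = anchor[2] - anchor[0], anchor[3] - anchor[1]
cy, cx = anchor[0] + 0.5 * h, anchor[1] + 0.5 * w
cy, cx = cy + delta[0] * h, cx + delta[1] * w
h, w = h * np.exp(delta[2]), w * np.exp(delta[3])
print([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
# -> [0.37, 0.4, 0.67, 0.6]
This is the same transform applied by apply_box_deltas_graph, covered in 2.3.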
2.1 anchors code
# mrcnn/model.py
def get_anchors(self, image_shape):
    ...
    a = utils.generate_pyramid_anchors(
        self.config.RPN_ANCHOR_SCALES,
        self.config.RPN_ANCHOR_RATIOS,
        backbone_shapes,
        self.config.BACKBONE_STRIDES,
        self.config.RPN_ANCHOR_STRIDE)
    ...
#--------------------------------- divider -----------------------------
# mrcnn/utils.py
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    ...
    ...
    return boxes
get_anchors internally calls the functions in utils.py shown above.
(Figure: anchors flowchart)
The config parameter RPN_ANCHOR_SCALES holds the anchor sizes, (32, 64, 128, 256, 512), corresponding one-to-one to the rpn_feature_maps [P2, P3, P4, P5, P6], whose resolutions are [256, 128, 64, 32, 16]. In other words, the high-resolution bottom-level features detect smaller objects, while the low-resolution top-level feature maps detect larger objects.
RPN_ANCHOR_RATIOS holds the anchor aspect ratios; each anchor size uses [0.5, 1, 2], i.e. 3 aspect ratios.
BACKBONE_STRIDES holds the downsampling factors of the feature maps, [4, 8, 16, 32, 64].
The backbone feature-map resolutions (backbone_shapes) are therefore [256, 128, 64, 32, 16] for the default 1024x1024 input. Note that get_anchors also normalizes the anchor coordinates to the [0, 1] range, which is why ProposalLayer later clips boxes to the window [0, 0, 1, 1].
generate_anchors generates the anchors for one specific feature level; generate_pyramid_anchors concatenates the anchors across scales, giving anchors of shape [anchor_count, (y1, x1, y2, x2)]. Here anchor_count = (256*256 + 128*128 + 64*64 + 32*32 + 16*16) * 3 = 261888. Anchors this numerous cannot all be used for prediction, hence the subsequent ProposalLayer filtering.
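The body of generate_anchors is elided above. Roughly, it builds a grid of box centers over the feature map and pairs it with the (scale, ratio) box sizes; the sketch below is a paraphrase of the routine in mrcnn/utils.py (see the repo for the exact code):
import numpy as np

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    # All combinations of the given scale and the ratios
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales, ratios = scales.flatten(), ratios.flatten()
    # Box heights and widths: area stays scale^2, aspect ratio is w/h
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    # Grid of box centers in image coordinates (feature cell * stride)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    # Pair every center with every (height, width)
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    # Convert (cy, cx) +/- half (h, w) to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    return boxes
And a quick sanity check of the 261888 figure (assuming the default 1024x1024 input):
feature_shapes = [256, 128, 64, 32, 16]        # P2..P6
print(sum(s * s for s in feature_shapes) * 3)  # 3 ratios per location -> 261888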
2.2 build_rpn_model -> rpn_graph code
def build_rpn_model(anchor_stride, anchors_per_location, depth):
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)
    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)
    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)
    # Bounding box refinement. [batch, height, width, anchors per location * 4]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)
    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)
    return [rpn_class_logits, rpn_probs, rpn_bbox]
build_rpn_model calls rpn_graph to build the RPN. It first runs a 3x3 convolution over the feature map, then splits into two branches, one for classification and one for box regression. In this code anchors_per_location = 3, because each location on the feature map gets anchors with 3 aspect ratios. The flow and explanation follow:
(Figure: RPN construction diagram)
rpn_feature_maps is the list of feature maps fed to the RPN, [P2, P3, P4, P5, P6]. Each feature level in the list goes through the 3x3 convolution to produce the shared feature layer shared.
shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                   strides=anchor_stride,
                   name='rpn_conv_shared')(feature_map)
One branch off the shared layer handles classification; 2 * anchors_per_location means every anchor predicts its foreground/background probability.
x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
              activation='linear', name='rpn_class_raw')(shared)
The other branch off the shared layer regresses the boxes; 4 * anchors_per_location means every anchor predicts 4 position-related values. The regression targets use a center-point parameterization, i.e. [dy, dx, log(dh), log(dw)] offsets relative to the anchor.
x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
              activation='linear', name='rpn_bbox_pred')(shared)
The final outputs are [rpn_class_logits, rpn_probs, rpn_bbox], where rpn_probs are the predicted class probabilities and rpn_bbox the predicted box offsets.
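As a sanity check on the reshape logic: a feature map of size h x w contributes h * w * anchors_per_location rows after the reshape, which is what lets the per-level outputs be concatenated along axis 1 and matched one-to-one with the anchors from 2.1. A minimal standalone sketch of the classification branch (using tf.keras rather than the repo's own imports):
import numpy as np
import tensorflow as tf
from tensorflow import keras

depth, anchors_per_location = 256, 3
inp = keras.layers.Input(shape=[None, None, depth])
shared = keras.layers.Conv2D(512, (3, 3), padding='same', activation='relu')(inp)
x = keras.layers.Conv2D(2 * anchors_per_location, (1, 1))(shared)
logits = keras.layers.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
model = keras.Model(inp, logits)

# A 16x16 feature map (P6 for a 1024 input) yields 16*16*3 = 768 anchor slots
print(model.predict(np.zeros([1, 16, 16, depth], dtype=np.float32)).shape)
# -> (1, 768, 2)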
2.3 ProposalLayer code
class ProposalLayer(KE.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    def call(self, inputs):
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]
        # Undo the training-time normalization of the deltas
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        pre_nms_limit = tf.minimum(6000, tf.shape(anchors)[1])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # Clip to image boundaries. Since we're in normalized coordinates,
        # clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Non-max suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad if needed
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
ProposalLayer first selects a subset of anchors by score, then applies box refinement to them and clips the results to the image boundary, and finally runs non-maximum suppression to pick the final rpn_rois. The flow and explanation follow:
(Figure: proposal flowchart)
ProposalLayer takes 3 inputs, in order [rpn_class, rpn_bbox, anchors], named [scores, deltas, anchors] inside the custom layer. rpn_class and rpn_bbox are the classification and box-offset predictions from 2.2, and anchors are the ones generated in 2.1; each has 261888 entries. When training Mask R-CNN, one image cannot possibly use that many ROIs, so further filtering is needed.
tf.nn.top_k picks the indices of at most the top 6000 ROIs ranked by rpn_class probability (which can be read as an ROI score), and those indices then select the corresponding top-6000 subset of [scores, deltas, anchors]. tf.gather selects the values of x at the indices y.
pre_nms_limit = tf.minimum(6000, tf.shape(anchors)[1])
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True, name="top_anchors").indices
scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU, names=["pre_nms_anchors"])
apply_box_deltas_graph refines the anchors according to the deltas.
In ProposalLayer it is applied per image:
boxes = utils.batch_slice([pre_nms_anchors, deltas], lambda x, y: apply_box_deltas_graph(x, y),
                          self.config.IMAGES_PER_GPU, names=["refined_anchors"])
clip_boxes_graph handles anchors that extend beyond the image; since the box coordinates are normalized to the [0, 1] range, this is done by clipping them to that range (the boxes are truncated, not removed).
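clip_boxes_graph is also short; paraphrased from mrcnn/model.py, it clamps each coordinate into the window:
def clip_boxes_graph(boxes, window):
    """boxes: [N, (y1, x1, y2, x2)], window: [4] in the form (y1, x1, y2, x2)"""
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    # Clamp every coordinate into the window
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    return tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")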
window = np.array([0, 0, 1, 1], dtype=np.float32)
boxes = utils.batch_slice(boxes, lambda x: clip_boxes_graph(x, window),
                          self.config.IMAGES_PER_GPU, names=["refined_anchors_clipped"])
nms performs non-maximum suppression, using the IoU threshold RPN_NMS_THRESHOLD to select up to proposal_count ROIs (2000 in training); if fewer than that survive, the output is zero-padded to the fixed size.
def nms(boxes, scores):
    indices = tf.image.non_max_suppression(boxes, scores, self.proposal_count,
                                           self.nms_threshold, name="rpn_non_max_suppression")
    proposals = tf.gather(boxes, indices)
    # Pad if needed
    padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
    proposals = tf.pad(proposals, [(0, padding), (0, 0)])
    return proposals
proposals = utils.batch_slice([boxes, scores], nms, self.config.IMAGES_PER_GPU)
The returned proposals are assigned to rpn_rois and serve as the region proposals from the RPN, feeding the subsequent FPN heads for classification, box regression, and pixel-level mask prediction.
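One last note: every per-image step above goes through utils.batch_slice, because some of the ops used here (e.g. tf.image.non_max_suppression) do not operate over a batch dimension. Roughly, it slices the batch, applies the graph function to each slice, and stacks the results back; a paraphrase of mrcnn/utils.py:
def batch_slice(inputs, graph_fn, batch_size, names=None):
    if not isinstance(inputs, list):
        inputs = [inputs]
    outputs = []
    for i in range(batch_size):
        # Apply graph_fn to the i-th slice of every input tensor
        inputs_slice = [x[i] for x in inputs]
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)
    # Regroup: a list (over the batch) of lists of outputs
    # becomes a list of outputs, each stacked over the batch
    outputs = list(zip(*outputs))
    if names is None:
        names = [None] * len(outputs)
    result = [tf.stack(o, axis=0, name=n) for o, n in zip(outputs, names)]
    return result[0] if len(result) == 1 else result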
That concludes the second part. Criticism and corrections are welcome~