https://github.com/abbyQu/Mask_RCNN
The input to the RPN is the final feature map from the backbone's convolutional layers. The "sliding window" described in the paper is really just another convolution: the n×n window size is simply the kernel size of that convolution. Here n = 3.
This convolution produces a shared layer, called shared in the code. "Shared" means that both heads consume it: the classification head (is this anchor an ROI or not) and the regression head (regression of the four bbox values).
The paper describes the mapping from shared to the cls and reg heads as fully convolutional, and the code implements each head with a 1×1 Conv2D. After the convolution, each output is reshaped to its final dimensions: [batch_size, num_anchors, 4] for the reg head and [batch_size, num_anchors, 2] for the cls head.
import tensorflow as tf
import keras.layers as KL

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    # Shared 3x3 convolution: this is the "sliding window" from the paper.
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)

    return [rpn_class_logits, rpn_probs, rpn_bbox]
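
To sanity-check the shapes described above, here is a minimal sketch that wraps rpn_graph in a Keras model and feeds it a dummy feature map. It assumes the repo's TF1-era standalone Keras setup (import keras.layers as KL / keras.models as KM); the 32×32×256 feature map and anchors_per_location=3 are made-up values for illustration only.

import numpy as np
import keras.layers as KL
import keras.models as KM

# Dummy backbone output: a 32x32 feature map with 256 channels (illustrative values).
feature_map = KL.Input(shape=(32, 32, 256))
outputs = rpn_graph(feature_map, anchors_per_location=3, anchor_stride=1)
model = KM.Model(inputs=feature_map, outputs=outputs)

logits, probs, bbox = model.predict(np.zeros((1, 32, 32, 256), dtype=np.float32))
# 32 * 32 locations * 3 anchors per location = 3072 anchors in total.
print(logits.shape)  # (1, 3072, 2)
print(probs.shape)   # (1, 3072, 2)
print(bbox.shape)    # (1, 3072, 4)

Note how the reshape collapses the (height, width, anchors_per_location) axes into a single anchors axis, which is why num_anchors = H * W * anchors_per_location.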