功能:
proposal layer 负责将 回归得到的([dx(A),dy(A),dw(A),dh(A)]与 foreground anchors 结合,计算出精准的proposal ,送入后续的 Roi Pooling layer
层的定义:
layer {
name: 'proposal'
type: 'Python'
bottom: 'rpn_cls_prob_reshape'
bottom: 'rpn_bbox_pred'
bottom: 'im_info'
top: 'rpn_rois'
# top: 'rpn_scores'
python_param {
module: 'rpn.proposal_layer'
layer: 'ProposalLayer'
param_str: "'feat_stride': 16"
}
}
输入:
- fg/bg anchors 分类器的结果:rpn_cls_prob_reshape;
- 对应的bbox reg的[dx(A),dy(A),dw(A),dh(A)]变换量rpn_bbox_pred
- im_info 图片信息
- feat_stride=16 步长为16
(一)bounding box regression原理
如图所示:
绿色框为 Ground Truth(GT)
红色框为 提取的foreground anchors;
我们需要对红色框进行修正;
对于每个框,我们用(x,y,w,h)中心点(x,y),以及宽高(w,h)表示;
则红色框A 代表原始的foreground anchors
绿色框G代表目标的GT;
给定一种映射f ,使得,其中,;
(1) 先做平移:
(2)再做缩放:
所以,需要学习的是 这四个变化;
在Faster R-CNN 原文中,平移量与尺度因子如下:
接下来的问题就是如何通过线性回归获得dx(A),dy(A),dw(A),dh(A)了。线性回归就是给定输入的特征向量X, 学习一组参数W, 使得经过线性回归后的值跟真实值Y(即GT)非常接近,即Y=WX。对于该问题,输入X是一张经过num_output=1的1x1卷积获得的feature map,定义为Φ;同时还有训练传入的GT,即(tx, ty, tw, th)。输出是dx(A),dy(A),dw(A),dh(A)四个变换。
那么目标函数可以表示为:
其中为对应anchor的feature map 组成的特征向量,w 是需要学习的参数,d(A)是预测值(x,y,w,h);
所用我们采用mse loss 的话:
所以正则化后.函数优化目标为:
在caffe 中,利用一个conv 层进行 (dx,dy,dw,dh)的学习:
layer {
name: "rpn_bbox_pred"
type: "Convolution"
bottom: "rpn/output"
top: "rpn_bbox_pred"
convolution_param {
num_output: 36 # 4 * 9(anchors)
kernel_size: 1 pad: 0 stride: 1
}
}
caffe blob存储为[1, 4x9, Q, P]。与上文中fg/bg anchors存储为[1, 18, Q, P]类似;
在py-faster-rcnn 中,回归框的代码如 下:
##bbox_transform.py
def bbox_transform_inv(boxes, deltas):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
#利用公式得到回归框的位置
#(1)平移框的位置
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
#(2)缩放框的位置
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
#重新恢复[x1,y1,x2,y2]的形式
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
(二)proposal forward函数解析
- 首先,获取参数,方便进行nms
def forward(self, bottom, top):
# Algorithm:
#
# for each (H, W) location i
# generate A anchor boxes centered on cell i
# apply predicted bbox deltas at cell i to each of the A anchors
# clip predicted boxes to image
# remove predicted boxes with either height or width < threshold
# sort all (proposal, score) pairs by score from highest to lowest
# take top pre_nms_topN proposals before NMS
# apply NMS with threshold 0.7 to remaining proposals
# take after_nms_topN proposals after NMS
# return the top proposals (-> RoIs top, scores top)
assert bottom[0].data.shape[0] == 1, \
'Only single item batches are supported'
cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
min_size = cfg[cfg_key].RPN_MIN_SIZE
- 再次产生anchor,方便对框进行精准回归(之前anchorTarget 产生过一次anchor, 那是为了与target计算IOU,方便计算哪个anchor 属于fg/bg)
# Enumerate all shifts
#再次生产anchors
shift_x = np.arange(0, width) * self._feat_stride
shift_y = np.arange(0, height) * self._feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Enumerate all shifted anchors:
#
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = self._num_anchors
K = shifts.shape[0]
anchors = self._anchors.reshape((1, A, 4)) + \
shifts.reshape((1, K, 4)).transpose((1, 0, 2))
anchors = anchors.reshape((K * A, 4))
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
#
# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
# Same story for the scores:
#
# scores are (1, A, H, W) format
# transpose to (1, H, W, A)
# reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
#
scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
- 对anchor 框信息做回归拟合,为了得到更准确的框,同时,将回归后的框 去除 超出图像边缘部分,
# Convert anchors into proposals via bbox transformations
#利用 anchor 做 bounding box regression位置回归
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 2. clip predicted boxes to image
#取
proposals = clip_boxes(proposals, im_info[:2])
# 3. remove predicted boxes with either height or width < threshold
# (NOTE: convert min_size to input image scale stored in im_info[2])
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
clip_boxes 函数很简单, 将(x1,y1,x2,y2)分别与图像边缘做比较,保留在边缘内的框
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
_filter_boxes 函数要做的事情是保留一定宽高的框;
def _filter_boxes(boxes, min_size):
"""Remove all boxes with any side smaller than min_size."""
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
return keep
- 最后,做一遍nms,保留分数最大的N个框:
# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
keep = nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
top[0].reshape(*(blob.shape))
top[0].data[...] = blob
# [Optional] output scores blob
if len(top) > 1:
top[1].reshape(*(scores.shape))
top[1].data[...] = scores
nms python部分代码:
其实现的思想主要是将各个框的置信度进行排序,然后选择其中置信度最高的框A,将其作为标准选择其他框,同时设置一个阈值,当其他框B与A的重合程度超过阈值就将B舍弃掉,然后在剩余的框中选择置信度最大的框,重复上述操作。
作者在代码中为了速度,使用了用C 写的nms ;
但是,作者给出了python 版本的nms baseline ,方便理解:
import numpy as np
def py_cpu_nms(dets, thresh):
"""Pure Python NMS baseline."""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
完整版本forward代码:
def forward(self, bottom, top):
# Algorithm:
#
# for each (H, W) location i
# generate A anchor boxes centered on cell i
# apply predicted bbox deltas at cell i to each of the A anchors
# clip predicted boxes to image
# remove predicted boxes with either height or width < threshold
# sort all (proposal, score) pairs by score from highest to lowest
# take top pre_nms_topN proposals before NMS
# apply NMS with threshold 0.7 to remaining proposals
# take after_nms_topN proposals after NMS
# return the top proposals (-> RoIs top, scores top)
assert bottom[0].data.shape[0] == 1, \
'Only single item batches are supported'
cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
min_size = cfg[cfg_key].RPN_MIN_SIZE
# the first set of _num_anchors channels are bg probs
# the second set are the fg probs, which we want
#取softmax 分类后的各个框的分数
scores = bottom[0].data[:, self._num_anchors:, :, :]
bbox_deltas = bottom[1].data
im_info = bottom[2].data[0, :]
if DEBUG:
print 'im_size: ({}, {})'.format(im_info[0], im_info[1])
print 'scale: {}'.format(im_info[2])
# 1. Generate proposals from bbox deltas and shifted anchors
height, width = scores.shape[-2:]
if DEBUG:
print 'score map size: {}'.format(scores.shape)
# Enumerate all shifts
#再次生产anchors
shift_x = np.arange(0, width) * self._feat_stride
shift_y = np.arange(0, height) * self._feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Enumerate all shifted anchors:
#
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = self._num_anchors
K = shifts.shape[0]
anchors = self._anchors.reshape((1, A, 4)) + \
shifts.reshape((1, K, 4)).transpose((1, 0, 2))
anchors = anchors.reshape((K * A, 4))
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
#
# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
# Same story for the scores:
#
# scores are (1, A, H, W) format
# transpose to (1, H, W, A)
# reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
#
scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
# Convert anchors into proposals via bbox transformations
#利用 anchor 做 bounding box regression位置回归
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 2. clip predicted boxes to image
#取
proposals = clip_boxes(proposals, im_info[:2])
# 3. remove predicted boxes with either height or width < threshold
# (NOTE: convert min_size to input image scale stored in im_info[2])
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take top pre_nms_topN (e.g. 6000)
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]
# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
keep = nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
top[0].reshape(*(blob.shape))
top[0].data[...] = blob
# [Optional] output scores blob
if len(top) > 1:
top[1].reshape(*(scores.shape))
top[1].data[...] = scores
总结:
proposal layer forward 层依次进行的操作为:
- 再次生产anchor,并对所有的anchor做了一边bbox reg 位置回归操作;
- 按照输入的foreground softmax scores由大到小排序anchors,提取前pre_nms_topN(e.g. 6000)个anchors。即提取修正位置后的foreground anchors
- 利用feat_stride和im_info将anchors映射回原图,判断fg anchors是否大范围超过边界,剔除严重超出边界fg anchors。
- 进行nms(nonmaximum suppression,非极大值抑制)
- 再次按照nms后的foreground softmax scores由大到小排序fg anchors,提取前post_nms_topN(e.g. 300)结果作为proposal输出