在笔者之前的解析RPN和ROI-Pooling的博客中,已经给大家详细解析了目标检测Faster R-CNN框架中的两大核心部件。纵观整个Faster R-CNN代码,比较难和经典的部分除了上述两大模块,还有根据RPN输出的前景分数选择出roi和为选择出的roi置ground truth类别和坐标变换的代码。在本篇博客中,笔者就这两部分代码为大家做出解析。
首先是如何选择出合适的rois,该代码文件是proposal_layer.py;其次是如何为选择出的rois找到训练所需的ground truth类别和坐标变换信息,该代码文件是proposal_target_layer.py。在正式开始之前,还是按照惯例做出说明:
1. 笔者解析的代码是tensorflow下实现的Faster R-CNN,工程链接https://github.com/kevinjliang/tf-Faster-RCNN,代码文件路径分别是Networks/proposal_layer.py和Networks/proposal_target_layer.py。不过,请大家不用担心,这两个文件也是基于原作,和Ross Girshick的py-faster-rcnn中的代码几乎一致。
2. 请大家在看代码解析之前完全明了Faster R-CNN的工作原理,尤其是在RPN输出结果后,如何选择proposal的部分,有以下几个途径:
1) 直接进行Faster R-CNN论文阅读,选择proposal的部分主要集中在论文3.3节和实验部分:https://arxiv.org/abs/1506.01497
2) 可以参阅笔者的blog:实例分割模型Mask R-CNN详解:从R-CNN,Fast R-CNN,Faster R-CNN再到Mask R-CNN
3) 可以看一篇知乎专栏:一文读懂Faster R-CNN
3. 笔者在解析代码的过程中做到尽量详实,如果觉得代码解析有问题或者存在疏漏的读者朋友,欢迎在评论区指出讨论,笔者不胜感激。
下面开始干货:
首先,笔者先解析一下proposal_layer.py,完成的功能是根据RPN的输出结果,提取出所需的目标框(roi)。按照惯例,笔者先放出代码解析:
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 2 19:25:41 2017
@author: Kevin Liang (modifications)
Proposal Layer: Applies the Region Proposal Network's (RPN) predicted deltas to
each of the anchors, removes unsuitable boxes, and then ranks them by their
"objectness" scores. Non-maximimum suppression removes proposals of the same
object, and the top proposals are returned.
Adapted from the official Faster R-CNN repo:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/proposal_layer.py
"""
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
import numpy as np
import tensorflow as tf
from Lib.bbox_transform import bbox_transform_inv, clip_boxes #bbox_transform_inv改变初始框的坐标,clip_boxes把超出图像边界的框限制在图像边界内
from Lib.faster_rcnn_config import cfg #配置文件
from Lib.generate_anchors import generate_anchors #生成初始框
from Lib.nms_wrapper import nms #去掉多余的重叠的框
#使用tf.py_func接口,方便进行numpy运算
def proposal_layer(rpn_bbox_cls_prob, rpn_bbox_pred, im_dims, cfg_key, _feat_stride, anchor_scales):
return tf.reshape(tf.py_func(_proposal_layer_py,[rpn_bbox_cls_prob, rpn_bbox_pred, im_dims[0], cfg_key, _feat_stride, anchor_scales], [tf.float32]),[-1,5])
def _proposal_layer_py(rpn_bbox_cls_prob, rpn_bbox_pred, im_dims, cfg_key, _feat_stride, anchor_scales):
'''
# Algorithm:
#
# for each (H, W) location i
# generate A anchor boxes centered on cell i
# apply predicted bbox deltas at cell i to each of the A anchors
# clip predicted boxes to image
# remove predicted boxes with either height or width < threshold
# sort all (proposal, score) pairs by score from highest to lowest
# take top pre_nms_topN proposals before NMS
# apply NMS with threshold 0.7 to remaining proposals
# take after_nms_topN proposals after NMS
# return the top proposals (-> RoIs top, scores top)
'''
_anchors = generate_anchors(scales=np.array(anchor_scales)) #生成9个锚点,shape: [9,4]
_num_anchors = _anchors.shape[0] #_num_anchors值为9
rpn_bbox_cls_prob = np.transpose(rpn_bbox_cls_prob,[0,3,1,2]) #将RPN输出的分类信息维度变成[N,C,H,W]
rpn_bbox_pred = np.transpose(rpn_bbox_pred,[0,3,1,2]) #将RPN输出的边框变换信息维度变成[N,C,H,W]
# Only minibatch of 1 supported 核验一下batch_size必须等于1
assert rpn_bbox_cls_prob.shape[0] == 1, \
'Only single item batches are supported'
if cfg_key == 'TRAIN': #如果是在训练的话
pre_nms_topN = cfg.TRAIN.RPN_PRE_NMS_TOP_N #12000
post_nms_topN = cfg.TRAIN.RPN_POST_NMS_TOP_N #2000
nms_thresh = cfg.TRAIN.RPN_NMS_THRESH #0.7
min_size = cfg.TRAIN.RPN_MIN_SIZE #16
else: # cfg_key == 'TEST': 如果是在测试的话
pre_nms_topN = cfg.TEST.RPN_PRE_NMS_TOP_N #6000
post_nms_topN = cfg.TEST.RPN_POST_NMS_TOP_N #300
nms_thresh = cfg.TEST.RPN_NMS_THRESH #0.7
min_size = cfg.TEST.RPN_MIN_SIZE #16
# the first set of _num_anchors channels are bg probs
# the second set are the fg probs, which we want
#按照通道C取出RPN预测的框属于前景的分数,请注意,在18个channel中,前9个是框属于背景的概率,后9个才是属于前景的概率
scores = rpn_bbox_cls_prob[:, _num_anchors:, :, :]
#bbox_deltas代表了RPN输出的各个框的坐标变换信息
bbox_deltas = rpn_bbox_pred
# 1. Generate proposals from bbox deltas and shifted anchors
height, width = scores.shape[-2:] #在这里得到了rpn输出的H和W,
# Enumerate all shifts
shift_x = np.arange(0, width) * _feat_stride #shape: [width,]
shift_y = np.arange(0, height) * _feat_stride #shape: [height,]
shift_x, shift_y = np.meshgrid(shift_x, shift_y) #生成网格 shift_x shape: [height, width], shift_y shape: [height, width]
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose() # shape[height*width, 4]
# Enumerate all shifted anchors:
#
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = _num_anchors # A = 9
K = shifts.shape[0] # K=height*width(特征图上的)
anchors = _anchors.reshape((1, A, 4)) + \
shifts.reshape((1, K, 4)).transpose((1, 0, 2)) #shape[K,A,4] 得到所有的初始框
anchors = anchors.reshape((K * A, 4)) #把初始框的数组维度改变一下,变成[K×A,4]
# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
#
# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
#将RPN输出的边框变换信息维度变回[N,H,W,C],再改变一下维度,变成[1×H×W×A,4]
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))
# Same story for the scores:
#
# scores are (1, A, H, W) format
# transpose to (1, H, W, A)
# reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
#将RPN输出的分类信息维度变回[N,H,W,C],再改变一下维度,变成[1×H×W×A,1]
scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))
# Convert anchors into proposals via bbox transformations
#在这里结合RPN的输出变换初始框的坐标,得到第一次变换坐标后的proposals
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 2. clip predicted boxes to image
#在这里讲超出图像边界的proposal进行边界裁剪,使之在图像边界之内
proposals = clip_boxes(proposals, im_dims)
# 3. remove predicted boxes with either height or width < threshold
#排除掉长或者宽太小的框,keep下标指的是需要保留的长宽合适的框的索引
keep = _filter_boxes(proposals, min_size)
proposals = proposals[keep, :]
scores = scores[keep]
# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take top pre_nms_topN (e.g. 6000)
#对框按照前景分数进行排序,order中指示了框的下标
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN] #选择前景分数排名在前pre_nms_topN(训练时为12000,测试时为6000)的框
proposals = proposals[order, :] #保留了前pre_nms_topN个框的坐标信息
scores = scores[order] #保留了前pre_nms_topN个框的分数信息
# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
#使用nms算法排除重复的框
keep = nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
keep = keep[:post_nms_topN] #选择前景分数排名在前post_nms_topN(训练时为2000,测试时为300)的框
proposals = proposals[keep, :] #保留了前post_nms_topN个框的坐标信息
scores = scores[keep] #保留了前post_nms_topN个框的分数信息
# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
#因为要进行roi_pooling,在保留框的坐标信息前面插入batch中图片的编号信息。此时,由于batch_size为1,因此都插入0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
return blob
def _filter_boxes(boxes, min_size): #_filter_boxes函数过滤掉proposals中边框长宽太小的框
"""Remove all boxes with any side smaller than min_size."""
ws = boxes[:, 2] - boxes[:, 0] + 1 #得到所有框的宽
hs = boxes[:, 3] - boxes[:, 1] + 1 #得到所有框的长
keep = np.where((ws >= min_size) & (hs >= min_size))[0] #返回满足长宽均在阈值之上的框的下标
return keep
我们来梳理一下proposal_layer的思路:
1) 由于proposal_layer是训练和测试时都需要执行的,只是说在训练和测试时选择的roi的个数不一致,因此在代码的开头部分进行了相应的赋值。
2) 得到了所有的从未经过坐标变换的初始框,存在anchors中。
3) 由bbox_transform_inv函数结合RPN的输出对所有初始框进行了坐标变换。bbox_transform_inv函数如下所示:
def bbox_transform_inv(boxes, deltas):
'''
Applies deltas to box coordinates to obtain new boxes, as described by
deltas
'''
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
#获得初始proposal的中心和长宽信息
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
#获得坐标变换信息
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
#得到改变后的proposal的中心和长宽信息
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
#将改变后的proposal的中心和长宽信息还原成左上角和右下角的版本
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
4) 使用clip_boxes函数将改变坐标信息后超过图像边界的框的边框裁剪一下,使之在图像边界之内。clip_boxes函数如下所示:
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
#严格限制proposal的四个角在图像边界内
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
5) 用_filter_boxes函数排除掉了长宽过小的框,_filter_boxes函数见上面的proposal_layer代码解析最下方。
6) 对所有的框按照前景分数进行排序,选择排序后的前pre_nms_topN和框。
7) 对于上一步选择出来的框,用nms算法根据阈值排除掉重叠的框。
8) 对于剩下的框,选择post_nms_topN个最终的框。
9) 在所有选出的框,即roi的前面插入在训练batch中的索引,由于batch size为1,因此都插入0。
梳理了根据RPN输出的分数选择框(roi)的操作,我们来看一下该代码中有哪些值得注意的地方。就笔者认为,proposal_layer中只有一个地方需要注意,就是:
#按照通道C取出RPN预测的框属于前景的分数,请注意,在18个channel中,前9个是框属于背景的概率,后9个才是属于前景的概率
scores = rpn_bbox_cls_prob[:, _num_anchors:, :, :]
请大家注意,在选择RPN输出的前景分数的时候,是选择输出的18个通道中的后9个。在这里,笔者要提醒大家,对于RPN输出的判断分类(前后景)的分支,是输出18个通道的(9×2)。这18个数表示了9个初始框的各自的前景分数和背景分数,而这18个值的排序,是下图所示的:
按照上图所示排列,才能取出后9个值,表示选出9个框的前景分类分数。笔者在这里同时提醒大家注意,这18个值不是按照下图所示的:
在解析完了proposal_layer.py文件之后,我们来看一看在训练的时候如何为选出的框(roi)置ground truth类别和坐标变换信息。先放出proposal_target_layer.py文件的解析:
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 3 22:30:23 2017
@author: Kevin Liang (modifications)
Adapted from the official Faster R-CNN repo:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/proposal_target_layer.py
"""
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
import numpy as np
import numpy.random as npr
import tensorflow as tf
from Lib.bbox_overlaps import bbox_overlaps #计算框与框的重合度
from Lib.bbox_transform import bbox_transform #计算基础框与目标框之间的位置映射
from Lib.faster_rcnn_config import cfg #配置文件
#接收三个参数:按照前景分数选择出来的待进行分类的框,ground truth框,分类类别数目
#假设按照前景分数选择出来的待进行分类的框个数为N,ground truth框个数为M
def proposal_target_layer(rpn_rois, gt_boxes,_num_classes):
'''
Make Python version of _proposal_target_layer_py below Tensorflow compatible
'''
rois,labels,bbox_targets,bbox_inside_weights,bbox_outside_weights = tf.py_func(_proposal_target_layer_py,[rpn_rois, gt_boxes,_num_classes],[tf.float32,tf.int32,tf.float32,tf.float32,tf.float32])
rois = tf.reshape(rois,[-1,5] , name = 'rois') #将rois转化为tensor,维度变成[-1,5]
labels = tf.convert_to_tensor(tf.cast(labels,tf.int32), name = 'labels') #将类别标签转化为tensor,并且类型变成tf.int32
bbox_targets = tf.convert_to_tensor(bbox_targets, name = 'bbox_targets') #将坐标变换标签转化为tensor
bbox_inside_weights = tf.convert_to_tensor(bbox_inside_weights, name = 'bbox_inside_weights') #将bbox_inside_weights转化为tensor
bbox_outside_weights = tf.convert_to_tensor(bbox_outside_weights, name = 'bbox_outside_weights') #将bbox_outside_weights转化为tensor
return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
#_proposal_target_layer_py函数返回主要结果
def _proposal_target_layer_py(rpn_rois, gt_boxes,_num_classes):
"""
Assign object detection proposals to ground-truth targets. Produces proposal
classification labels and bounding-box regression targets.
"""
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
# (i.e., rpn.proposal_layer.ProposalLayer), or any other source
all_rois = rpn_rois #all_rois表示选择出来的N个待进行分类的框的坐标信息 shape [N,5]
# Include ground-truth boxes in the set of candidate rois
#将ground truth框加入到待分类的框里面(相当于增加正样本个数)
zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
#all_rois输出维度(N+M,5),前一维表示是从RPN的输出选出的框和ground truth框合在一起了
all_rois = np.vstack(
(all_rois, np.hstack((zeros, gt_boxes[:, :-1])))
)#先在每个ground truth框前面插入0(这样才能和N个从RPN的输出选出的框对齐),然后把ground truth框插在最后
# Sanity check: single batch only 确认一下batch size为1
assert np.all(all_rois[:, 0] == 0), \
'Only single item batches are supported'
num_images = 1
rois_per_image = cfg.TRAIN.BATCH_SIZE // num_images #cfg.TRAIN.BATCH_SIZE为128
#cfg.TRAIN.FG_FRACTION为0.25,即在一次分类训练中前景框只能有32个
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(np.int32)
# Sample rois with classification labels and bounding box regression
# targets
#_sample_rois选择进行分类训练的框,并求取他们类别和坐标的ground truth和计算边框损失loss时需要的bbox_inside_weights
labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
all_rois, gt_boxes, fg_rois_per_image,
rois_per_image, _num_classes)
rois = rois.reshape(-1,5) #将返回的rois的维度变成[-1,5]
labels = labels.reshape(-1,1) #将返回的rois的ground truth类别的维度变成[-1,5]
bbox_targets = bbox_targets.reshape(-1,_num_classes*4) #将返回的rois的ground truth坐标的维度变成[-1,_num_classes*4]
bbox_inside_weights = bbox_inside_weights.reshape(-1,_num_classes*4) #将返回的bbox_inside_weights维度变成[-1,_num_classes*4]
#置bbox_outside_weights,shape [-1,_num_classes*4]。其中,bbox_inside_weights大于0的位置为1,其余为0
bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
return np.float32(rois),labels,bbox_targets,bbox_inside_weights,bbox_outside_weights #返回各个值
def _get_bbox_regression_labels(bbox_target_data, num_classes): #求得最终计算loss时使用的ground truth边框回归值和bbox_inside_weights
"""Bounding-box regression targets (bbox_target_data) are stored in a
compact form N x (class, tx, ty, tw, th)
This function expands those targets into the 4-of-4*K representation used
by the network (i.e. only one class has non-zero targets).
Returns:
bbox_target (ndarray): N x 4K blob of regression targets
bbox_inside_weights (ndarray): N x 4K blob of loss weights
"""
clss = bbox_target_data[:, 0] #在这里先得到用来训练的每个roi的类别
bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32) #用全0初始化一下边框回归的ground truth值。针对每个roi,对每个类别都置4个坐标回归值
bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32) #用全0初始化一下bbox_inside_weights
inds = np.where(clss > 0)[0] #找到属于前景的rois
for ind in inds: #针对每一个前景roi:
cls = clss[ind] #找到从属的类别
start = int(4 * cls) #找到从属的类别对应的坐标回归值的起始位置
end = start + 4 #找到从属的类别对应的坐标回归值的结束位置
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:] #在对应类的坐标回归上置相应的值
bbox_inside_weights[ind, start:end] = (1, 1, 1, 1) #将bbox_inside_weights上的对应类的坐标回归值置1
return bbox_targets, bbox_inside_weights
def _compute_targets(ex_rois, gt_rois, labels):
"""Compute bounding-box regression targets for an image."""
assert ex_rois.shape[0] == gt_rois.shape[0] #确保roi的数目和对应的ground truth框的数目相等
assert ex_rois.shape[1] == 4 #确保roi的坐标信息传入的是4个
assert gt_rois.shape[1] == 4 #确保ground truth框的坐标信息传入的是4个
targets = bbox_transform(ex_rois, gt_rois) #为rois找到坐标变换值
if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED: #如果需要将框的坐标归一化,就执行if中的代码
# Optionally normalize targets by a precomputed mean and stdev
targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
/ np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
return np.hstack(
(labels[:, np.newaxis], targets)).astype(np.float32, copy=False) #将roi对应的类别插在前面
def _sample_rois(all_rois, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
"""Generate a random sample of RoIs comprising foreground and background
examples.
"""
# overlaps: (rois x gt_boxes)
#计算所有roi和ground truth框之间的重合度
#只取坐标信息,roi中取第二到第五个数,ground truth框中取第一到第四个数
overlaps = bbox_overlaps(
np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
gt_assignment = overlaps.argmax(axis=1) #对于每个roi,找到对应的gt_box坐标 shape: [len(all_rois),]
max_overlaps = overlaps.max(axis=1) #对于每个roi,找到与gt_box重合的最大的overlap shape: [len(all_rois),]
labels = gt_boxes[gt_assignment, 4] #对于每个roi,找到归属的类别: [len(all_rois),]
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0] #找到属于前景的rois(就是与gt_box覆盖超过0.5以上的)
# Guard against the case when an image has fewer than fg_rois_per_image
# foreground RoIs
fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size) #求得一个训练batch中前景的个数
# Sample foreground regions without replacement
if fg_inds.size > 0:
fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False) #如果需要的话,就随机地排除一些前景框
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] #找到属于背景的rois(就是与gt_box覆盖介于0和0.5之间的)
# Compute number of background RoIs to take from this image (guarding
# against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image #求得一个训练batch中的理论背景的个数
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size) #求得一个训练batch中的事实背景的个数
# Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False) #如果需要的话,就随机地排除一些背景框
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds) #记录一下最终保留的框
# Select sampled values from various arrays:
labels = labels[keep_inds] #记录一下最终保留的框对应的label
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0 #把背景框的坐标置0
rois = all_rois[keep_inds] #取到最终保留的rois
bbox_target_data = _compute_targets(
rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels) #得到最终保留的框的类别ground truth值和坐标变换ground truth值
bbox_targets, bbox_inside_weights = \
_get_bbox_regression_labels(bbox_target_data, num_classes) #得到最终计算loss时使用的ground truth边框回归值和bbox_inside_weights
return labels, rois, bbox_targets, bbox_inside_weights
然后,我们来整理一下proposal_target_layer的思路:
1) 在_proposal_target_layer_py函数中,首先将ground truth框加入了根据RPN输出选择出的框,相当于增加前景的数量,此时,roi的数量变成了N(根据RPN的输出选出的)+M(ground truth框)。
2) 进入_sample_rois函数,首先计算所有的roi和ground truth框的重合度(IoU),然后对于每个roi,找到对应的ground truth框和正确的类别标签。
3) 为一个训练batch,在全部roi中选择前景框(前景框不能太多,最多只能占训练batch的1/4)和背景框。
4) 为进行该batch训练的框置分类标签,并通过_compute_targets函数计算坐标回归标签。
5) 通过_get_bbox_regression_labels函数将坐标回归标签扩充,变成训练所需的格式。
在整理了proposal_target_layer的思路之后,我们来看一下代码中有哪些需要注意的地方。笔者认为,代码中需要注意的地方一共有两个:
1) 正样本最多只占一个batch中最大图片数量的1/4。如果一个batch最大容量是128,那么,正样本最多就只有32个。
num_images = 1
rois_per_image = cfg.TRAIN.BATCH_SIZE // num_images #cfg.TRAIN.BATCH_SIZE为128
#cfg.TRAIN.FG_FRACTION为0.25,即在一次分类训练中前景框只能有32个
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image).astype(np.int32)
正负样本还是通过IoU来判断的,如果与ground truth框重叠在0.5及以上,就是正样本;如果IoU介于0和0.5之间,就是负样本。
2) 在最终训练边框回归的时候,是针对每个类别,单独安排坐标回归标签。如果某一个roi属于a类,那么坐标回归标签就被安排在对应a类的位置上面。该功能由_get_bbox_regression_labels函数完成。为啥要这么做呢?因为对于每一个roi而言,Fast R-CNN的边框回归部分输出的是num_classes*4个通道。
# Bounding Box refinement
with tf.variable_scope('bbox'):
self.rcnn_bbox_layers = Layers(hidden)
self.rcnn_bbox_layers.fc(output_nodes=self.num_classes*4, activation_fn=None)
到这里,proposal_layer和proposal_target_layer的代码解析就已经接近尾声了。总的来说,两个文件的代码还是写得比较有技巧和高效的。这两个函数为RPN和Fast R-CNN之间建立了桥梁,同时也是Faster R-CNN中比较难的代码,笔者也衷心希望自己的解析能对大家有帮助。对于上述的代码分析,如果各位读者朋友们认为存在疏漏,欢迎在评论区指出与讨论,笔者不胜感激。
欢迎阅读笔者后续博客,各位读者朋友的支持与鼓励是我最大的动力!
written by jiong
老当益壮,宁移白首之心?
穷且益坚,不坠青云之志。