Region proposal functions in Faster R-CNN (proposal_layer, anchor_target_layer, proposal_target_layer)

Code: https://github.com/endernewton/tf-faster-rcnn

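1. First, the feature map produced by VGG16 (512 channels) passes through a 3x3 convolution followed by two 1x1 convolutions with 9*2 and 9*4 output channels, all freshly initialized, which yields rpn_cls_score and rpn_bbox_pred. At this point the two outputs carry no real meaning [1]; the kernels are then updated by back-propagating the corresponding losses. rpn_cls_score scores each box as foreground or background, and rpn_bbox_pred predicts the offset (delta) between an anchor box and the ground truth. A softmax normalizes rpn_cls_score into foreground/background probabilities that sum to 1. Each point of the feature map has 9 boxes, and each box has a background and a foreground probability, so rpn_cls_prob has shape (1, ?, ?, 18). rpn_cls_pred compares the foreground and background scores with argmax to decide whether a box is foreground or background.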

The RPN and the R-CNN head share the backbone convolutional layers that produce this feature map.

# The base CNN (VGG16, ZF, etc.) uses ImageNet pre-trained weights; the other layers are initialized from a Gaussian with mean 0 and standard deviation 0.01 [6]
  def _region_proposal(self, net_conv, is_training, initializer):
    rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer,
                        scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_cls_score')
    # change it so that the score has 2 as its channel size
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    # decide whether each box is foreground or background
    rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
    # the deltas that proposal_layer later passes to bbox_transform_inv_tf
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
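
The reshape, softmax, reshape sequence above can be sketched in NumPy. This is only an illustration: `reshape_layer` is my assumption of what the repository's `_reshape_layer` does (transpose to channel-first, force the channel size, transpose back), and the feature-map size is a toy value.

```python
import numpy as np

def reshape_layer(x, num_dim):
    """NumPy stand-in for _reshape_layer (assumed behaviour, not the repo code)."""
    _, _, w, _ = x.shape
    to_caffe = x.transpose(0, 3, 1, 2)              # NHWC -> NCHW
    reshaped = to_caffe.reshape(1, num_dim, -1, w)  # force the channel size to num_dim
    return reshaped.transpose(0, 2, 3, 1)           # back to NHWC

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_anchors, H, W = 9, 4, 5  # toy feature-map size
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, num_anchors * 2))

score_reshape = reshape_layer(rpn_cls_score, 2)               # (1, 9*H, W, 2)
prob_reshape = softmax(score_reshape)                         # bg/fg pair sums to 1
rpn_cls_pred = score_reshape.reshape(-1, 2).argmax(axis=1)    # 0 = bg, 1 = fg
rpn_cls_prob = reshape_layer(prob_reshape, num_anchors * 2)   # (1, H, W, 18)

# Under this layout the first 9 channels are background, the last 9 foreground.
bg = rpn_cls_prob[..., :num_anchors]
fg = rpn_cls_prob[..., num_anchors:]
print(rpn_cls_prob.shape, np.allclose(bg + fg, 1.0))
```

With this layout, channel k and channel k + 9 belong to the same anchor, which is why the foreground scores can be sliced off as the last 9 channels.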

2. In proposal_layer, the probabilities of the foreground boxes are extracted (the last 9 channels).

# Get the scores and bounding boxes; the last 9 channels are the foreground scores
  scores = rpn_cls_prob[:, :, :, num_anchors:]

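bbox_transform_inv then adjusts the anchors slightly (shifting the center and scaling the width and height), and clip_boxes makes sure the resulting boxes lie inside the image.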

# Refine the anchors to obtain the proposals
  proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
  proposals = clip_boxes_tf(proposals, im_info[:2])
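
A minimal NumPy sketch of the two steps above, assuming the usual (dx, dy, dw, dh) parameterization. It is not the repository's TensorFlow code, and for simplicity it ignores the "+1" pixel convention that some implementations use when computing widths and heights.

```python
import numpy as np

def bbox_transform_inv(boxes, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) boxes."""
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    pred_ctr_x = dx * widths + ctr_x   # shift the center
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths       # scale width and height
    pred_h = np.exp(dh) * heights

    return np.stack([pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], axis=1)

def clip_boxes(boxes, im_shape):
    """Clip boxes to an image of shape (height, width)."""
    height, width = im_shape
    out = boxes.copy()
    out[:, [0, 2]] = np.clip(out[:, [0, 2]], 0, width - 1)
    out[:, [1, 3]] = np.clip(out[:, [1, 3]], 0, height - 1)
    return out

anchors = np.array([[0., 0., 16., 16.], [980., 580., 1020., 620.]])
deltas = np.array([[0.5, 0., 0., 0.], [0., 0., 0., 0.]])
proposals = clip_boxes(bbox_transform_inv(anchors, deltas), (600, 1000))
print(proposals)
```

The first anchor is shifted right by half its width; the second is left unchanged by the zero deltas and then clipped to the 600x1000 image.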

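Non-maximum suppression (non_max_suppression) [2] then selects up to post_nms_topN boxes in descending order of score (the foreground probability, which carries no real meaning early in training). At each step it keeps the highest-scoring remaining box and deletes every box whose overlap with it exceeds nms_thresh.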

# A 600x1000 image gives 600*1000/(16*16) feature points; with 9 anchors per point that is roughly 20000 anchors, from which NMS selects 2000
  # Non-maximal suppression 
  indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN, iou_threshold=nms_thresh)

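In the example picture of a person riding a horse, the person and the horse are different objects and the IoU of their boxes is below 0.3, so both boxes are kept; if two boxes overlap with an IoU above 0.7, the lower-scoring one is deleted. The layer returns the surviving boxes (RoIs, regions of interest, 2000 of them) and their scores.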
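
Greedy NMS as described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not what `tf.image.non_max_suppression` runs internally; the function names and the toy boxes are my own.

```python
import numpy as np

def iou_with(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an array of boxes."""
    iw = np.clip(np.minimum(box[2], boxes[:, 2]) - np.maximum(box[0], boxes[:, 0]), 0, None)
    ih = np.clip(np.minimum(box[3], boxes[:, 3]) - np.maximum(box[1], boxes[:, 1]), 0, None)
    inter = iw * ih
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold, max_output_size):
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # drop every remaining box that overlaps the kept box too much
        order = rest[iou_with(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0., 0., 10., 10.],     # kept: highest score
                  [0., 0., 10., 11.],     # IoU with the first box is about 0.91
                  [20., 20., 30., 30.]])  # no overlap with the first box
scores = np.array([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.7, max_output_size=2000)
print(keep)
```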

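3. In anchor_target_layer, only anchors that lie entirely inside the image are considered. bbox_overlaps then compares the anchors and gt_boxes pairwise and produces an [N, K] matrix overlaps holding their degree of overlap; taking the argmax along each dimension gives the best-matching gt box for every anchor and the best-matching anchor for every gt box.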

  # overlaps between the anchors and the gt boxes
  # overlaps (ex, gt): anchors and gt_boxes are compared pairwise, giving an [N, K] matrix
  # (roughly 20000 anchors by K ground-truth boxes in the image)
  overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float64),
    np.ascontiguousarray(gt_boxes, dtype=np.float64))
  argmax_overlaps = overlaps.argmax(axis=1)  # for each anchor, index of the gt box with the highest overlap
  max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # for each anchor, its highest overlap over all gt boxes
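
The repository computes the overlap matrix with a compiled `bbox_overlaps`. The following NumPy version is only a sketch of the same idea using broadcasting; it ignores the "+1" pixel convention, and the toy anchors and gt boxes are made up for illustration.

```python
import numpy as np

def bbox_overlaps(boxes, query_boxes):
    """Pairwise IoU between N boxes and K query boxes, returned as an (N, K) matrix."""
    b = boxes[:, None, :]        # (N, 1, 4)
    q = query_boxes[None, :, :]  # (1, K, 4)
    iw = np.clip(np.minimum(b[..., 2], q[..., 2]) - np.maximum(b[..., 0], q[..., 0]), 0, None)
    ih = np.clip(np.minimum(b[..., 3], q[..., 3]) - np.maximum(b[..., 1], q[..., 1]), 0, None)
    inter = iw * ih
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    area_q = (q[..., 2] - q[..., 0]) * (q[..., 3] - q[..., 1])
    return inter / (area_b + area_q - inter)

anchors = np.array([[0., 0., 10., 10.], [20., 20., 30., 30.]])
gt_boxes = np.array([[0., 0., 10., 10.], [5., 0., 15., 10.]])

overlaps = bbox_overlaps(anchors, gt_boxes)                         # shape (2, 2)
argmax_overlaps = overlaps.argmax(axis=1)                           # best gt box per anchor
max_overlaps = overlaps[np.arange(len(anchors)), argmax_overlaps]   # its IoU
gt_argmax_overlaps = overlaps.argmax(axis=0)                        # best anchor per gt box
print(overlaps)
```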

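Anchors that have the highest overlap with some gt box, or whose overlap is at least RPN_POSITIVE_OVERLAP (0.7), get rpn_label 1 (foreground); anchors whose overlap is below RPN_NEGATIVE_OVERLAP (0.3) get rpn_label 0 (background). The class of the object inside the box is ignored. The cross-entropy is then computed between these rpn_labels and the scores predicted by the RPN [3].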

# These labels are used when computing the cross-entropy
  # fg label: above threshold IOU
  labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

  if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels last so that negative labels can clobber positives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

  # RPN classification loss, computed later in the network from rpn_cls_score and rpn_label
  rpn_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))
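
The labelling rule can be sketched on a small overlap matrix. This assumes the common ordering in which background is assigned first and positives afterwards (i.e. the clobber option is off) and that unassigned anchors get -1 and are ignored; `assign_rpn_labels` is a hypothetical helper name, not a function from the repository.

```python
import numpy as np

def assign_rpn_labels(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """overlaps: (N anchors, K gt boxes) IoU matrix. Returns labels in {1, 0, -1}."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.float32)  # -1 = ignored
    max_overlaps = overlaps.max(axis=1)     # best IoU per anchor
    gt_max_overlaps = overlaps.max(axis=0)  # best IoU per gt box
    # anchors that achieve the best IoU for some gt box (ties included)
    gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

    labels[max_overlaps < neg_thresh] = 0   # background
    labels[gt_argmax_overlaps] = 1          # best anchor for each gt box
    labels[max_overlaps >= pos_thresh] = 1  # IoU above the positive threshold
    return labels

overlaps = np.array([[0.8, 0.1],
                     [0.2, 0.5],
                     [0.4, 0.3],
                     [0.1, 0.0]])
labels = assign_rpn_labels(overlaps)
print(labels)
```

The second anchor is below the 0.7 threshold but is still positive because it is the best match for the second gt box; the third anchor falls between the two thresholds and is ignored.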

Up to 128 anchors are kept as foreground:

num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE) #0.5*256=128
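
One way to enforce this cap is to randomly disable surplus anchors by setting their label to -1. The sketch below assumes a batch of 256 with a foreground fraction of 0.5, matching the numbers above; `subsample_labels` is a hypothetical helper and may differ in detail from the repository's code.

```python
import numpy as np

def subsample_labels(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Keep at most fg_fraction * batch_size positives and fill the rest with negatives."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()

    num_fg = int(fg_fraction * batch_size)  # 0.5 * 256 = 128
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[disable] = -1                # surplus positives are ignored

    num_bg = batch_size - int(np.sum(labels == 1))
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[disable] = -1                # surplus negatives are ignored
    return labels

many_fg = np.concatenate([np.ones(300), np.zeros(1000), -np.ones(50)])
few_fg = np.concatenate([np.ones(10), np.zeros(1000)])
out_many = subsample_labels(many_fg, rng=np.random.default_rng(0))
out_few = subsample_labels(few_fg, rng=np.random.default_rng(0))
print(int((out_many == 1).sum()), int((out_many == 0).sum()))
print(int((out_few == 1).sum()), int((out_few == 0).sum()))
```

When there are fewer than 128 positives, the remaining slots of the batch are filled with background anchors.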

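For each anchor, bbox_transform computes rpn_bbox_targets, the amount of shift and scaling needed to reach the gt box it overlaps most (the object the anchor is assigned to). The smooth_L1_loss between these targets and rpn_bbox_pred is then computed.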

  # Compute each anchor's shift/scale targets relative to the gt box it is assigned to
  bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])

  # RPN box-regression loss, computed later in the network
  rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                      rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
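
The regression targets are the inverse of the proposal refinement: given an anchor and its assigned gt box, compute the deltas that would map one onto the other. A minimal NumPy sketch, assuming the standard (dx, dy, dw, dh) parameterization and ignoring the "+1" pixel convention:

```python
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    """Deltas (dx, dy, dw, dh) that move each ex_roi onto its gt_roi."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0]
    ex_h = ex_rois[:, 3] - ex_rois[:, 1]
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_w
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0]
    gt_h = gt_rois[:, 3] - gt_rois[:, 1]
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_w
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_h

    dx = (gt_ctr_x - ex_ctr_x) / ex_w  # center shift, in units of anchor width
    dy = (gt_ctr_y - ex_ctr_y) / ex_h
    dw = np.log(gt_w / ex_w)           # log scale change
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)

anchors = np.array([[0., 0., 10., 10.]])
gt = np.array([[10., 0., 30., 10.]])
targets = bbox_transform(anchors, gt)
print(targets)
```

Here the gt box is twice as wide as the anchor and its center lies 1.5 anchor widths to the right, so the targets are (1.5, 0, log 2, 0).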

bbox_inside_weights restricts the regression to foreground boxes, and bbox_outside_weights are the weights that balance the loss terms.
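
How the two weight tensors enter the loss can be illustrated with a NumPy version of a sigma-parameterized smooth L1. This is a sketch under my own assumptions: it returns the plain weighted sum, whereas the real loss also averages over the batch, and the toy values are made up.

```python
import numpy as np

def smooth_l1_loss(bbox_pred, bbox_targets, inside_w, outside_w, sigma=1.0):
    sigma_2 = sigma ** 2
    in_diff = inside_w * (bbox_pred - bbox_targets)  # zero for non-foreground entries
    abs_diff = np.abs(in_diff)
    quadratic = abs_diff < 1.0 / sigma_2             # small errors use the L2 branch
    in_loss = np.where(quadratic,
                       0.5 * sigma_2 * in_diff ** 2,
                       abs_diff - 0.5 / sigma_2)     # large errors use the L1 branch
    return float(np.sum(outside_w * in_loss))

pred = np.array([[0.5, 2.0, 3.0, 0.0]])
target = np.zeros((1, 4))
inside_w = np.array([[1.0, 1.0, 0.0, 0.0]])  # third coordinate is masked out
outside_w = np.ones((1, 4))
loss = smooth_l1_loss(pred, target, inside_w, outside_w)
print(loss)
```

The error of 0.5 contributes 0.5 * 0.5^2 = 0.125, the error of 2.0 contributes 2.0 - 0.5 = 1.5, and the masked error of 3.0 contributes nothing.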

4. After rpn_label has been updated in the previous step, proposal_target_layer selects 128 RoIs from the 2000 RoIs.

  rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # 128 RoIs per image
  fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)  # up to 32 foreground RoIs per image

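sample_rois generates a set of RoIs made up of foreground and background examples. If there are 3 ground-truth boxes, gt_boxes has shape (3, 5): four coordinates plus the class. Here the class of each box (0, 1, 2, ...) matters [5].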

  labels = gt_boxes[gt_assignment, 4]  # class of the gt box each RoI is assigned to

  # Select foreground RoIs as those with >= FG_THRESH overlap
  fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]  # foreground: overlap >= 0.5
  # Guard against the case when an image has fewer than fg_rois_per_image
  # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
  bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                     (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]  # background: 0.1 <= overlap < 0.5
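
Putting the thresholds and the 128/32 budget together, the sampling can be sketched as follows. `sample_roi_indices` is a hypothetical helper with the thresholds quoted above as defaults; as a simplification it samples without replacement and may return fewer than 128 RoIs when candidates run short, which may differ from the repository's behaviour.

```python
import numpy as np

def sample_roi_indices(max_overlaps, rois_per_image=128, fg_fraction=0.25,
                       fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.1, rng=None):
    """Pick foreground and background RoI indices from each RoI's best IoU."""
    rng = np.random.default_rng() if rng is None else rng
    fg_rois_per_image = int(round(fg_fraction * rois_per_image))  # 0.25 * 128 = 32

    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_thresh_hi) &
                       (max_overlaps >= bg_thresh_lo))[0]

    n_fg = min(fg_rois_per_image, len(fg_inds))  # guard: fewer fg than the budget
    n_bg = min(rois_per_image - n_fg, len(bg_inds))
    fg_keep = rng.permutation(fg_inds)[:n_fg]
    bg_keep = rng.permutation(bg_inds)[:n_bg]
    return fg_keep, bg_keep

max_overlaps = np.linspace(0.0, 1.0, 2000)  # toy IoU values for 2000 RoIs
fg_keep, bg_keep = sample_roi_indices(max_overlaps, rng=np.random.default_rng(0))
print(len(fg_keep), len(bg_keep))
```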

_get_bbox_regression_labels builds regression targets only for foreground boxes.

# Build regression targets for the bounding boxes that are foreground
def _get_bbox_regression_labels(bbox_target_data, num_classes):
  """Bounding-box regression targets (bbox_target_data) are stored in a
  compact form N x (class, tx, ty, tw, th)

  This function expands those targets into the 4-of-4*K representation used
  by the network (i.e. only one class has non-zero targets), similar to one-hot encoding [4].

  Returns:
      bbox_target (ndarray): N x 4K blob of regression targets
      bbox_inside_weights (ndarray): N x 4K blob of loss weights
  """

  clss = bbox_target_data[:, 0] 
  bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
  bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
  inds = np.where(clss > 0)[0]  # background RoIs have label 0, so only foreground rows are selected
  for ind in inds:
    cls = clss[ind]
    start = int(4 * cls)
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
  return bbox_targets, bbox_inside_weights
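
To see the 4-of-4*K expansion on concrete numbers, here is a self-contained variant of the function above. The only change is that the inside weights are passed as a parameter instead of being read from cfg; the default of (1, 1, 1, 1) and the name `expand_bbox_targets` are my assumptions.

```python
import numpy as np

def expand_bbox_targets(bbox_target_data, num_classes,
                        inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Expand N x (class, tx, ty, tw, th) into N x 4K targets and weights."""
    clss = bbox_target_data[:, 0].astype(int)
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    for ind in np.where(clss > 0)[0]:  # skip background rows (class 0)
        start = 4 * clss[ind]
        bbox_targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:start + 4] = inside_weights
    return bbox_targets, bbox_inside_weights

data = np.array([[2, 0.1, 0.2, 0.3, 0.4],   # foreground RoI of class 2
                 [0, 0.0, 0.0, 0.0, 0.0]])  # background RoI
targets, weights = expand_bbox_targets(data, num_classes=3)
print(targets.shape)
print(targets[0])
```

With 3 classes each row has 12 slots; the class-2 RoI fills slots 8 to 11 and everything else, including the whole background row, stays zero.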

 

In PyCharm, Ctrl+Shift+F quickly finds where a function's return value is used, and Ctrl+left click jumps straight to the function.

References:

[1]. https://blog.csdn.net/weixin_40489988/article/details/106181969

[2]. https://www.cnblogs.com/makefile/p/nms.html

[3]. https://blog.csdn.net/ZJRN1027/article/details/80199248

[4]. https://www.cnblogs.com/shuaishuaidefeizhu/p/11269257.html

[5]. https://blog.csdn.net/Mr_health/article/details/84952190

[6]. https://zhuanlan.zhihu.com/p/54443471
