FPN Source Code

After reading the FPN paper I still didn't have much of a concrete picture, so it felt like time to read some source code. I haven't debugged the code covered here; I read it only to learn the flow of the algorithm.
GitHub: 月夜's code
The author of that blog mentions three code bases:
Training an object detection model that combines FPN and Faster R-CNN with TensorFlow

A Brief Introduction to the FPN Model

  • Almost every method that has posted strong results on the ImageNet and COCO detection tasks uses image pyramids, as shown in figure (a). But the heavy time and compute cost of that approach makes it hard to apply in practice.
  • Two-stage detectors such as the Faster R-CNN family use only the last feature map, as shown in figure (b). In a convolutional network, however, different depths carry different levels of semantics: shallow layers have high resolution and learn mostly fine detail, while deep layers have low resolution and learn mostly semantic features. Combining feature maps from multiple levels is the current trend.
  • Feature hierarchies, where each layer predicts detections at its own scale, as shown in figure (c). The SSD detection framework adopts a similar idea. The problem with this approach is that it directly forces different layers to learn the same semantic information.
[Figure 1: the four feature-pyramid designs (a)–(d) compared in the FPN paper]
  • As shown in figure (d), the network modifies the original single network directly: the feature map at each resolution is merged, by element-wise addition, with the feature map of the next (coarser) resolution upsampled by 2x. Through these connections, the feature map used for prediction at every level fuses features of different resolutions and different semantic strengths, and each fused resolution handles detection of objects at the matching scale. This is the FPN network, the model discussed here; see the sketch below.
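Conceptually, the top-down merge can be sketched in a few lines (pseudocode; conv1x1 and upsample2x are hypothetical helpers standing in for the 1x1 convolution and the 2x upsampling — the real implementation appears later in build_feature_pyramid):

# Pseudocode sketch of the FPN top-down pathway.
P5 = conv1x1(C5)                   # top level: 1x1 conv only
P4 = upsample2x(P5) + conv1x1(C4)  # element-wise add with the lateral connection
P3 = upsample2x(P4) + conv1x1(C3)
P2 = upsample2x(P3) + conv1x1(C2)  # each P level then gets a 3x3 conv before prediction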

But the code I actually read is the author's adapted version:

[Figure 2: directory layout of the adapted repository]

A few files here deserve a memo:

  • configs: hyperparameters for the model configurations; there are configs for vgg, inception, and so on.
  • data: factory files for the datasets; these files handle data generation.
  • help_utils: two files; help_utils.py is the key file for visualizing images.
  • scripts: shell scripts (.sh files run directly under Ubuntu) that call into tools for train, test, eval, and inference.
  • tools: the main entry points for each stage of detection — training, testing, evaluation, inference. The walkthrough below starts from here.
  • gen_classes.py: the author says this generates the labels; it produces the txt files, which are the first step when making annotations.
  • cnvert_txt2xml.py: as the name suggests, converts the txt files into the XML annotation format.

test.py

At a high level, this file breaks into several parts:

  • 1. Read the arguments and the data.
  • 2. Build the backbone network (vgg, inception, ...) that supplies the per-layer feature maps. FPN uses not only the last layer but several intermediate ones as well.
  • 3. Build the RPN network, which generates boxes and a foreground/background binary classification on each feature map.
  • 4. Build the Faster R-CNN network. Why it isn't called the FPN network, I don't know, but the feature pyramid really is built here.
  • 5. Fetch the required output tensors and run the computation.
  • 6. Use help_utils to display the results.

I won't cover how the data gets read; skipping that until I actually debug the code.

Building the backbone network:

    _, share_net = get_network_byname(net_name=cfgs.NET_NAME,
                                      inputs=img_batch,
                                      num_classes=None,
                                      is_training=True,
                                      output_stride=None,
                                      global_pool=False,
                                      spatial_squeeze=False)

share_net holds the selected feature maps that will be fused into the feature pyramid.

You can see this in the get_network_byname function in network_factory.py; I won't walk through it, but a quick way to list the available end points is sketched below.
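If you are not sure which end_points a backbone exposes, you can simply print them. A minimal sketch, assuming TF 1.x and slim's bundled resnet_v1 (my own snippet, not part of the repo):

import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import resnet_v1

inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    _, end_points = resnet_v1.resnet_v1_50(inputs, num_classes=None,
                                           is_training=False, global_pool=False)
# print every exposed end point, e.g. 'resnet_v1_50/block1/unit_2/bottleneck_v1'
for name in sorted(end_points):
    print(name, end_points[name].shape)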

The backbone feature-extraction model

def get_network_byname(net_name,
                       inputs,
                       num_classes=None,
                       is_training=True,
                       global_pool=True,
                       output_stride=None,
                       spatial_squeeze=True):
  if net_name == 'resnet_v1_50':
    FLAGS = get_flags_byname(net_name)
    with slim.arg_scope(resnet_v1.resnet_arg_scope(weight_decay=FLAGS.weight_decay)):
      logits, end_points = resnet_v1.resnet_v1_50(inputs=inputs,
                                                  num_classes=num_classes,
                                                  is_training=is_training,
                                                  global_pool=global_pool,
                                                  output_stride=output_stride,
                                                  spatial_squeeze=spatial_squeeze
                                                  )

    return logits, end_points

Building the RPN

    rpn = build_rpn.RPN(net_name=cfgs.NET_NAME,
                        inputs=img_batch,
                        gtboxes_and_label=None,
                        is_training=False,
                        share_head=cfgs.SHARE_HEAD,
                        share_net=share_net,
                        stride=cfgs.STRIDE,
                        anchor_ratios=cfgs.ANCHOR_RATIOS,
                        anchor_scales=cfgs.ANCHOR_SCALES,
                        scale_factors=cfgs.SCALE_FACTORS,
                        base_anchor_size_list=cfgs.BASE_ANCHOR_SIZE_LIST,  # P2, P3, P4, P5, P6
                        level=cfgs.LEVEL,
                        top_k_nms=cfgs.RPN_TOP_K_NMS,
                        rpn_nms_iou_threshold=cfgs.RPN_NMS_IOU_THRESHOLD,
                        max_proposals_num=cfgs.MAX_PROPOSAL_NUM,
                        rpn_iou_positive_threshold=cfgs.RPN_IOU_POSITIVE_THRESHOLD,
                        rpn_iou_negative_threshold=cfgs.RPN_IOU_NEGATIVE_THRESHOLD,
                        rpn_mini_batch_size=cfgs.RPN_MINIBATCH_SIZE,
                        rpn_positives_ratio=cfgs.RPN_POSITIVE_RATE,
                        remove_outside_anchors=False,  # whether remove anchors outside
                        rpn_weight_decay=cfgs.WEIGHT_DECAY[cfgs.NET_NAME])

The RPN module is where the feature fusion is built: starting from the topmost feature map, it upsamples step by step (nearest-neighbor resizing in this code, rather than true deconvolution) and then fuses with the lateral features. On each feature map the RPN generates box coordinates and a foreground/background binary classification.

Building the Fast R-CNN detection model

fast_rcnn = build_fast_rcnn.FastRCNN(img_batch=img_batch,
                                         feature_pyramid=rpn.feature_pyramid,
                                         rpn_proposals_boxes=rpn_proposals_boxes,
                                         rpn_proposals_scores=rpn_proposals_scores,
                                         img_shape=tf.shape(img_batch),
                                         roi_size=cfgs.ROI_SIZE,
                                         scale_factors=cfgs.SCALE_FACTORS,
                                         roi_pool_kernel_size=cfgs.ROI_POOL_KERNEL_SIZE,
                                         gtboxes_and_label=None,
                                         fast_rcnn_nms_iou_threshold=cfgs.FAST_RCNN_NMS_IOU_THRESHOLD,
                                         fast_rcnn_maximum_boxes_per_img=100,
                                         fast_rcnn_nms_max_boxes_per_class=cfgs.FAST_RCNN_NMS_MAX_BOXES_PER_CLASS,
                                         show_detections_score_threshold=cfgs.FINAL_SCORE_THRESHOLD,  # show detections which score >= 0.6
                                         num_classes=cfgs.CLASS_NUM,
                                         fast_rcnn_minibatch_size=cfgs.FAST_RCNN_MINIBATCH_SIZE,
                                         fast_rcnn_positives_ratio=cfgs.FAST_RCNN_POSITIVE_RATE,
                                         fast_rcnn_positives_iou_threshold=cfgs.FAST_RCNN_IOU_POSITIVE_THRESHOLD,
                                         use_dropout=False,
                                         weight_decay=cfgs.WEIGHT_DECAY[cfgs.NET_NAME],
                                         is_training=False,
                                         level=cfgs.LEVEL)

fast_rcnn then produces the final box predictions and class scores.

fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
        fast_rcnn.fast_rcnn_predict()

Initialize the variables and load the pretrained model

init_op = tf.group(
    tf.global_variables_initializer(),
    tf.local_variables_initializer()
)

restorer, restore_ckpt = restore_model.get_restorer(checkpoint_path=args.weights)

Display the resulting image

_img_batch_fpn = help_utils.draw_box_cv(_img_batch,
                                        boxes=_fast_rcnn_decode_boxes,
                                        labels=_detection_category,
                                        scores=_fast_rcnn_score)
mkdir(cfgs.TEST_SAVE_PATH)
cv2.imwrite(cfgs.TEST_SAVE_PATH + '/{}_fpn.jpg'.format(str(_img_name_batch[0])),
            _img_batch_fpn)

That was a rough pass over the test.py file.
Next, the goal is to look at its main components:

  • rpn = build_rpn.RPN
    rpn.rpn_proposals()
    rpn.feature_pyramid()
    rpn.rpn_losses()
  • fast_rcnn = build_fast_rcnn.FastRCNN
    fast_rcnn.fast_rcnn_predict()
    fast_rcnn.fast_rcnn_loss()

OK, those are the main functions. Why these? Because test.py and train.py only ever call these, so the other methods don't need much attention.

To stress it again: I have not debugged this code. I read it purely to strengthen my understanding of the FPN algorithm's flow.

build_rpn.py

Alright, let's look at the RPN.

class RPN(object):
  def __init__(self, 
               net_name, # 'resnet_v1_101'
               inputs, # img_batch
               gtboxes_and_label, # to create loss
               is_training,
               share_net, #  end_point
               anchor_ratios, # [2., 3., 4., 5.]
               anchor_scales, # [2., 3., 4.]
               scale_factors, # [10., 10., 5., 5.]
               base_anchor_size_list,  # P2, P3, P4, P5, P6
               stride, # [4, 8, 16, 32, 64]
               level, # ['P2', 'P3', 'P4', 'P5', "P6"]
               top_k_nms, # 12000
               share_head=False, # True
               rpn_nms_iou_threshold=0.7,
               max_proposals_num=300,
               rpn_iou_positive_threshold=0.7,
               rpn_iou_negative_threshold=0.3,  # iou>=0.7 is positive box, iou< 0.3 is negative
               rpn_mini_batch_size=256,
               rpn_positives_ratio=0.5,
               remove_outside_anchors=False,  # whether remove anchors outside
               rpn_weight_decay=0.0001,
               ):
    
    self.net_name = net_name
    self.img_batch = inputs
    self.gtboxes_and_label = gtboxes_and_label  # shape is [M, 5]

    self.base_anchor_size_list = base_anchor_size_list

    self.anchor_ratios = tf.constant(anchor_ratios, dtype=tf.float32)
    self.anchor_scales = tf.constant(anchor_scales, dtype=tf.float32)
    self.share_head = share_head
    self.num_of_anchors_per_location = len(anchor_scales) * len(anchor_ratios) # 3 x 4 = 12

    self.scale_factors = scale_factors # [10., 10., 5., 5.] 
    self.stride = stride
    self.level = level
    self.top_k_nms = top_k_nms

    self.rpn_nms_iou_threshold = rpn_nms_iou_threshold
    self.max_proposals_num = max_proposals_num

    self.rpn_iou_positive_threshold = rpn_iou_positive_threshold
    self.rpn_iou_negative_threshold = rpn_iou_negative_threshold
    self.rpn_mini_batch_size = rpn_mini_batch_size
    self.rpn_positives_ratio = rpn_positives_ratio
    self.remove_outside_anchors = remove_outside_anchors
    self.rpn_weight_decay = rpn_weight_decay
    self.is_training = is_training

    # all of the feature maps produced by the backbone network
    self.share_net = share_net

    # fetch the backbone feature maps (C2-C5)
    self.feature_maps_dict = self.get_feature_maps()

    # build the feature pyramid (P2-P6)
    self.feature_pyramid = self.build_feature_pyramid()

    # predict boxes and foreground/background scores on every pyramid level
    self.anchors, self.rpn_encode_boxes, self.rpn_scores = self.get_anchors_and_rpn_predict()

That is the RPN's initialization function: the first part stores the arguments, and the end of __init__ calls three functions.

  • Fetching the feature maps

    self.feature_maps_dict = self.get_feature_maps()
  def get_feature_maps(self):
 
    with tf.variable_scope('get_feature_maps'):
      if self.net_name == 'resnet_v1_50':
        feature_maps_dict = {
            'C2': self.share_net['resnet_v1_50/block1/unit_2/bottleneck_v1'],  # [56, 56]
            'C3': self.share_net['resnet_v1_50/block2/unit_3/bottleneck_v1'],  # [28, 28]
            'C4': self.share_net['resnet_v1_50/block3/unit_5/bottleneck_v1'],  # [14, 14]
            'C5': self.share_net['resnet_v1_50/block4']  # [7, 7]
        }
      elif self.net_name == 'resnet_v1_101':
        feature_maps_dict = {
            'C2': self.share_net['resnet_v1_101/block1/unit_2/bottleneck_v1'],  # [56, 56]
            'C3': self.share_net['resnet_v1_101/block2/unit_3/bottleneck_v1'],  # [28, 28]
            'C4': self.share_net['resnet_v1_101/block3/unit_22/bottleneck_v1'],  # [14, 14]
            'C5': self.share_net['resnet_v1_101/block4']  # [7, 7]
        }
      else:
        raise Exception('get no feature maps')

      return feature_maps_dict

This function is plain enough: it grabs four feature maps (C2–C5) from the backbone and returns them.

  • Building the feature pyramid: build_feature_pyramid
def build_feature_pyramid(self):
    '''
    reference: https://github.com/CharlesShang/FastMaskRCNN
    build P2, P3, P4, P5
    :return: multi-scale feature map
    '''

    feature_pyramid = {}
    with tf.variable_scope('build_feature_pyramid'):
      with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(self.rpn_weight_decay)):
        feature_pyramid['P5'] = slim.conv2d(self.feature_maps_dict['C5'],
                                            num_outputs=256,
                                            kernel_size=[1, 1],
                                            stride=1,
                                            scope='build_P5')

        feature_pyramid['P6'] = slim.max_pool2d(feature_pyramid['P5'],
                                                kernel_size=[2, 2], stride=2, scope='build_P6')
        # P6 is a stride-2 max pool (downsample) of P5
        for layer in range(4, 1, -1):
          p, c = feature_pyramid['P' + str(layer + 1)], self.feature_maps_dict['C' + str(layer)]
          up_sample_shape = tf.shape(c)
          
          # upsample p to the spatial size of c (yes, this is nearest-neighbor interpolation)
          up_sample = tf.image.resize_nearest_neighbor(p, [up_sample_shape[1], up_sample_shape[2]],
                                                       name='build_P%d/up_sample_nearest_neighbor' % layer)
          
          c = slim.conv2d(c, num_outputs=256, kernel_size=[1, 1], stride=1,
                          scope='build_P%d/reduce_dimension' % layer)
          # element-wise addition: the FPN lateral merge
          p = up_sample + c
          
          # a 3x3 conv on the merged map (the paper's anti-aliasing step)
          p = slim.conv2d(p, 256, kernel_size=[3, 3], stride=1,
                          padding='SAME', scope='build_P%d/avoid_aliasing' % layer)
          feature_pyramid['P' + str(layer)] = p

    return feature_pyramid

The flow here is fairly simple; it produces the levels P2, P3, P4, P5, and P6.

To keep from forgetting, a record of the flow:

  • P6 is a max-pooled (downsampled) copy of P5
  • P5 is produced by a 1x1 convolution on C5
  • P4 is the fusion of upsampled P5 and C4
  • P3 is the fusion of upsampled P4 and C3
  • P2 is the fusion of upsampled P3 and C2

Next comes predicting the box coordinates and the (binary) classes.

  • get_anchors_and_rpn_predict
def get_anchors_and_rpn_predict(self):
    # generate the fixed anchor boxes on every feature map
    anchors = self.make_anchors()
    
    # predict boxes and background (binary) scores on every feature map
    rpn_encode_boxes, rpn_scores = self.rpn_net()

    with tf.name_scope('get_anchors_and_rpn_predict'):
      if self.is_training:
        if self.remove_outside_anchors:
          # filter the anchors: find the indices of boxes that lie inside the image
          valid_indices = boxes_utils.filter_outside_boxes(boxes=anchors,
                                                           img_h=tf.shape(self.img_batch)[1],
                                                           img_w=tf.shape(self.img_batch)[2])
          # gather only the valid anchors and their predictions
          valid_anchors = tf.gather(anchors, valid_indices)
          rpn_valid_encode_boxes = tf.gather(rpn_encode_boxes, valid_indices)
          rpn_valid_scores = tf.gather(rpn_scores, valid_indices)

          return valid_anchors, rpn_valid_encode_boxes, rpn_valid_scores
        else:
          return anchors, rpn_encode_boxes, rpn_scores
      else:
        return anchors, rpn_encode_boxes, rpn_scores

As you can see, this is the function that produces the RPN's final predictions.

Three functions are involved here:

  • anchors = self.make_anchors()
  • rpn_encode_boxes, rpn_scores = self.rpn_net()
  • boxes_utils.filter_outside_boxes

make_anchors

def make_anchors(self):
    with tf.variable_scope('make_anchors'):
      anchor_list = []
      level_list = self.level
      with tf.name_scope('make_anchors_all_level'):
        for level, base_anchor_size, stride in zip(level_list, self.base_anchor_size_list, self.stride):
          '''
          (level, base_anchor_size) tuple:
          (P2, 32), (P3, 64), (P4, 128), (P5, 256), (P6, 512)
          '''
          
          featuremap_height, featuremap_width = tf.shape(self.feature_pyramid[level])[1], \
              tf.shape(self.feature_pyramid[level])[2]
          # stride = base_anchor_size / 8.

          # tmp_anchors = tf.py_func(
          #     anchor_utils_pyfunc.make_anchors,
          #     inp=[base_anchor_size, self.anchor_scales, self.anchor_ratios,
          #          featuremap_height, featuremap_width, stride],
          #     Tout=tf.float32
          # )
          
          # given just the feature-map size and the window sizes, this generates the fixed boxes for each feature map
          tmp_anchors = make_anchor.make_anchors(base_anchor_size, self.anchor_scales, self.anchor_ratios,
                                                 featuremap_height, featuremap_width, stride,
                                                 name='make_anchors_{}'.format(level))
          tmp_anchors = tf.reshape(tmp_anchors, [-1, 4])
          anchor_list.append(tmp_anchors)

        all_level_anchors = tf.concat(anchor_list, axis=0)
      # return the coordinates of all the anchors
      return all_level_anchors

make_anchor generates a fixed set of boxes once it is given the feature-map size. I covered how this works in an earlier article, so I won't explain it again, but the general mechanism is sketched below.
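The mechanism can be reproduced in a few lines of numpy (a sketch of the idea, not the repo's exact make_anchors):

import numpy as np

def sketch_make_anchors(base_size, scales, ratios, fm_h, fm_w, stride):
    """Return [fm_h * fm_w * len(scales) * len(ratios), 4] anchors as [y1, x1, y2, x2]."""
    # all (scale, ratio) combinations of anchor heights/widths around base_size
    scales, ratios = np.meshgrid(scales, ratios)
    hs = (base_size * scales * np.sqrt(ratios)).flatten()
    ws = (base_size * scales / np.sqrt(ratios)).flatten()
    # one anchor center per feature-map cell, spaced by the stride
    cy, cx = np.meshgrid(np.arange(fm_h) * stride, np.arange(fm_w) * stride, indexing='ij')
    cy, cx = cy.flatten(), cx.flatten()
    # pair every center with every (h, w)
    cy, hs = np.meshgrid(cy, hs)
    cx, ws = np.meshgrid(cx, ws)
    return np.stack([cy - hs / 2., cx - ws / 2., cy + hs / 2., cx + ws / 2.],
                    axis=-1).reshape(-1, 4)

# P2 with the config above: base size 32, stride 4, 3 scales x 4 ratios = 12 anchors per cell
anchors = sketch_make_anchors(32, [2., 3., 4.], [2., 3., 4., 5.], fm_h=56, fm_w=56, stride=4)
print(anchors.shape)  # (37632, 4) == 56 * 56 * 12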

  • self.rpn_net()
def rpn_net(self):

    rpn_encode_boxes_list = []
    rpn_scores_list = []
    with tf.variable_scope('rpn_net'):
      with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(self.rpn_weight_decay)):
        for level in self.level:

          if self.share_head:
            reuse_flag = None if level == 'P2' else True
            scope_list = ['conv2d_3x3', 'rpn_classifier', 'rpn_regressor']
            # in the beginning we create the variables, then share them across P3, P4, P5
            # i.e. the RPN head that predicts boxes on each feature map can share its weights across all levels
          else:
            reuse_flag = None
            scope_list = ['conv2d_3x3_'+level, 'rpn_classifier_'+level, 'rpn_regressor_'+level]
          # first a 3x3 convolution
          rpn_conv2d_3x3 = slim.conv2d(inputs=self.feature_pyramid[level],
                                       num_outputs=512,
                                       kernel_size=[3, 3],
                                       stride=1,
                                       scope=scope_list[0],
                                       reuse=reuse_flag)
          # the binary foreground/background scores
          rpn_box_scores = slim.conv2d(rpn_conv2d_3x3,
                                       num_outputs=2 * self.num_of_anchors_per_location,
                                       kernel_size=[1, 1],
                                       stride=1,
                                       scope=scope_list[1],
                                       activation_fn=None,
                                       reuse=reuse_flag)
          # the (encoded) box coordinates
          rpn_encode_boxes = slim.conv2d(rpn_conv2d_3x3,
                                         num_outputs=4 * self.num_of_anchors_per_location,
                                         kernel_size=[1, 1],
                                         stride=1,
                                         scope=scope_list[2],
                                         activation_fn=None,
                                         reuse=reuse_flag)

          rpn_box_scores = tf.reshape(rpn_box_scores, [-1, 2])
          rpn_encode_boxes = tf.reshape(rpn_encode_boxes, [-1, 4])

          rpn_scores_list.append(rpn_box_scores)
          rpn_encode_boxes_list.append(rpn_encode_boxes)

        rpn_all_encode_boxes = tf.concat(rpn_encode_boxes_list, axis=0)
        rpn_all_boxes_scores = tf.concat(rpn_scores_list, axis=0)

      return rpn_all_encode_boxes, rpn_all_boxes_scores

This part is fairly simple; it produces rpn_all_encode_boxes, the encoded box coordinates, and rpn_all_boxes_scores, the foreground/background score for each box.

Another important piece that I keep forgetting:

I always forget how the fixed anchor coordinates in the RPN relate to the box coordinates the RPN network predicts. The bbox_transform_inv function combines the RPN's output to apply a coordinate transform to all of the initial boxes; see the decode sketch below.
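In essence, decoding inverts the encoding: divide the predicted deltas by scale_factors, shift the anchor center, and rescale the anchor size. A numpy sketch of the standard transform (not the repo's exact decode_boxes):

import numpy as np

def sketch_decode_boxes(encoded, anchors, scale_factors=(10., 10., 5., 5.)):
    """encoded: [N, 4] deltas (ty, tx, th, tw); anchors: [N, 4] as [y1, x1, y2, x2]."""
    ah = anchors[:, 2] - anchors[:, 0]
    aw = anchors[:, 3] - anchors[:, 1]
    acy = anchors[:, 0] + ah / 2.
    acx = anchors[:, 1] + aw / 2.
    ty, tx, th, tw = [encoded[:, i] / scale_factors[i] for i in range(4)]
    cy = ty * ah + acy            # shift the anchor center
    cx = tx * aw + acx
    h = np.exp(th) * ah           # rescale the anchor size
    w = np.exp(tw) * aw
    return np.stack([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.], axis=1)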

This is also the next important piece:

  • rpn.rpn_proposals()

def rpn_proposals(self):
    with tf.variable_scope('rpn_proposals'):
      
      # the RPN's predicted box coordinates were encoded against the anchors; decode them here
      rpn_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=self.rpn_encode_boxes,
                                                        reference_boxes=self.anchors,
                                                        scale_factors=self.scale_factors)

      if not self.is_training:  # when test, clip proposals to img boundaries
        img_shape = tf.shape(self.img_batch)
        
        # clip the predicted coordinates to the image boundary
        rpn_decode_boxes = boxes_utils.clip_boxes_to_img_boundaries(rpn_decode_boxes, img_shape)

      rpn_softmax_scores = slim.softmax(self.rpn_scores)
      rpn_object_score = rpn_softmax_scores[:, 1]  # second column represent object
      
      # keep the top_k boxes by objectness score
      if self.top_k_nms:
        rpn_object_score, top_k_indices = tf.nn.top_k(rpn_object_score, k=self.top_k_nms)
        rpn_decode_boxes = tf.gather(rpn_decode_boxes, top_k_indices)
      
      # NMS to remove duplicates
      valid_indices = nms.non_maximal_suppression(boxes=rpn_decode_boxes,
                                                  scores=rpn_object_score,
                                                  max_output_size=self.max_proposals_num,
                                                  iou_threshold=self.rpn_nms_iou_threshold)

      valid_boxes = tf.gather(rpn_decode_boxes, valid_indices)
      valid_scores = tf.gather(rpn_object_score, valid_indices)
      rpn_proposals_boxes, rpn_proposals_scores = tf.cond(
          tf.less(tf.shape(valid_boxes)[0], self.max_proposals_num),
          lambda: boxes_utils.padd_boxes_with_zeros(valid_boxes, valid_scores,
                                                    self.max_proposals_num),
          lambda: (valid_boxes, valid_scores))

      return rpn_proposals_boxes, rpn_proposals_scores

The above decodes the boxes, de-duplicates them with NMS, clips them, and keeps the top-scoring ones.

Here is the box-encoding formula (from Faster R-CNN; I'm not fully certain it matches this code's exact variant):

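$$t_x = \frac{x - x_a}{w_a}, \qquad t_y = \frac{y - y_a}{h_a}, \qquad t_w = \log\frac{w}{w_a}, \qquad t_h = \log\frac{h}{h_a}$$

This is the standard Faster R-CNN encoding of a box $(x, y, w, h)$ against an anchor $(x_a, y_a, w_a, h_a)$; judging from the way decode_boxes divides by them, this repo additionally multiplies each target by the matching entry of SCALE_FACTORS = [10., 10., 5., 5.] when encoding.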
  • Building rpn_losses

I have not studied this carefully; it sits here as a placeholder to fill in later.

def rpn_losses(self):
    with tf.variable_scope('rpn_losses'):
      minibatch_indices, minibatch_anchor_matched_gtboxes, object_mask, minibatch_labels_one_hot = \
          self.make_minibatch(self.anchors)

      minibatch_anchors = tf.gather(self.anchors, minibatch_indices)
      minibatch_encode_boxes = tf.gather(self.rpn_encode_boxes, minibatch_indices)
      minibatch_boxes_scores = tf.gather(self.rpn_scores, minibatch_indices)

      # encode gtboxes
      minibatch_encode_gtboxes = encode_and_decode.encode_boxes(unencode_boxes=minibatch_anchor_matched_gtboxes,
                                                                reference_boxes=minibatch_anchors,
                                                                scale_factors=self.scale_factors)

      positive_anchors_in_img = draw_box_with_color(self.img_batch,
                                                    minibatch_anchors *
                                                    tf.expand_dims(object_mask, 1),
                                                    text=tf.shape(tf.where(tf.equal(object_mask, 1.0)))[0])

      negative_mask = tf.cast(tf.logical_not(tf.cast(object_mask, tf.bool)), tf.float32)
      negative_anchors_in_img = draw_box_with_color(self.img_batch,
                                                    minibatch_anchors *
                                                    tf.expand_dims(negative_mask, 1),
                                                    text=tf.shape(tf.where(tf.equal(object_mask, 0.0)))[0])

      minibatch_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=minibatch_encode_boxes,
                                                              reference_boxes=minibatch_anchors,
                                                              scale_factors=self.scale_factors)

      tf.summary.image('/positive_anchors', positive_anchors_in_img)
      tf.summary.image('/negative_anchors', negative_anchors_in_img)
      top_k_scores, top_k_indices = tf.nn.top_k(minibatch_boxes_scores[:, 1], k=1)

      top_detections_in_img = draw_box_with_color(self.img_batch,
                                                  tf.gather(minibatch_decode_boxes, top_k_indices),
                                                  text=tf.shape(top_k_scores)[0])
      tf.summary.image('/top_1', top_detections_in_img)

      # losses
      with tf.variable_scope('rpn_location_loss'):
        location_loss = losses.l1_smooth_losses(predict_boxes=minibatch_encode_boxes,
                                                gtboxes=minibatch_encode_gtboxes,
                                                object_weights=object_mask)
        tf.losses.add_loss(location_loss)  # add smooth l1 loss to losses collection

      with tf.variable_scope('rpn_classification_loss'):
        classification_loss = tf.losses.softmax_cross_entropy(logits=minibatch_boxes_scores,
                                                              onehot_labels=minibatch_labels_one_hot)

      return location_loss, classification_loss

Next, building the final object detector:

build_fast_rcnn.py

# build_fast_rcnn.py
class FastRCNN(object):
  def __init__(self,
               feature_pyramid, # rpn.feature_pyramid
               rpn_proposals_boxes,  #rpn_proposals_boxes
               rpn_proposals_scores, # rpn_proposals_scores
               img_batch, # img_batch
               img_shape,# tf.shape(img_batch)
               roi_size, # 14
               scale_factors, # SCALE_FACTORS = [10., 10., 5., 5.]
               roi_pool_kernel_size,  # roi size = initial_crop_size / max_pool_kernel size
               gtboxes_and_label,  # [M, 5], used to compute the loss
               fast_rcnn_nms_iou_threshold, # 
               fast_rcnn_maximum_boxes_per_img, # 100
               fast_rcnn_nms_max_boxes_per_class, # 100
               show_detections_score_threshold,  # show box scores larger than this threshold

               num_classes,  # exclude background
               fast_rcnn_minibatch_size,# 256
               fast_rcnn_positives_ratio, #0.5
               fast_rcnn_positives_iou_threshold,#0.2
               use_dropout,
               is_training,
               weight_decay,
               level): #['P2', 'P3', 'P4', 'P5', "P6"]

    self.feature_pyramid = feature_pyramid
    self.rpn_proposals_boxes = rpn_proposals_boxes  # [N, 4]
    self.rpn_proposals_scores = rpn_proposals_scores

    self.img_shape = img_shape
    self.img_batch = img_batch
    self.roi_size = roi_size
    self.roi_pool_kernel_size = roi_pool_kernel_size
    self.level = level
    self.min_level = int(level[0][1])
    self.max_level = min(int(level[-1][1]), 5)

    self.fast_rcnn_nms_iou_threshold = fast_rcnn_nms_iou_threshold
    self.fast_rcnn_nms_max_boxes_per_class = fast_rcnn_nms_max_boxes_per_class
    self.fast_rcnn_maximum_boxes_per_img = fast_rcnn_maximum_boxes_per_img
    self.show_detections_score_threshold = show_detections_score_threshold

    self.scale_factors = scale_factors
    # larger than 0.5 is positive, others are negative
    self.fast_rcnn_positives_iou_threshold = fast_rcnn_positives_iou_threshold

    self.fast_rcnn_minibatch_size = fast_rcnn_minibatch_size
    self.fast_rcnn_positives_ratio = fast_rcnn_positives_ratio

    self.gtboxes_and_label = gtboxes_and_label
    self.num_classes = num_classes
    self.use_dropout = use_dropout
    self.is_training = is_training
    self.weight_decay = weight_decay
    
    # build the ROI feature layers
    self.fast_rcnn_all_level_rois, self.fast_rcnn_all_level_proposals = self.get_rois()
    # run the ROIs through the head to get the final box coordinates and class scores
    self.fast_rcnn_encode_boxes, self.fast_rcnn_scores = self.fast_rcnn_net()

That is the __init__ of the FPN detection module; the last two calls compute all of the detection results.

Namely, these two functions:

# build the ROI feature layers
    self.fast_rcnn_all_level_rois, self.fast_rcnn_all_level_proposals = self.get_rois()
    # run the ROIs through the head to get the final box coordinates and class scores
    self.fast_rcnn_encode_boxes, self.fast_rcnn_scores = self.fast_rcnn_net()
  • self.get_rois()
def get_rois(self):
    '''
    1)get roi from feature map
    2)roi align or roi pooling. Here is roi align
    :return:
    all_level_rois: [N, 7, 7, C]
    all_level_proposals : [N, 4]
    all_level_proposals is matched with all_level_rois

    '''
    levels = self.assign_level()

    all_level_roi_list = []
    all_level_proposal_list = []
    if DEBUG:
      print_tensors(levels, 'levels')
    with tf.variable_scope('fast_rcnn_roi'):
      # P6 is not used by the Fast R-CNN detector.
      for i in range(self.min_level, self.max_level + 1):
        level_i_proposal_indices = tf.reshape(tf.where(tf.equal(levels, i)), [-1])
        level_i_proposals = tf.gather(self.rpn_proposals_boxes, level_i_proposal_indices)

        level_i_proposals = tf.cond(
            tf.equal(tf.shape(level_i_proposals)[0], 0),
            lambda: tf.constant([[0, 0, 0, 0]], dtype=tf.float32),
            lambda: level_i_proposals
        )  # to avoid level_i_proposals batch is 0, or it will broken when gradient BP

        all_level_proposal_list.append(level_i_proposals)

        ymin, xmin, ymax, xmax = tf.unstack(level_i_proposals, axis=1)
        img_h, img_w = tf.cast(self.img_shape[1], tf.float32), tf.cast(
            self.img_shape[2], tf.float32)
        normalize_ymin = ymin / img_h
        normalize_xmin = xmin / img_w
        normalize_ymax = ymax / img_h
        normalize_xmax = xmax / img_w
        
        # crop the region corresponding to each box out of the feature map
        level_i_cropped_rois = tf.image.crop_and_resize(self.feature_pyramid['P%d' % i],
                                                        boxes=tf.transpose(tf.stack([normalize_ymin, normalize_xmin,
                                                                                     normalize_ymax, normalize_xmax])),
                                                        box_ind=tf.zeros(shape=[tf.shape(level_i_proposals)[0], ],
                                                                         dtype=tf.int32),
                                                        crop_size=[self.roi_size, self.roi_size]
                                                        )
        # max-pool the cropped region down to the final ROI size
        level_i_rois = slim.max_pool2d(level_i_cropped_rois,
                                       [self.roi_pool_kernel_size, self.roi_pool_kernel_size],
                                       stride=self.roi_pool_kernel_size)
        all_level_roi_list.append(level_i_rois)

      all_level_rois = tf.concat(all_level_roi_list, axis=0)
      all_level_proposals = tf.concat(all_level_proposal_list, axis=0)
      
      # all_level_rois: the final pooled ROI features
      # all_level_proposals: the box coordinates the ROIs came from
      return all_level_rois, all_level_proposals

The ROIs here are simply cropped out of the feature maps, so there isn't much to explain; a toy version of the crop-then-pool step follows. One point does differ from the paper, though, covered right after.
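A toy version of the crop-then-pool step (my own sketch in TF 1.x, not the repo's code):

import tensorflow as tf  # TF 1.x, as used by the repo

feat = tf.random_normal([1, 32, 32, 256])    # one pyramid level, e.g. P2
boxes = tf.constant([[0.1, 0.1, 0.6, 0.6]])  # normalized [ymin, xmin, ymax, xmax]
# crop each box out of the feature map and resize it to roi_size x roi_size
crops = tf.image.crop_and_resize(feat, boxes,
                                 box_ind=tf.zeros([1], dtype=tf.int32),
                                 crop_size=[14, 14])          # [1, 14, 14, 256]
# max-pool the crop down to the final ROI size
rois = tf.nn.max_pool(crops, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='VALID')  # [1, 7, 7, 256]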

The paper assigns each proposal to a pyramid level with this formula:
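$$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh} / 224\right) \right\rfloor, \qquad k_0 = 4$$

Here 224 is the canonical ImageNet pre-training size. Note that assign_level below uses tf.round rather than floor, and clamps the result to [min_level, max_level].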


def assign_level(self):
    with tf.name_scope('assign_levels'):
      ymin, xmin, ymax, xmax = tf.unstack(self.rpn_proposals_boxes, axis=1)

      w = tf.maximum(xmax - xmin, 0.)  # avoid w is negative
      h = tf.maximum(ymax - ymin, 0.)  # avoid h is negative

      levels = tf.round(4. + tf.log(tf.sqrt(w*h + 1e-8)/224.0) / tf.log(2.))  # k = 4 + log2(sqrt(w*h)/224)

      levels = tf.maximum(levels, tf.ones_like(levels) *
                          (np.float32(self.min_level)))  # level minimum is 2
      levels = tf.minimum(levels, tf.ones_like(levels) *
                          (np.float32(self.max_level)))  # level maximum is 5

      return tf.cast(levels, tf.int32)

Next comes the box prediction.

  • fast_rcnn_net

def fast_rcnn_net(self):

    with tf.variable_scope('fast_rcnn_net'):
      with slim.arg_scope([slim.fully_connected], weights_regularizer=slim.l2_regularizer(self.weight_decay)):

        flatten_rois_features = slim.flatten(self.fast_rcnn_all_level_rois)

        net = slim.fully_connected(flatten_rois_features, 1024, scope='fc_1')
        if self.use_dropout:
          net = slim.dropout(net, keep_prob=0.5, is_training=self.is_training, scope='dropout')

        net = slim.fully_connected(net, 1024, scope='fc_2')
        
        # each ROI gets self.num_classes + 1 class scores
        fast_rcnn_scores = slim.fully_connected(net, self.num_classes + 1, activation_fn=None,
                                                scope='classifier')
        
        # each ROI gets self.num_classes * 4 box coordinates
        fast_rcnn_encode_boxes = slim.fully_connected(net, self.num_classes * 4, activation_fn=None,
                                                      scope='regressor')
      if DEBUG:
        print_tensors(fast_rcnn_encode_boxes, 'fast_rcnn_encode_bxes')

      return fast_rcnn_encode_boxes, fast_rcnn_scores

There is no for loop over the ROIs here; slim.fully_connected runs on the whole batch of flattened ROI features at once, so each ROI is effectively predicted independently (one row per ROI). A shape trace is sketched below.
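As a shape trace (my own sketch, assuming the defaults above: roi_size 14, pool kernel 2, hence 7x7x256 ROI features, and N ROIs):

# rois:                  [N, 7, 7, 256]
# slim.flatten      ->   [N, 7 * 7 * 256] = [N, 12544]
# fc_1, fc_2        ->   [N, 1024]
# 'classifier' fc   ->   [N, num_classes + 1]   (one score per class, plus background)
# 'regressor' fc    ->   [N, num_classes * 4]   (one 4-vector per foreground class)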

fast_rcnn_predict

def fast_rcnn_predict(self):

    with tf.variable_scope('fast_rcnn_predict'):
      fast_rcnn_softmax_scores = slim.softmax(self.fast_rcnn_scores)  # [-1, num_classes+1]

      fast_rcnn_encode_boxes = tf.reshape(self.fast_rcnn_encode_boxes, [-1, 4])

      reference_boxes = tf.tile(self.fast_rcnn_all_level_proposals, [
                                1, self.num_classes])  # [N, 4*num_classes]
      reference_boxes = tf.reshape(reference_boxes, [-1, 4])   # [N*num_classes, 4]
      fast_rcnn_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=fast_rcnn_encode_boxes,
                                                              reference_boxes=reference_boxes,
                                                              scale_factors=self.scale_factors)

      fast_rcnn_decode_boxes = boxes_utils.clip_boxes_to_img_boundaries(fast_rcnn_decode_boxes,
                                                                        img_shape=self.img_shape)

      # mutilclass NMS
      fast_rcnn_decode_boxes = tf.reshape(fast_rcnn_decode_boxes, [-1, self.num_classes*4])
      fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
          self.fast_rcnn_proposals(fast_rcnn_decode_boxes, scores=fast_rcnn_softmax_scores)

      return fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category

fast_rcnn_predict applies softmax to the class scores, decodes the predicted boxes against their proposals, clips any boxes that cross the image boundary, and then hands the results to fast_rcnn_proposals for the multi-class NMS. The tile/reshape alignment it relies on is sketched below.
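The tile/reshape trick lines up one copy of each proposal with each class's four regression values (a numpy sketch):

import numpy as np

num_classes = 3
proposals = np.array([[0., 0., 10., 10.],
                      [5., 5., 20., 20.]])   # [N, 4]
deltas = np.zeros((2, num_classes * 4))      # [N, num_classes * 4], from the regressor

ref = np.tile(proposals, (1, num_classes)).reshape(-1, 4)  # [N * num_classes, 4]
dlt = deltas.reshape(-1, 4)                                # [N * num_classes, 4]
# ref[i] is now the proposal matching dlt[i]: rows 0..2 are proposal 0 for
# classes 1..3, rows 3..5 are proposal 1 for classes 1..3, and so on.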

  • fast_rcnn_loss
def fast_rcnn_loss(self):
    with tf.variable_scope('fast_rcnn_loss'):
      minibatch_indices, minibatch_reference_boxes_mattached_gtboxes, minibatch_object_mask, \
          minibatch_label_one_hot = self.fast_rcnn_minibatch(self.fast_rcnn_all_level_proposals)

      minibatch_reference_boxes = tf.gather(self.fast_rcnn_all_level_proposals, minibatch_indices)

      minibatch_encode_boxes = tf.gather(self.fast_rcnn_encode_boxes,
                                         minibatch_indices)  # [minibatch_size, num_classes*4]
      minibatch_scores = tf.gather(self.fast_rcnn_scores, minibatch_indices)

      positive_proposals_in_img = draw_box_with_color(self.img_batch,
                                                      minibatch_reference_boxes * tf.expand_dims(
                                                          minibatch_object_mask, 1),
                                                      text=tf.shape(tf.where(tf.equal(minibatch_object_mask, 1.0)))[0])

      negative_mask = tf.cast(tf.logical_not(tf.cast(minibatch_object_mask, tf.bool)), tf.float32)
      negative_proposals_in_img = draw_box_with_color(self.img_batch,
                                                      minibatch_reference_boxes * tf.expand_dims(negative_mask,
                                                                                                 1),
                                                      text=tf.shape(tf.where(tf.equal(minibatch_object_mask, 0.0)))[0])

      tf.summary.image('/positive_proposals', positive_proposals_in_img)
      tf.summary.image('/negative_proposals', negative_proposals_in_img)

      if cfgs.CLASS_NUM == 1:
        minibatch_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=minibatch_encode_boxes,
                                                                reference_boxes=minibatch_reference_boxes,
                                                                scale_factors=self.scale_factors)

        minibatch_softmax_scores = tf.gather(slim.softmax(self.fast_rcnn_scores), minibatch_indices)
        top_k_scores, top_k_indices = tf.nn.top_k(minibatch_softmax_scores[:, 1], k=5)

        top_detections_in_img = draw_boxes_with_scores(self.img_batch,
                                                       boxes=tf.gather(
                                                           minibatch_decode_boxes, top_k_indices),
                                                       scores=top_k_scores)
        tf.summary.image('/top_5', top_detections_in_img)

      # encode gtboxes
      minibatch_encode_gtboxes = \
          encode_and_decode.encode_boxes(
              unencode_boxes=minibatch_reference_boxes_mattached_gtboxes,
              reference_boxes=minibatch_reference_boxes,
              scale_factors=self.scale_factors
          )

      # [minibatch_size, num_classes*4]
      minibatch_encode_gtboxes = tf.tile(minibatch_encode_gtboxes, [1, self.num_classes])

      class_weights_list = []
      category_list = tf.unstack(minibatch_label_one_hot, axis=1)
      for i in range(1, self.num_classes+1):
        tmp_class_weights = tf.ones(
            shape=[tf.shape(minibatch_encode_boxes)[0], 4], dtype=tf.float32)
        tmp_class_weights = tmp_class_weights * tf.expand_dims(category_list[i], axis=1)
        class_weights_list.append(tmp_class_weights)
      class_weights = tf.concat(class_weights_list, axis=1)  # [minibatch_size, num_classes*4]

      # loss
      with tf.variable_scope('fast_rcnn_classification_loss'):
        fast_rcnn_classification_loss = tf.losses.softmax_cross_entropy(logits=minibatch_scores,
                                                                        onehot_labels=minibatch_label_one_hot)

      with tf.variable_scope('fast_rcnn_location_loss'):
        fast_rcnn_location_loss = losses.l1_smooth_losses(predict_boxes=minibatch_encode_boxes,
                                                          gtboxes=minibatch_encode_gtboxes,
                                                          object_weights=minibatch_object_mask,
                                                          classes_weights=class_weights)
        tf.losses.add_loss(fast_rcnn_location_loss)

      return fast_rcnn_location_loss, fast_rcnn_classification_loss

Compute the losses for the box coordinates and the classification.
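For the record, l1_smooth_losses should be the usual Fast R-CNN smooth-L1 loss (a numpy sketch of the elementwise function; the repo's version also applies the object/class weights):

import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: quadratic near zero, linear in the tails."""
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x ** 2, absx - 0.5)

diff = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])  # predicted - target, already encoded
print(smooth_l1(diff))  # [1.5, 0.125, 0.0, 0.045, 1.0]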

It turns out that train.py is written quite clearly.

train.py

Here is the main structure of the file:

def train():
  with tf.Graph().as_default():
    with tf.name_scope('get_batch'):
      # read the data
      img_name_batch, img_batch, gtboxes_and_label_batch, num_objects_batch = next_batch()

    
    # ***********************************************************************************************
    # *  share net: build the backbone network                                                      *
    # ***********************************************************************************************
    _, share_net = get_network_byname()  # arguments omitted in this outline

    # ***********************************************************************************************
    # *  rpn: build the RPN network                                                                 *
    # ***********************************************************************************************
    rpn = build_rpn.RPN()  # arguments omitted in this outline

 
    # ***********************************************************************************************
    # *  Fast RCNN: build the detection model                                                       *
    # ***********************************************************************************************

    fast_rcnn = build_fast_rcnn.FastRCNN()  # arguments omitted in this outline

    fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
        fast_rcnn.fast_rcnn_predict()
    # ***********************************************************************************************
    # *  loss: compute the RPN and fast_rcnn losses                                                 *
    # ***********************************************************************************************
    rpn_location_loss, rpn_classification_loss = rpn.rpn_losses()
    rpn_total_loss = rpn_location_loss + rpn_classification_loss

    fast_rcnn_location_loss, fast_rcnn_classification_loss = fast_rcnn.fast_rcnn_loss()
    fast_rcnn_total_loss = fast_rcnn_location_loss + fast_rcnn_classification_loss

    # train
    added_loss = rpn_total_loss + fast_rcnn_total_loss
    total_loss = tf.losses.get_total_loss()  # everything registered in the losses collection

    global_step = tf.train.get_or_create_global_step()

    lr = tf.train.piecewise_constant(global_step,
                                     boundaries=[np.int64(20000), np.int64(40000)],
                                     values=[cfgs.LR, cfgs.LR / 10, cfgs.LR / 100])
    tf.summary.scalar('lr', lr)
    optimizer = tf.train.MomentumOptimizer(lr, momentum=cfgs.MOMENTUM)

    train_op = slim.learning.create_train_op(total_loss, optimizer, global_step)  # rpn_total_loss,
    # train_op = optimizer.minimize(second_classification_loss, global_step)
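The snippet stops at graph construction. A typical TF 1.x session loop to drive train_op would look something like this (my own sketch, not the repo's exact code; the step counts and checkpoint path are assumptions):

    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    saver = tf.train.Saver(max_to_keep=3)

    with tf.Session() as sess:
      sess.run(init_op)
      coord = tf.train.Coordinator()
      threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # feed next_batch()
      try:
        for step in range(60000):
          _, loss_val, g = sess.run([train_op, total_loss, global_step])
          if step % 100 == 0:
            print('step %d: total_loss = %.4f' % (g, loss_val))
          if step % 10000 == 0:
            saver.save(sess, 'output/fpn_model.ckpt', global_step=g)
      finally:
        coord.request_stop()
        coord.join(threads)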
  

That is the overall structure of FPN; there are still holes here that I need to fill...

Update: I have since run the code.

Filling in the gaps
References:
Training an object detection model that combines FPN and Faster R-CNN with TensorFlow
Object detection algorithms: FPN (Feature Pyramid Networks)
