After reading the FPN paper I didn't have much of a feel for it, so it seemed time to read some source code. I haven't debugged the code here; I read it only to learn the flow of the algorithm.
GitHub: code by 月夜
The author of that blog mentions three codebases:
Training an object detection model combining FPN and Faster-RCNN with TensorFlow
Overview of the FPN model
- Almost every method that scores well on the ImageNet and COCO detection tasks uses an image pyramid, as in figure (a). However, the high time and compute cost makes this hard to apply in practice.
- Two-stage detectors such as the Faster-RCNN family use only the last feature map, as in figure (b). In a convolutional network, different depths carry different levels of semantics: shallow layers have high resolution and learn mostly fine details, while deep layers have low resolution and learn mostly semantic features. Combining feature maps from multiple levels is now the trend.
- Feature hierarchy: each layer predicts detections at its own scale, as in figure (c). The SSD detection framework adopts a similar idea. The problem with this approach is that it forces different layers to learn the same semantic information.
- As shown in figure (d), the network modifies the original single network: each resolution's feature map is element-wise added to a 2x-upsampled version of the next coarser feature map, roughly P_l = conv3x3(upsample2x(P_{l+1}) + conv1x1(C_l)). With these connections, the feature map used at every level fuses features of different resolutions and semantic strengths, and each fused map detects objects of the corresponding scale. This is the FPN network, the model discussed today (a tiny sketch of the merge step follows the list).
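To make the figure (d) merge concrete, here is a minimal numpy sketch of one top-down step; the shapes and the nearest-neighbor upsampling match the build_feature_pyramid code later in this post, and the random inputs are just placeholders:
import numpy as np
c4 = np.random.rand(1, 28, 28, 256)           # lateral feature (already passed through a 1x1 conv)
p5 = np.random.rand(1, 14, 14, 256)           # next coarser pyramid level
up = p5.repeat(2, axis=1).repeat(2, axis=2)   # nearest-neighbor 2x upsample -> [1, 28, 28, 256]
p4 = up + c4                                  # element-wise fusion, as in figure (d)
print(p4.shape)                               # (1, 28, 28, 256)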
But what I read is the author's adapted version of the code:
A few files here are worth a memo:
- configs: hyperparameters for the model configurations; there are vgg, inception, and so on.
- data: factory files for the data; these files handle data generation.
- help_utils: two files; help_utils.py is the key file for showing images.
- scripts: shell scripts (.sh) run directly under Ubuntu that call the tools files for train, test, eval, and inference.
- tools: the main entry files for the stages of detection: train, test, eval, inference. The walkthrough below starts from here.
- gen_classes.py: the author says this generates labels; it produces a txt file, the first step in making labels.
- cnvert_txt2xml.py: as the name suggests, converts the txt files into the xml annotation format.
test.py
At a glance the file breaks into a few parts.
- 1. Argument parsing and data loading
- 2. Build the base network (vgg, inception), which provides the feature map of every layer. FPN uses not just the last layer but several intermediate ones as well.
- 3. Build the RPN network, which for each level's feature map generates boxes and a foreground/background binary classification.
- 4. Build the Faster RCNN network. Why it isn't called the FPN network I don't know, but the feature pyramid really is built here.
- 5. Fetch the tensors we need and evaluate them.
- 6. Use help_utils to display the results.
I won't cover how the data is read; I'll skip it and study it once I start debugging the code.
Building the base network:
_, share_net = get_network_byname(net_name=cfgs.NET_NAME,
inputs=img_batch,
num_classes=None,
is_training=True,
output_stride=None,
global_pool=False,
spatial_squeeze=False)
share_net holds the feature maps (end points) from which the levels for feature-pyramid fusion are selected.
You can see this in the get_network_byname function in network_factory.py; I won't explain it here.
The base feature-extraction model
def get_network_byname(net_name,
inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
spatial_squeeze=True):
if net_name == 'resnet_v1_50':
FLAGS = get_flags_byname(net_name)
with slim.arg_scope(resnet_v1.resnet_arg_scope(weight_decay=FLAGS.weight_decay)):
logits, end_points = resnet_v1.resnet_v1_50(inputs=inputs,
num_classes=num_classes,
is_training=is_training,
global_pool=global_pool,
output_stride=output_stride,
spatial_squeeze=spatial_squeeze
)
return logits, end_points
Building the RPN network
rpn = build_rpn.RPN(net_name=cfgs.NET_NAME,
inputs=img_batch,
gtboxes_and_label=None,
is_training=False,
share_head=cfgs.SHARE_HEAD,
share_net=share_net,
stride=cfgs.STRIDE,
anchor_ratios=cfgs.ANCHOR_RATIOS,
anchor_scales=cfgs.ANCHOR_SCALES,
scale_factors=cfgs.SCALE_FACTORS,
base_anchor_size_list=cfgs.BASE_ANCHOR_SIZE_LIST, # P2, P3, P4, P5, P6
level=cfgs.LEVEL,
top_k_nms=cfgs.RPN_TOP_K_NMS,
rpn_nms_iou_threshold=cfgs.RPN_NMS_IOU_THRESHOLD,
max_proposals_num=cfgs.MAX_PROPOSAL_NUM,
rpn_iou_positive_threshold=cfgs.RPN_IOU_POSITIVE_THRESHOLD,
rpn_iou_negative_threshold=cfgs.RPN_IOU_NEGATIVE_THRESHOLD,
rpn_mini_batch_size=cfgs.RPN_MINIBATCH_SIZE,
rpn_positives_ratio=cfgs.RPN_POSITIVE_RATE,
remove_outside_anchors=False, # whether to remove anchors outside the image
rpn_weight_decay=cfgs.WEIGHT_DECAY[cfgs.NET_NAME])
The RPN here is also the module that builds the feature fusion: starting from the last feature level, the maps are upsampled (nearest-neighbor resizing in this code rather than deconvolution) and fused level by level. The RPN then predicts box coordinates and a foreground/background binary classification from the feature maps.
Building the fast_rcnn detection model
fast_rcnn = build_fast_rcnn.FastRCNN(img_batch=img_batch,
feature_pyramid=rpn.feature_pyramid,
rpn_proposals_boxes=rpn_proposals_boxes,
rpn_proposals_scores=rpn_proposals_scores,
img_shape=tf.shape(img_batch),
roi_size=cfgs.ROI_SIZE,
scale_factors=cfgs.SCALE_FACTORS,
roi_pool_kernel_size=cfgs.ROI_POOL_KERNEL_SIZE,
gtboxes_and_label=None,
fast_rcnn_nms_iou_threshold=cfgs.FAST_RCNN_NMS_IOU_THRESHOLD,
fast_rcnn_maximum_boxes_per_img=100,
fast_rcnn_nms_max_boxes_per_class=cfgs.FAST_RCNN_NMS_MAX_BOXES_PER_CLASS,
show_detections_score_threshold=cfgs.FINAL_SCORE_THRESHOLD, # show detections which score >= 0.6
num_classes=cfgs.CLASS_NUM,
fast_rcnn_minibatch_size=cfgs.FAST_RCNN_MINIBATCH_SIZE,
fast_rcnn_positives_ratio=cfgs.FAST_RCNN_POSITIVE_RATE,
fast_rcnn_positives_iou_threshold=cfgs.FAST_RCNN_IOU_POSITIVE_THRESHOLD,
use_dropout=False,
weight_decay=cfgs.WEIGHT_DECAY[cfgs.NET_NAME],
is_training=False,
level=cfgs.LEVEL)
fast_rcnn performs the final box prediction and class classification.
fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
fast_rcnn.fast_rcnn_predict()
Initialize the variables and load the pretrained model
init_op = tf.group(
tf.global_variables_initializer(),
tf.local_variables_initializer()
)
restorer, restore_ckpt = restore_model.get_restorer(checkpoint_path=args.weights)
Display the image
_img_batch_fpn = help_utils.draw_box_cv(_img_batch,
boxes=_fast_rcnn_decode_boxes,
labels=_detection_category,
scores=_fast_rcnn_score)
mkdir(cfgs.TEST_SAVE_PATH)
cv2.imwrite(cfgs.TEST_SAVE_PATH +
'/{}_fpn.jpg'.format(str(_img_name_batch[0])), _img_batch_fpn)
The above is a rough pass over the test.py file.
Next, the goal is to look at the important components here:
- rpn = build_rpn.RPN
  - rpn.rpn_proposals()
  - rpn.feature_pyramid
  - rpn.rpn_losses()
- fast_rcnn = build_fast_rcnn.FastRCNN
  - fast_rcnn.fast_rcnn_predict()
  - fast_rcnn.fast_rcnn_loss()
OK, those are the main functions. Why these? Because test.py and train.py use only these, so the other methods don't need much attention.
To emphasize again: I haven't debugged this code; I read it purely to strengthen my understanding of the FPN algorithm's flow.
build_rpn.py
All right, let's dig into the RPN.
class RPN(object):
def __init__(self,
net_name, # 'resnet_v1_101'
inputs, # img_batch
gtboxes_and_label, # to create loss
is_training,
share_net, # end_point
anchor_ratios, # [2., 3., 4., 5.]
anchor_scales, # [2., 3., 4.]
scale_factors, # [10., 10., 5., 5.]
base_anchor_size_list, # P2, P3, P4, P5, P6
stride, # [4, 8, 16, 32, 64]
level, # ['P2', 'P3', 'P4', 'P5', "P6"]
top_k_nms, # 12000
share_head=False, # True
rpn_nms_iou_threshold=0.7,
max_proposals_num=300,
rpn_iou_positive_threshold=0.7,
rpn_iou_negative_threshold=0.3, # iou>=0.7 is positive box, iou< 0.3 is negative
rpn_mini_batch_size=256,
rpn_positives_ratio=0.5,
remove_outside_anchors=False, # whether to remove anchors outside the image
rpn_weight_decay=0.0001,
):
self.net_name = net_name
self.img_batch = inputs
self.gtboxes_and_label = gtboxes_and_label # shape is [M, 5]
self.base_anchor_size_list = base_anchor_size_list
self.anchor_ratios = tf.constant(anchor_ratios, dtype=tf.float32)
self.anchor_scales = tf.constant(anchor_scales, dtype=tf.float32)
self.share_head = share_head
self.num_of_anchors_per_location = len(anchor_scales) * len(anchor_ratios) # 3 x 4 = 12
self.scale_factors = scale_factors # [10., 10., 5., 5.]
self.stride = stride
self.level = level
self.top_k_nms = top_k_nms
self.rpn_nms_iou_threshold = rpn_nms_iou_threshold
self.max_proposals_num = max_proposals_num
self.rpn_iou_positive_threshold = rpn_iou_positive_threshold
self.rpn_iou_negative_threshold = rpn_iou_negative_threshold
self.rpn_mini_batch_size = rpn_mini_batch_size
self.rpn_positives_ratio = rpn_positives_ratio
self.remove_outside_anchors = remove_outside_anchors
self.rpn_weight_decay = rpn_weight_decay
self.is_training = is_training
# all feature maps produced by the base network
self.share_net = share_net
# fetch the backbone feature maps (C2-C5)
self.feature_maps_dict = self.get_feature_maps()
# build the feature pyramid (P2-P6)
self.feature_pyramid = self.build_feature_pyramid()
# predict boxes and classes (foreground/background) on every pyramid level
self.anchors, self.rpn_encode_boxes, self.rpn_scores = self.get_anchors_and_rpn_predict()
The above is the RPN's init function; the front part just stores the arguments, and at the end of init three methods are called:
- Fetch the feature maps
self.feature_maps_dict = self.get_feature_maps()
def get_feature_maps(self):
with tf.variable_scope('get_feature_maps'):
if self.net_name == 'resnet_v1_50':
feature_maps_dict = {
'C2': self.share_net['resnet_v1_50/block1/unit_2/bottleneck_v1'], # [56, 56]
'C3': self.share_net['resnet_v1_50/block2/unit_3/bottleneck_v1'], # [28, 28]
'C4': self.share_net['resnet_v1_50/block3/unit_5/bottleneck_v1'], # [14, 14]
'C5': self.share_net['resnet_v1_50/block4'] # [7, 7]
}
elif self.net_name == 'resnet_v1_101':
feature_maps_dict = {
'C2': self.share_net['resnet_v1_101/block1/unit_2/bottleneck_v1'], # [56, 56]
'C3': self.share_net['resnet_v1_101/block2/unit_3/bottleneck_v1'], # [28, 28]
'C4': self.share_net['resnet_v1_101/block3/unit_22/bottleneck_v1'], # [14, 14]
'C5': self.share_net['resnet_v1_101/block4'] # [7, 7]
}
else:
raise Exception('get no feature maps')
return feature_maps_dict
This function is straightforward: it grabs four feature maps (C2-C5) from the base network and returns them.
- Build the feature pyramid: build_feature_pyramid
def build_feature_pyramid(self):
'''
reference: https://github.com/CharlesShang/FastMaskRCNN
build P2, P3, P4, P5
:return: multi-scale feature map
'''
feature_pyramid = {}
with tf.variable_scope('build_feature_pyramid'):
with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(self.rpn_weight_decay)):
feature_pyramid['P5'] = slim.conv2d(self.feature_maps_dict['C5'],
num_outputs=256,
kernel_size=[1, 1],
stride=1,
scope='build_P5')
feature_pyramid['P6'] = slim.max_pool2d(feature_pyramid['P5'],
kernel_size=[2, 2], stride=2, scope='build_P6')
# P6 is a downsample (stride-2 max pool) of P5
for layer in range(4, 1, -1):
p, c = feature_pyramid['P' + str(layer + 1)], self.feature_maps_dict['C' + str(layer)]
up_sample_shape = tf.shape(c)
# resize (upsample) p to c's spatial size via nearest-neighbor interpolation
up_sample = tf.image.resize_nearest_neighbor(p, [up_sample_shape[1], up_sample_shape[2]],
name='build_P%d/up_sample_nearest_neighbor' % layer)
c = slim.conv2d(c, num_outputs=256, kernel_size=[1, 1], stride=1,
scope='build_P%d/reduce_dimension' % layer)
# element-wise addition: fuse the top-down feature with the lateral connection
p = up_sample + c
# a 3x3 conv on the merged map (to reduce upsampling aliasing)
p = slim.conv2d(p, 256, kernel_size=[3, 3], stride=1,
padding='SAME', scope='build_P%d/avoid_aliasing' % layer)
feature_pyramid['P' + str(layer)] = p
return feature_pyramid
The flow here is fairly simple; it produces the levels P2, P3, P4, P5, P6.
So I don't forget, the flow is:
- P6 is a pooled (downsampled) copy of P5
- P5 is produced by a conv on C5
- P4 is the fusion of upsampled P5 and C4
- P3 is the fusion of upsampled P4 and C3
- P2 is the fusion of upsampled P3 and C2
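Putting the configs together (cfgs.STRIDE, cfgs.LEVEL, and the base_anchor_size comment in make_anchors below), each level pairs with one stride and one base anchor size:
- P2: stride 4, base anchor size 32
- P3: stride 8, base anchor size 64
- P4: stride 16, base anchor size 128
- P5: stride 32, base anchor size 256
- P6: stride 64, base anchor size 512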
Next comes predicting the box coordinates and the classes (binary classification).
- get_anchors_and_rpn_predict
def get_anchors_and_rpn_predict(self):
# generate the fixed anchor boxes for every feature map
anchors = self.make_anchors()
# predict boxes and background (binary classification) on every feature map
rpn_encode_boxes, rpn_scores = self.rpn_net()
with tf.name_scope('get_anchors_and_rpn_predict'):
if self.is_training:
if self.remove_outside_anchors:
# filter the boxes: drop those outside the image boundary, producing valid indices
valid_indices = boxes_utils.filter_outside_boxes(boxes=anchors,
img_h=tf.shape(self.img_batch)[1],
img_w=tf.shape(self.img_batch)[2])
# gather only the valid entries
valid_anchors = tf.gather(anchors, valid_indices)
rpn_valid_encode_boxes = tf.gather(rpn_encode_boxes, valid_indices)
rpn_valid_scores = tf.gather(rpn_scores, valid_indices)
return valid_anchors, rpn_valid_encode_boxes, rpn_valid_scores
else:
return anchors, rpn_encode_boxes, rpn_scores
else:
return anchors, rpn_encode_boxes, rpn_scores
As you can see above, this is the function that assembles the RPN's final predictions.
Three functions are involved:
- anchors = self.make_anchors()
- rpn_encode_boxes, rpn_scores = self.rpn_net()
- boxes_utils.filter_outside_boxes
make_anchors
def make_anchors(self):
with tf.variable_scope('make_anchors'):
anchor_list = []
level_list = self.level
with tf.name_scope('make_anchors_all_level'):
for level, base_anchor_size, stride in zip(level_list, self.base_anchor_size_list, self.stride):
'''
(level, base_anchor_size) tuple:
(P2, 32), (P3, 64), (P4, 128), (P5, 256), (P6, 512)
'''
featuremap_height, featuremap_width = tf.shape(self.feature_pyramid[level])[1], \
tf.shape(self.feature_pyramid[level])[2]
# stride = base_anchor_size / 8.
# tmp_anchors = tf.py_func(
# anchor_utils_pyfunc.make_anchors,
# inp=[base_anchor_size, self.anchor_scales, self.anchor_ratios,
# featuremap_height, featuremap_width, stride],
# Tout=tf.float32
# )
# given just the feature map size and the anchor parameters, fixed boxes can be generated for each feature map
tmp_anchors = make_anchor.make_anchors(base_anchor_size, self.anchor_scales, self.anchor_ratios,
featuremap_height, featuremap_width, stride,
name='make_anchors_{}'.format(level))
tmp_anchors = tf.reshape(tmp_anchors, [-1, 4])
anchor_list.append(tmp_anchors)
all_level_anchors = tf.concat(anchor_list, axis=0)
# return the coordinates of all the anchors
return all_level_anchors
make_anchor here generates a fixed set of boxes once the feature map size is given. For how that works, see my earlier article (click here); I won't explain it again.
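Since make_anchor.make_anchors itself isn't shown in this post, here is a minimal numpy sketch of the usual recipe (a hypothetical helper; the repo's exact center offset and aspect-ratio convention may differ): place one anchor set every stride pixels and enumerate all scale x ratio shapes around base_anchor_size.
import numpy as np

def make_anchors_np(base_size, scales, ratios, fh, fw, stride):
    # side lengths for each scale, stretched by sqrt(ratio) into rectangles
    sizes = base_size * np.array(scales, np.float32)
    ws = (sizes[:, None] * np.sqrt(1.0 / np.array(ratios, np.float32))).reshape(-1)
    hs = (sizes[:, None] * np.sqrt(np.array(ratios, np.float32))).reshape(-1)
    # anchor centers: one per feature map cell, spaced `stride` pixels apart
    cx, cy = np.meshgrid((np.arange(fw) + 0.5) * stride, (np.arange(fh) + 0.5) * stride)
    ymin = cy[..., None] - hs / 2.0
    xmin = cx[..., None] - ws / 2.0
    ymax = cy[..., None] + hs / 2.0
    xmax = cx[..., None] + ws / 2.0
    return np.stack([ymin, xmin, ymax, xmax], axis=-1).reshape(-1, 4)

# e.g. P2: base size 32, stride 4, 3 scales x 4 ratios = 12 anchors per location
anchors = make_anchors_np(32, [2., 3., 4.], [2., 3., 4., 5.], 56, 56, 4)
print(anchors.shape)  # (56 * 56 * 12, 4) = (37632, 4)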
- self.rpn_net()
def rpn_net(self):
rpn_encode_boxes_list = []
rpn_scores_list = []
with tf.variable_scope('rpn_net'):
with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(self.rpn_weight_decay)):
for level in self.level:
if self.share_head:
reuse_flag = None if level == 'P2' else True
scope_list = ['conv2d_3x3', 'rpn_classifier', 'rpn_regressor']
# in the beginning we create the variables, then share them across P3, P4, P5
# with share_head, the RPN head's weights are shared across all pyramid levels
else:
reuse_flag = None
scope_list = ['conv2d_3x3_'+level, 'rpn_classifier_'+level, 'rpn_regressor_'+level]
# first a 3x3 conv over this pyramid level
rpn_conv2d_3x3 = slim.conv2d(inputs=self.feature_pyramid[level],
num_outputs=512,
kernel_size=[3, 3],
stride=1,
scope=scope_list[0],
reuse=reuse_flag)
# a 1x1 conv giving the foreground/background binary scores
rpn_box_scores = slim.conv2d(rpn_conv2d_3x3,
num_outputs=2 * self.num_of_anchors_per_location,
kernel_size=[1, 1],
stride=1,
scope=scope_list[1],
activation_fn=None,
reuse=reuse_flag)
# a 1x1 conv giving the encoded box coordinates
rpn_encode_boxes = slim.conv2d(rpn_conv2d_3x3,
num_outputs=4 * self.num_of_anchors_per_location,
kernel_size=[1, 1],
stride=1,
scope=scope_list[2],
activation_fn=None,
reuse=reuse_flag)
rpn_box_scores = tf.reshape(rpn_box_scores, [-1, 2])
rpn_encode_boxes = tf.reshape(rpn_encode_boxes, [-1, 4])
rpn_scores_list.append(rpn_box_scores)
rpn_encode_boxes_list.append(rpn_encode_boxes)
rpn_all_encode_boxes = tf.concat(rpn_encode_boxes_list, axis=0)
rpn_all_boxes_scores = tf.concat(rpn_scores_list, axis=0)
return rpn_all_encode_boxes, rpn_all_boxes_scores
This part is fairly simple. It produces rpn_all_encode_boxes, the encoded box regressions, and rpn_all_boxes_scores, the objectness score of each box.
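As a quick shape check (with hypothetical numbers), the reshape to [-1, 2] and [-1, 4] turns each level's conv outputs into one row per anchor:
import numpy as np
H, W, A = 56, 56, 12                  # hypothetical P2 size; 12 anchors per location
scores = np.zeros((1, H, W, 2 * A))   # rpn_box_scores conv output
boxes = np.zeros((1, H, W, 4 * A))    # rpn_encode_boxes conv output
print(scores.reshape(-1, 2).shape)    # (37632, 2): one (bg, fg) pair per anchor
print(boxes.reshape(-1, 4).shape)     # (37632, 4): one box encoding per anchor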
I forgot another important part again.
I always forget how the fixed anchor coordinates in the RPN relate to the box coordinates the RPN network predicts. Look here:
the bbox_transform_inv function combines the RPN's output to apply a coordinate transform to all the initial boxes.
That transform is also the next important part:
* rpn.rpn_proposals()
def rpn_proposals(self):
with tf.variable_scope('rpn_proposals'):
# the RPN's predicted box outputs were encoded against the anchors; decode them here
rpn_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=self.rpn_encode_boxes,
reference_boxes=self.anchors,
scale_factors=self.scale_factors)
if not self.is_training: # when test, clip proposals to img boundaries
img_shape = tf.shape(self.img_batch)
# clip the decoded coordinates to the image boundaries
rpn_decode_boxes = boxes_utils.clip_boxes_to_img_boundaries(rpn_decode_boxes, img_shape)
rpn_softmax_scores = slim.softmax(self.rpn_scores)
rpn_object_score = rpn_softmax_scores[:, 1] # the second column represents the object score
# keep the top_k boxes
if self.top_k_nms:
rpn_object_score, top_k_indices = tf.nn.top_k(rpn_object_score, k=self.top_k_nms)
rpn_decode_boxes = tf.gather(rpn_decode_boxes, top_k_indices)
# NMS to remove duplicates
valid_indices = nms.non_maximal_suppression(boxes=rpn_decode_boxes,
scores=rpn_object_score,
max_output_size=self.max_proposals_num,
iou_threshold=self.rpn_nms_iou_threshold)
valid_boxes = tf.gather(rpn_decode_boxes, valid_indices)
valid_scores = tf.gather(rpn_object_score, valid_indices)
rpn_proposals_boxes, rpn_proposals_scores = tf.cond(
tf.less(tf.shape(valid_boxes)[0], self.max_proposals_num),
lambda: boxes_utils.padd_boxes_with_zeros(valid_boxes, valid_scores,
self.max_proposals_num),
lambda: (valid_boxes, valid_scores))
The above decodes the boxes, removes duplicates (NMS), clips them, and selects the final set.
Here are the (Faster R-CNN) encoding formulas that the decode step inverts; I'm not sure the version below matches the repo exactly:
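This is the standard Faster R-CNN box parameterization, which encode_and_decode presumably implements; the ordering of scale_factors [10., 10., 5., 5.] over (y, x, h, w) is my assumption:
t_y = (y - y_a) / h_a * 10, t_x = (x - x_a) / w_a * 10
t_h = log(h / h_a) * 5, t_w = log(w / w_a) * 5
A minimal numpy sketch of the encode/decode pair:
import numpy as np

def _centers(b):
    # b: [N, 4] as (ymin, xmin, ymax, xmax) -> center y, center x, height, width
    h = b[:, 2] - b[:, 0]
    w = b[:, 3] - b[:, 1]
    return b[:, 0] + h / 2.0, b[:, 1] + w / 2.0, h, w

def encode_boxes(boxes, anchors, scale_factors=(10., 10., 5., 5.)):
    y, x, h, w = _centers(boxes)
    ya, xa, ha, wa = _centers(anchors)
    ty = (y - ya) / ha * scale_factors[0]
    tx = (x - xa) / wa * scale_factors[1]
    th = np.log(h / ha) * scale_factors[2]
    tw = np.log(w / wa) * scale_factors[3]
    return np.stack([ty, tx, th, tw], axis=1)

def decode_boxes(codes, anchors, scale_factors=(10., 10., 5., 5.)):
    ya, xa, ha, wa = _centers(anchors)
    y = codes[:, 0] / scale_factors[0] * ha + ya
    x = codes[:, 1] / scale_factors[1] * wa + xa
    h = np.exp(codes[:, 2] / scale_factors[2]) * ha
    w = np.exp(codes[:, 3] / scale_factors[3]) * wa
    return np.stack([y - h / 2, x - w / 2, y + h / 2, x + w / 2], axis=1)

a = np.array([[0., 0., 10., 10.]])
b = np.array([[1., 2., 9., 12.]])
print(decode_boxes(encode_boxes(b, a), a))  # recovers b
Decoding inverts encoding, so decode_boxes(encode_boxes(b, a), a) recovers b; that is exactly the relation between rpn_encode_boxes and self.anchors.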
- Building rpn_losses
I haven't studied this carefully; it sits here as a placeholder.
def rpn_losses(self):
with tf.variable_scope('rpn_losses'):
minibatch_indices, minibatch_anchor_matched_gtboxes, object_mask, minibatch_labels_one_hot = \
self.make_minibatch(self.anchors)
minibatch_anchors = tf.gather(self.anchors, minibatch_indices)
minibatch_encode_boxes = tf.gather(self.rpn_encode_boxes, minibatch_indices)
minibatch_boxes_scores = tf.gather(self.rpn_scores, minibatch_indices)
# encode gtboxes
minibatch_encode_gtboxes = encode_and_decode.encode_boxes(unencode_boxes=minibatch_anchor_matched_gtboxes,
reference_boxes=minibatch_anchors,
scale_factors=self.scale_factors)
positive_anchors_in_img = draw_box_with_color(self.img_batch,
minibatch_anchors *
tf.expand_dims(object_mask, 1),
text=tf.shape(tf.where(tf.equal(object_mask, 1.0)))[0])
negative_mask = tf.cast(tf.logical_not(tf.cast(object_mask, tf.bool)), tf.float32)
negative_anchors_in_img = draw_box_with_color(self.img_batch,
minibatch_anchors *
tf.expand_dims(negative_mask, 1),
text=tf.shape(tf.where(tf.equal(object_mask, 0.0)))[0])
minibatch_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=minibatch_encode_boxes,
reference_boxes=minibatch_anchors,
scale_factors=self.scale_factors)
tf.summary.image('/positive_anchors', positive_anchors_in_img)
tf.summary.image('/negative_anchors', negative_anchors_in_img)
top_k_scores, top_k_indices = tf.nn.top_k(minibatch_boxes_scores[:, 1], k=1)
top_detections_in_img = draw_box_with_color(self.img_batch,
tf.gather(minibatch_decode_boxes, top_k_indices),
text=tf.shape(top_k_scores)[0])
tf.summary.image('/top_1', top_detections_in_img)
# losses
with tf.variable_scope('rpn_location_loss'):
location_loss = losses.l1_smooth_losses(predict_boxes=minibatch_encode_boxes,
gtboxes=minibatch_encode_gtboxes,
object_weights=object_mask)
tf.losses.add_loss(location_loss) # add smooth l1 loss to losses collection
with tf.variable_scope('rpn_classification_loss'):
classification_loss = tf.losses.softmax_cross_entropy(logits=minibatch_boxes_scores,
onehot_labels=minibatch_labels_one_hot)
return location_loss, classification_loss
Next comes building the final object detector.
build_fast_rcnn.py
# build_fast_rcnn.py
class FastRCNN(object):
def __init__(self,
feature_pyramid, # rpn.feature_pyramid
rpn_proposals_boxes, #rpn_proposals_boxes
rpn_proposals_scores, # rpn_proposals_scores
img_batch, # image
img_shape,# tf.shape(img_batch)
roi_size, # 14
scale_factors, # SCALE_FACTORS = [10., 10., 5., 5.]
roi_pool_kernel_size, # roi size = initial_crop_size / max_pool_kernel size
gtboxes_and_label, # [M, 5], used to compute the loss
fast_rcnn_nms_iou_threshold, #
fast_rcnn_maximum_boxes_per_img, # 100
fast_rcnn_nms_max_boxes_per_class, # 100
show_detections_score_threshold, # show box scores larger than this threshold
num_classes, # exclude background
fast_rcnn_minibatch_size,# 256
fast_rcnn_positives_ratio, #0.5
fast_rcnn_positives_iou_threshold,#0.2
use_dropout,
is_training,
weight_decay,
level): #['P2', 'P3', 'P4', 'P5', "P6"]
self.feature_pyramid = feature_pyramid
self.rpn_proposals_boxes = rpn_proposals_boxes # [N, 4]
self.rpn_proposals_scores = rpn_proposals_scores
self.img_shape = img_shape
self.img_batch = img_batch
self.roi_size = roi_size
self.roi_pool_kernel_size = roi_pool_kernel_size
self.level = level
self.min_level = int(level[0][1])
self.max_level = min(int(level[-1][1]), 5)
self.fast_rcnn_nms_iou_threshold = fast_rcnn_nms_iou_threshold
self.fast_rcnn_nms_max_boxes_per_class = fast_rcnn_nms_max_boxes_per_class
self.fast_rcnn_maximum_boxes_per_img = fast_rcnn_maximum_boxes_per_img
self.show_detections_score_threshold = show_detections_score_threshold
self.scale_factors = scale_factors
# larger than 0.5 is positive, others are negative
self.fast_rcnn_positives_iou_threshold = fast_rcnn_positives_iou_threshold
self.fast_rcnn_minibatch_size = fast_rcnn_minibatch_size
self.fast_rcnn_positives_ratio = fast_rcnn_positives_ratio
self.gtboxes_and_label = gtboxes_and_label
self.num_classes = num_classes
self.use_dropout = use_dropout
self.is_training = is_training
self.weight_decay = weight_decay
# generate the roi feature maps
self.fast_rcnn_all_level_rois, self.fast_rcnn_all_level_proposals = self.get_rois()
# run the rois through the head to get the final box coordinates and class scores
self.fast_rcnn_encode_boxes, self.fast_rcnn_scores = self.fast_rcnn_net()
The above is the init function of the FPN detection module; the last two calls compute all the detection results.
They are these two functions:
# generate the roi feature maps
self.fast_rcnn_all_level_rois, self.fast_rcnn_all_level_proposals = self.get_rois()
# run the rois through the head to get the final box coordinates and class scores
self.fast_rcnn_encode_boxes, self.fast_rcnn_scores = self.fast_rcnn_net()
- self.get_rois()
def get_rois(self):
'''
1)get roi from feature map
2)roi align or roi pooling. Here is roi align
:return:
all_level_rois: [N, 7, 7, C]
all_level_proposals : [N, 4]
all_level_proposals is matched with all_level_rois
'''
levels = self.assign_level()
all_level_roi_list = []
all_level_proposal_list = []
if DEBUG:
print_tensors(levels, 'levels')
with tf.variable_scope('fast_rcnn_roi'):
# P6 is not used by the Fast R-CNN detector.
for i in range(self.min_level, self.max_level + 1):
level_i_proposal_indices = tf.reshape(tf.where(tf.equal(levels, i)), [-1])
level_i_proposals = tf.gather(self.rpn_proposals_boxes, level_i_proposal_indices)
level_i_proposals = tf.cond(
tf.equal(tf.shape(level_i_proposals)[0], 0),
lambda: tf.constant([[0, 0, 0, 0]], dtype=tf.float32),
lambda: level_i_proposals
) # avoid a level_i_proposals batch of 0, which would break gradient backprop
all_level_proposal_list.append(level_i_proposals)
ymin, xmin, ymax, xmax = tf.unstack(level_i_proposals, axis=1)
img_h, img_w = tf.cast(self.img_shape[1], tf.float32), tf.cast(
self.img_shape[2], tf.float32)
normalize_ymin = ymin / img_h
normalize_xmin = xmin / img_w
normalize_ymax = ymax / img_h
normalize_xmax = xmax / img_w
# crop each box's region out of the feature map (crop_and_resize expects coordinates normalized to [0, 1])
level_i_cropped_rois = tf.image.crop_and_resize(self.feature_pyramid['P%d' % i],
boxes=tf.transpose(tf.stack([normalize_ymin, normalize_xmin,
normalize_ymax, normalize_xmax])),
box_ind=tf.zeros(shape=[tf.shape(level_i_proposals)[0], ],
dtype=tf.int32),
crop_size=[self.roi_size, self.roi_size]
)
# max-pool the cropped roi down to the final roi size
level_i_rois = slim.max_pool2d(level_i_cropped_rois,
[self.roi_pool_kernel_size, self.roi_pool_kernel_size],
stride=self.roi_pool_kernel_size)
all_level_roi_list.append(level_i_rois)
all_level_rois = tf.concat(all_level_roi_list, axis=0)
all_level_proposals = tf.concat(all_level_proposal_list, axis=0)
# all_level_rois: the final roi feature tensors
# all_level_proposals: the corresponding box coordinates
return all_level_rois, all_level_proposals
The rois above are simply cropped out of the feature maps, so there isn't much to explain, but one difference from plain Faster R-CNN shows up here: each proposal must be assigned to a pyramid level.
The paper gives this:
def assign_level(self):
with tf.name_scope('assign_levels'):
ymin, xmin, ymax, xmax = tf.unstack(self.rpn_proposals_boxes, axis=1)
w = tf.maximum(xmax - xmin, 0.) # avoid w is negative
h = tf.maximum(ymax - ymin, 0.) # avoid h is negative
levels = tf.round(4. + tf.log(tf.sqrt(w*h + 1e-8)/224.0) / tf.log(2.)) # 4 + log_2(***)
levels = tf.maximum(levels, tf.ones_like(levels) *
(np.float32(self.min_level))) # level minimum is 2
levels = tf.minimum(levels, tf.ones_like(levels) *
(np.float32(self.max_level))) # level maximum is 5
return tf.cast(levels, tf.int32)
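For intuition: the assignment is k = round(4 + log2(sqrt(w * h) / 224)), clamped to [2, 5], so a 224x224 proposal lands on P4 and every halving of scale drops one level (the paper uses floor where this code uses tf.round). A quick plain-Python check:
import math

def assign_level_py(w, h, k0=4, canonical=224.0, k_min=2, k_max=5):
    k = round(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(max(k, k_min), k_max)

print(assign_level_py(224, 224))  # 4 -> P4
print(assign_level_py(112, 112))  # 3 -> P3
print(assign_level_py(448, 448))  # 5 -> P5
print(assign_level_py(32, 32))    # 1, clamped to 2 -> P2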
Next comes the box prediction.
* fast_rcnn_net
def fast_rcnn_net(self):
with tf.variable_scope('fast_rcnn_net'):
with slim.arg_scope([slim.fully_connected], weights_regularizer=slim.l2_regularizer(self.weight_decay)):
flatten_rois_features = slim.flatten(self.fast_rcnn_all_level_rois)
net = slim.fully_connected(flatten_rois_features, 1024, scope='fc_1')
if self.use_dropout:
net = slim.dropout(net, keep_prob=0.5, is_training=self.is_training, scope='dropout')
net = slim.fully_connected(net, 1024, scope='fc_2')
# for each roi, produce self.num_classes + 1 class scores
fast_rcnn_scores = slim.fully_connected(net, self.num_classes + 1, activation_fn=None,
scope='classifier')
# for each roi, produce self.num_classes * 4 coordinates
fast_rcnn_encode_boxes = slim.fully_connected(net, self.num_classes * 4, activation_fn=None,
scope='regressor')
if DEBUG:
print_tensors(fast_rcnn_encode_boxes, 'fast_rcnn_encode_bxes')
return fast_rcnn_encode_boxes, fast_rcnn_scores
No for loop is used here; instead slim.flatten plus slim.fully_connected handles the rois: flatten keeps the roi dimension and turns each roi's features into one row, so the fully connected layers predict every roi in parallel, one row per roi.
fast_rcnn_predict
def fast_rcnn_predict(self):
with tf.variable_scope('fast_rcnn_predict'):
fast_rcnn_softmax_scores = slim.softmax(self.fast_rcnn_scores) # [-1, num_classes+1]
fast_rcnn_encode_boxes = tf.reshape(self.fast_rcnn_encode_boxes, [-1, 4])
reference_boxes = tf.tile(self.fast_rcnn_all_level_proposals, [
1, self.num_classes]) # [N, 4*num_classes]
reference_boxes = tf.reshape(reference_boxes, [-1, 4]) # [N*num_classes, 4]
fast_rcnn_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=fast_rcnn_encode_boxes,
reference_boxes=reference_boxes,
scale_factors=self.scale_factors)
fast_rcnn_decode_boxes = boxes_utils.clip_boxes_to_img_boundaries(fast_rcnn_decode_boxes,
img_shape=self.img_shape)
# multiclass NMS
fast_rcnn_decode_boxes = tf.reshape(fast_rcnn_decode_boxes, [-1, self.num_classes*4])
fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
self.fast_rcnn_proposals(fast_rcnn_decode_boxes, scores=fast_rcnn_softmax_scores)
return fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category
fast_rcnn_predict applies softmax to fast_rcnn_scores, decodes the boxes (one box per class per roi, tiling each proposal num_classes times as its reference boxes), clips boxes that cross the image boundary, and then hands the result to fast_rcnn_proposals for the multiclass NMS.
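The tile/reshape trick above is worth spelling out (hypothetical numbers): every one of the N proposals serves as the reference box for all num_classes regressions, so the rows of the tiled references line up with the rows of the reshaped encodings:
import numpy as np
N, C = 300, 20                          # hypothetical: 300 proposals, 20 classes
proposals = np.zeros((N, 4))            # fast_rcnn_all_level_proposals
encode_boxes = np.zeros((N, 4 * C))     # fast_rcnn_encode_boxes
reference = np.tile(proposals, (1, C))  # [N, 4*C]: each proposal repeated once per class
reference = reference.reshape(-1, 4)    # [N*C, 4]
codes = encode_boxes.reshape(-1, 4)     # [N*C, 4], aligned row-for-row with reference
print(reference.shape, codes.shape)     # (6000, 4) (6000, 4)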
- fast_rcnn loss
def fast_rcnn_loss(self):
with tf.variable_scope('fast_rcnn_loss'):
minibatch_indices, minibatch_reference_boxes_mattached_gtboxes, minibatch_object_mask, \
minibatch_label_one_hot = self.fast_rcnn_minibatch(self.fast_rcnn_all_level_proposals)
minibatch_reference_boxes = tf.gather(self.fast_rcnn_all_level_proposals, minibatch_indices)
minibatch_encode_boxes = tf.gather(self.fast_rcnn_encode_boxes,
minibatch_indices) # [minibatch_size, num_classes*4]
minibatch_scores = tf.gather(self.fast_rcnn_scores, minibatch_indices)
positive_proposals_in_img = draw_box_with_color(self.img_batch,
minibatch_reference_boxes * tf.expand_dims(
minibatch_object_mask, 1),
text=tf.shape(tf.where(tf.equal(minibatch_object_mask, 1.0)))[0])
negative_mask = tf.cast(tf.logical_not(tf.cast(minibatch_object_mask, tf.bool)), tf.float32)
negative_proposals_in_img = draw_box_with_color(self.img_batch,
minibatch_reference_boxes * tf.expand_dims(negative_mask,
1),
text=tf.shape(tf.where(tf.equal(minibatch_object_mask, 0.0)))[0])
tf.summary.image('/positive_proposals', positive_proposals_in_img)
tf.summary.image('/negative_proposals', negative_proposals_in_img)
if cfgs.CLASS_NUM == 1:
minibatch_decode_boxes = encode_and_decode.decode_boxes(encode_boxes=minibatch_encode_boxes,
reference_boxes=minibatch_reference_boxes,
scale_factors=self.scale_factors)
minibatch_softmax_scores = tf.gather(slim.softmax(self.fast_rcnn_scores), minibatch_indices)
top_k_scores, top_k_indices = tf.nn.top_k(minibatch_softmax_scores[:, 1], k=5)
top_detections_in_img = draw_boxes_with_scores(self.img_batch,
boxes=tf.gather(
minibatch_decode_boxes, top_k_indices),
scores=top_k_scores)
tf.summary.image('/top_5', top_detections_in_img)
# encode gtboxes
minibatch_encode_gtboxes = \
encode_and_decode.encode_boxes(
unencode_boxes=minibatch_reference_boxes_mattached_gtboxes,
reference_boxes=minibatch_reference_boxes,
scale_factors=self.scale_factors
)
# [minibatch_size, num_classes*4]
minibatch_encode_gtboxes = tf.tile(minibatch_encode_gtboxes, [1, self.num_classes])
class_weights_list = []
category_list = tf.unstack(minibatch_label_one_hot, axis=1)
for i in range(1, self.num_classes+1):
tmp_class_weights = tf.ones(
shape=[tf.shape(minibatch_encode_boxes)[0], 4], dtype=tf.float32)
tmp_class_weights = tmp_class_weights * tf.expand_dims(category_list[i], axis=1)
class_weights_list.append(tmp_class_weights)
class_weights = tf.concat(class_weights_list, axis=1) # [minibatch_size, num_classes*4]
# loss
with tf.variable_scope('fast_rcnn_classification_loss'):
fast_rcnn_classification_loss = tf.losses.softmax_cross_entropy(logits=minibatch_scores,
onehot_labels=minibatch_label_one_hot)
with tf.variable_scope('fast_rcnn_location_loss'):
fast_rcnn_location_loss = losses.l1_smooth_losses(predict_boxes=minibatch_encode_boxes,
gtboxes=minibatch_encode_gtboxes,
object_weights=minibatch_object_mask,
classes_weights=class_weights)
tf.losses.add_loss(fast_rcnn_location_loss)
return fast_rcnn_location_loss, fast_rcnn_classification_loss
Compute the losses for the coordinates and the classification; a quick sketch of the smooth L1 used for the location term follows.
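losses.l1_smooth_losses itself isn't shown in this post; presumably it is the standard Fast R-CNN smooth L1, applied to the difference between predicted and ground-truth encodings and masked by the object/class weights:
smooth_l1(x) = 0.5 * x^2 if |x| < 1, else |x| - 0.5
import numpy as np

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

print(smooth_l1(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [1.5, 0.125, 0., 0.125, 1.5]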
I found that the train.py file here is written fairly clearly.
train.py
Here is the file's main structure.
def train():
with tf.Graph().as_default():
with tf.name_scope('get_batch'):
# read the data
img_name_batch, img_batch, gtboxes_and_label_batch, num_objects_batch = next_batch()
# ***********************************************************************************************
# * share net: build the base network *
# ***********************************************************************************************
_, share_net = get_network_byname()
# ***********************************************************************************************
# * rpn: build the RPN network *
# ***********************************************************************************************
rpn = build_rpn.RPN()
# ***********************************************************************************************
# * Fast RCNN: build the detection model *
# ***********************************************************************************************
fast_rcnn = build_fast_rcnn.FastRCNN()
fast_rcnn_decode_boxes, fast_rcnn_score, num_of_objects, detection_category = \
fast_rcnn.fast_rcnn_predict()
# ***********************************************************************************************
# * loss: compute the RPN and fast_rcnn losses *
# ***********************************************************************************************
fast_rcnn_location_loss, fast_rcnn_classification_loss = fast_rcnn.fast_rcnn_loss()
fast_rcnn_total_loss = fast_rcnn_location_loss + fast_rcnn_classification_loss
rpn_location_loss, rpn_classification_loss = rpn.rpn_losses()
rpn_total_loss = rpn_location_loss + rpn_classification_loss
# train
added_loss = rpn_total_loss + fast_rcnn_total_loss
total_loss = tf.losses.get_total_loss()
global_step = tf.train.get_or_create_global_step()
lr = tf.train.piecewise_constant(global_step,
boundaries=[np.int64(20000), np.int64(40000)],
values=[cfgs.LR, cfgs.LR / 10, cfgs.LR / 100])
tf.summary.scalar('lr', lr)
optimizer = tf.train.MomentumOptimizer(lr, momentum=cfgs.MOMENTUM)
train_op = slim.learning.create_train_op(total_loss, optimizer, global_step) # rpn_total_loss,
# train_op = optimizer.minimize(second_classification_loss, global_step)
That is the overall structure of FPN; there are still gaps left to fill here...
Update: I have since run the code.
Filling in the code's gaps
References:
Training an object detection model combining FPN and Faster-RCNN with TensorFlow
Object detection algorithms: FPN (Feature Pyramid Networks)