Notice: reproduction of this article is prohibited.
SSD framework code: research/object_detection/meta_architectures/ssd_meta_arch.py
This file defines the SSD framework; the rest of this article uses it as the backbone for explaining how each module is implemented.
SSD config file: research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config
This is the model configuration file; it selects mobilenet_v1 as the feature extraction network, and everything below is described against this configuration.
Sections 1-4 below walk through the predict implementation: 1-3 cover the supporting modules, and 4 covers the flow inside the meta architecture.
research/object_detection/models/ssd_mobilenet_v1_feature_extractor.py
feature_map_layout = {
'from_layer': ['Conv2d_11_pointwise', 'Conv2d_13_pointwise', '', '',
'', ''],
'layer_depth': [-1, -1, 512, 256, 256, 128],
'use_explicit_padding': self._use_explicit_padding,
'use_depthwise': self._use_depthwise,
}
'from_layer': the name scopes extracted from mobilenet; an empty string means the layer is newly generated. The value is a list of length 6, i.e., six feature maps at different scales.
'layer_depth': the channel depth; -1 means the depth is inherited from the source layer.
'use_explicit_padding': if enabled, VALID padding is used, with a fixed padding step applied first so that the post-convolution size matches what SAME padding would produce.
For the difference between SAME padding and fixed padding, see the Stack Overflow discussion:
With SAME padding:
Case 1:
pad| |pad
inputs: 0 |1 2 3 4 5 |0
|_______|
|_______|
|_______|
Case 2:
|pad
inputs: 1 2 3 4 5 6 |0
|_______|
|_______|
|_______|
With fixed padding:
Case 1:
pad| |pad
inputs: 0 |1 2 3 4 5 |0
|_______|
|_______|
|_______|
Case 2:
pad| |pad
inputs: 0 |1 2 3 4 5 6 |0
|_______|
|_______|
|_______|
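As the two cases show, fixed padding depends only on the kernel size, never on the input size, which is why the padded positions stay put in both cases. A minimal sketch in the style of the slim model zoo (the function name is illustrative):

import tensorflow as tf

def fixed_padding(inputs, kernel_size):
  """Pads NHWC `inputs` by a fixed amount so that a following VALID
  convolution produces the same spatial size as a SAME convolution."""
  pad_total = kernel_size - 1
  pad_beg = pad_total // 2        # kernel_size=3 -> pad 1 at the start...
  pad_end = pad_total - pad_beg   # ...and 1 at the end, on both H and W
  return tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
                         [pad_beg, pad_end], [0, 0]])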
research/object_detection/models/feature_map_generators.py, multi_resolution_feature_maps()
name scope | channel depth | feature map size |
---|---|---|
Conv2d_11_pointwise | 512 | 19x19 |
Conv2d_13_pointwise | 1024 | 10x10 |
Conv2d_13_pointwise_2_Conv2d_2_3x3_s2_512 | 512 | 5x5 |
Conv2d_13_pointwise_2_Conv2d_3_3x3_s2_256 | 256 | 3x3 |
Conv2d_13_pointwise_2_Conv2d_4_3x3_s2_256 | 256 | 2x2 |
Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128 | 128 | 1x1 |
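For the four new layers (the '' entries in from_layer), multi_resolution_feature_maps appends a 1x1 convolution at half the target depth followed by a 3x3 stride-2 convolution, halving the spatial size each time (10x10 -> 5x5 -> 3x3 -> ...). A simplified sketch of the non-depthwise path (identifiers are illustrative, not the library's exact code):

import tensorflow as tf
slim = tf.contrib.slim

def extra_feature_layer(net, depth, base_name):
  # 1x1 conv to depth/2 for a cheap channel reduction
  net = slim.conv2d(net, depth // 2, [1, 1],
                    scope=base_name + '_1x1_' + str(depth // 2))
  # 3x3 stride-2 conv to the target depth; stride 2 halves H and W
  net = slim.conv2d(net, depth, [3, 3], stride=2, padding='SAME',
                    scope=base_name + '_3x3_s2_' + str(depth))
  return net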
research/object_detection/anchor_generators/multiple_grid_anchor_generator.py
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
One constraint in anchor generation that I had not initially understood: why must the number of feature maps equal the length of num_anchors_per_location? (Presumably because box_specs_list holds one spec list per feature map, num_anchors_per_location() returns one entry per layer, and this check simply verifies that a spatial shape was passed in for every layer.)
if self.check_num_anchors and (
len(feature_map_shape_list) != len(self.num_anchors_per_location())):
raise ValueError('Number of feature maps is expected to equal the length '
'of `num_anchors_per_location`.')
research/object_detection/anchor_generators/multiple_grid_anchor_generator.py
create_ssd_anchors returns a MultipleGridAnchorGenerator object; its main job is to build box_specs_list:
for layer, scale, scale_next in zip(range(num_layers), scales[:-1], scales[1:]):
layer_box_specs = []
if layer == 0 and reduce_boxes_in_lowest_layer:
layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
else:
for aspect_ratio in aspect_ratios:
layer_box_specs.append((scale, aspect_ratio))
# Add one more anchor, with a scale between the current scale, and the
# scale for the next layer, with a specified aspect ratio (1.0 by
# default).
if interpolated_scale_aspect_ratio > 0.0:
layer_box_specs.append((np.sqrt(scale*scale_next), interpolated_scale_aspect_ratio))
box_specs_list.append(layer_box_specs)
If 'reduce_boxes_in_lowest_layer' is 'True', anchors on the 0th feature map (size = 19x19) use the hard-coded layer_box_specs above; otherwise the default configuration is used. As the code shows, enabling this flag reduces the number of anchors on the 0th feature map. The likely reasons: first, the 0th feature map is already large, so its mapping back to the original image is reasonably precise and additional anchors bring little accuracy gain; second, it reduces the amount of computation.
With this configuration, the final box_specs_list is:
box_specs_list = [[(0.1,1),(0.2,2),(0.2,0.5)],
[(0.35,1), (0.35,2), (0.35,0.5), (0.35,3), (0.35,0.333), (sqrt(0.35*0.50),1)],
[(0.50,1), (0.50,2), (0.50,0.5), (0.50,3), (0.50,0.333), (sqrt(0.50*0.65),1)],
[(0.65,1), (0.65,2), (0.65,0.5), (0.65,3), (0.65,0.333), (sqrt(0.65*0.80),1)],
[(0.80,1), (0.80,2), (0.80,0.5), (0.80,3), (0.80,0.333), (sqrt(0.80*0.95),1)],
[(0.95,1), (0.95,2), (0.95,0.5), (0.95,3), (0.95,0.333), (sqrt(0.95*1.00),1)]]
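The scales that feed this list are a simple linear interpolation between min_scale and max_scale, with 1.0 appended so the last layer can also get its interpolated sqrt(scale*scale_next) anchor. A quick sketch to reproduce them:

min_scale, max_scale, num_layers = 0.2, 0.95, 6
scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
          for i in range(num_layers)] + [1.0]
print(scales)  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95, 1.0]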
research/object_detection/anchor_generators/grid_anchor_generator.py, tile_anchors() function.
Its inputs are each layer's feature map size, scales, aspect_ratios, etc.; it returns a BoxList object.
# For the first feature map, the parameters passed in are:
# grid_height = grid_width = 19
# scales = [0.1, 0.2, 0.2]
# aspect_ratios = [1.0, 2.0, 0.5]
# anchor_stride = (1/19, 1/19)
# anchor_offset = (1/38, 1/38)
ratio_sqrts = tf.sqrt(aspect_ratios)
# compute anchor heights and widths
heights = scales / ratio_sqrts * base_anchor_size[0]
widths = scales * ratio_sqrts * base_anchor_size[1]
# Get a grid of box centers
y_centers = tf.to_float(tf.range(grid_height))
y_centers = y_centers * anchor_stride[0] + anchor_offset[0]
x_centers = tf.to_float(tf.range(grid_width))
x_centers = x_centers * anchor_stride[1] + anchor_offset[1]
# x_centers = y_centers = [1/38,3/38,5/38,...,37/38]
x_centers, y_centers = ops.meshgrid(x_centers, y_centers)
# build the x/y meshgrids
widths_grid, x_centers_grid = ops.meshgrid(widths, x_centers)
heights_grid, y_centers_grid = ops.meshgrid(heights, y_centers)
bbox_centers = tf.stack([y_centers_grid, x_centers_grid], axis=3)
bbox_sizes = tf.stack([heights_grid, widths_grid], axis=3)
bbox_centers = tf.reshape(bbox_centers, [-1, 2])
bbox_sizes = tf.reshape(bbox_sizes, [-1, 2])
bbox_corners = _center_size_bbox_to_corners_bbox(bbox_centers, bbox_sizes)
bbox_corners has shape [grid_height*grid_width*num_anchors_per_location, 4].
For example, the 0th feature map yields bbox_corners of shape [19*19*3, 4] = [1083, 4].
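Summing over all six layers gives the total anchor count that appears throughout the rest of this article:

feature_map_sizes = [19, 10, 5, 3, 2, 1]
anchors_per_location = [3, 6, 6, 6, 6, 6]
counts = [s * s * n for s, n in zip(feature_map_sizes, anchors_per_location)]
print(counts)       # [1083, 600, 150, 54, 24, 6]
print(sum(counts))  # 1917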
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
}
}
research/object_detection/core/box_predictor.py
The method used is convolutional_box_predictor, which corresponds to the ConvolutionalBoxPredictor class; the concrete implementation is _predict.
A 1x1 convolution (kernel_size: 1 in the config) produces feature maps of the required output depth.
# box_encodings has shape (batch_size, featuremap_size, featuremap_size, 4 * num_boxes)
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
# class_predictions_with_background has shape (batch_size, featuremap_size, featuremap_size, num_class_slots * num_boxes)
# num_class_slots = number of actual classes + 1, adding a slot for the background class
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
# box_encodings reshape: [N,19,19,12] --> [N,1083,1,4] (0th feature map)
# class_predictions_with_background reshape: [N,19,19,3*num_class_slots] --> [N,1083,num_class_slots] (0th feature map)
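The reshape bookkeeping above can be checked with a self-contained toy (shapes follow the 0th feature map; all values here are placeholders):

import tensorflow as tf

batch_size, fmap, anchors, code_size, class_slots = 12, 19, 3, 4, 6
box_encodings = tf.zeros([batch_size, fmap, fmap, anchors * code_size])
cls_preds = tf.zeros([batch_size, fmap, fmap, anchors * class_slots])
# flatten the 19x19 grid and its 3 anchors into one axis of length 1083
box_encodings = tf.reshape(box_encodings, [batch_size, -1, 1, code_size])
cls_preds = tf.reshape(cls_preds, [batch_size, -1, class_slots])
print(box_encodings.shape, cls_preds.shape)  # (12, 1083, 1, 4) (12, 1083, 6)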
Sections 1-3 above covered the implementation of the main building blocks; ssd_meta_arch depends on all of them.
Main code: research/object_detection/meta_architectures/ssd_meta_arch.py, predict() function
with tf.variable_scope(None, self._extract_features_scope,[preprocessed_inputs]):
# extract the feature maps
feature_maps = self._feature_extractor.extract_features(preprocessed_inputs)
# get each feature map's spatial size
feature_map_spatial_dims = self._get_feature_map_spatial_dims(feature_maps)
# get the input image shape
image_shape = shape_utils.combined_static_and_dynamic_shape(preprocessed_inputs)
# 1. generate anchors: ssd_anchor_generator --> multiple_grid_anchor_generator --> grid_anchor_generator
# 2. concatenate the box_lists into a single BoxList: there used to be one boxlist per feature map, now there is one in total.
# the feature map index and each feature map's box count are recorded in 'feature_map_index', the boxes themselves in 'boxes'
self._anchors = box_list_ops.concatenate(
self._anchor_generator.generate(
feature_map_spatial_dims,
im_height=image_shape[1],
im_width=image_shape[2]))
Assume batch size = 12.
# multiple_grid_anchor_generator --> num_anchors_per_location returns the number of boxes per location for each feature map, i.e. [3,6,6,6,6,6]
# box_predictor --> ConvolutionalBoxPredictor
prediction_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
# the new box_encodings has shape [12, 1083+600+150+54+24+6, 4] --> [12, 1917, 4]
box_encodings = tf.squeeze(
tf.concat(prediction_dict['box_encodings'], axis=1), axis=2)
# the new class_predictions_with_background has shape [12, 1083+600+150+54+24+6, 6] --> [12, 1917, 6]
class_predictions_with_background = tf.concat(
prediction_dict['class_predictions_with_background'], axis=1)
Finally, box_encodings has shape [12, 1917, 4] and class_predictions_with_background has shape [12, 1917, 6].
Before introducing the target assigner, we first need to cover IOU computation and the matcher.
The matcher's job is to find, for each anchor, the index of the ground-truth box it matches.
research/object_detection/core/box_list_ops.py, iou() function
def iou(boxlist1, boxlist2, scope=None):
"""Computes pairwise intersection-over-union between box collections.
Args:
boxlist1: BoxList holding N boxes
boxlist2: BoxList holding M boxes
scope: name scope.
Returns:
a tensor with shape [N, M] representing pairwise iou scores.
"""
with tf.name_scope(scope, 'IOU'):
intersections = intersection(boxlist1, boxlist2)
areas1 = area(boxlist1)
areas2 = area(boxlist2)
unions = (
tf.expand_dims(areas1, 1) + tf.expand_dims(areas2, 0) - intersections)
return tf.where(
tf.equal(intersections, 0.0),
tf.zeros_like(intersections), tf.truediv(intersections, unions))
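For completeness, the intersection() helper that iou() calls computes all pairwise overlaps by broadcasting corner coordinates; a slightly simplified sketch that works on raw [N,4] corner tensors instead of BoxList objects:

import tensorflow as tf

def pairwise_intersection(boxes1, boxes2):
  """boxes1: [N, 4], boxes2: [M, 4], rows are [ymin, xmin, ymax, xmax].
  Returns an [N, M] tensor of pairwise intersection areas."""
  y_min1, x_min1, y_max1, x_max1 = tf.split(boxes1, 4, axis=1)
  y_min2, x_min2, y_max2, x_max2 = tf.split(boxes2, 4, axis=1)
  # broadcast [N, 1] against [1, M] to cover every (box1, box2) pair
  heights = tf.maximum(0.0, tf.minimum(y_max1, tf.transpose(y_max2)) -
                            tf.maximum(y_min1, tf.transpose(y_min2)))
  widths = tf.maximum(0.0, tf.minimum(x_max1, tf.transpose(x_max2)) -
                           tf.maximum(x_min1, tf.transpose(x_min2)))
  return heights * widths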
The code lives in the assign() function of research/object_detection/core/target_assigner.py.
_similarity_calc.compare() calls iou() to build the IOU matrix:
# returns an IOU matrix of shape [num_groundtruth_boxes, num_anchors]
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,anchors)
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
research/object_detection/matchers/argmax_matcher.py
The main logic is in _match_when_rows_are_non_empty():
1. Take the max IOU in each column, i.e., find for each anchor the ground truth with the largest IOU:
matches = tf.argmax(similarity_matrix, 0, output_type=tf.int32)
matched_vals = tf.reduce_max(similarity_matrix, 0)
below_unmatched_threshold = tf.greater(self._unmatched_threshold,matched_vals)
between_thresholds = tf.logical_and(
tf.greater_equal(matched_vals, self._unmatched_threshold),
tf.greater(self._matched_threshold, matched_vals))
# in the config, _negatives_lower_than_unmatched = True
if self._negatives_lower_than_unmatched:
  # set indices whose IOU is below _unmatched_threshold to -1
  matches = self._set_values_using_indicator(matches, below_unmatched_threshold, -1)
  # set indices whose IOU is >= _unmatched_threshold and < _matched_threshold to -2
  matches = self._set_values_using_indicator(matches, between_thresholds, -2)
else:
  matches = self._set_values_using_indicator(matches, below_unmatched_threshold, -2)
  matches = self._set_values_using_indicator(matches, between_thresholds, -1)
2. Take the max IOU in each row, i.e., find for each ground truth the anchor with the largest IOU:
similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1,output_type=tf.int32)
# one-hot expand the row-wise argmax indices, with depth equal to the number of anchors
force_match_column_indicators = tf.one_hot(force_match_column_ids, depth=similarity_matrix_shape[1])
# the indices in force_match_row_ids mark where a ground truth and an anchor are each other's best match
force_match_row_ids = tf.argmax(force_match_column_indicators, 0,output_type=tf.int32)
3. The final matches that are returned:
# cast to a bool tensor
force_match_column_mask = tf.cast(tf.reduce_max(force_match_column_indicators, 0), tf.bool)
# where force_match_column_mask is True, final_matches takes force_match_row_ids, otherwise matches
# i.e., mutual-best-match indices take priority
final_matches = tf.where(force_match_column_mask,force_match_row_ids, matches)
In short, matches holds for each column of similarity_matrix (i.e., for each anchor) the row index with the largest IOU, provided the IOU exceeds the matched threshold, with mutual-best matches forced in.
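A tiny NumPy walk-through of these rules (thresholds as in the config; the IOU values are made up):

import numpy as np

# rows = 2 ground truths, columns = 4 anchors
iou = np.array([[0.6, 0.3, 0.0, 0.0],
                [0.1, 0.4, 0.55, 0.0]])
matches = iou.argmax(axis=0)              # best gt per anchor
matches[iou.max(axis=0) < 0.5] = -1       # below unmatched_threshold
# force_match_for_each_row: every gt keeps its single best anchor
matches[iou.argmax(axis=1)] = np.arange(iou.shape[0])
print(matches)                            # [ 0 -1  1 -1]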
research/object_detection/core/target_assigner.py
# returns an IOU matrix of shape [num_groundtruth_boxes, num_anchors]
match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
anchors)
# returns, for each anchor, the index of the ground truth with the largest IOU
match = self._matcher.match(match_quality_matrix, **params)
# returns the matched anchors, shape [num_anchors, 4]; unmatched and ignored entries are [0,0,0,0]
reg_targets = self._create_regression_targets(anchors,groundtruth_boxes,match)
# returns the matched classes; unmatched and ignored entries are 0
cls_targets = self._create_classification_targets(groundtruth_labels,match)
reg_weights = self._create_regression_weights(match, groundtruth_weights)
cls_weights = self._create_classification_weights(match,groundtruth_weights)
The core job of _create_regression_targets is to extract the matched boxes for the subsequent loss computation.
_create_classification_targets does the same for the class labels.
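The semantics of _create_regression_targets can be sketched in NumPy (the real code works through the Match object and tf.gather; the zero-padding trick below is purely illustrative):

import numpy as np

def create_regression_targets(gt_boxes, matches):
  """gt_boxes: [num_gt, 4]; matches: [num_anchors], -1/-2 mark
  unmatched/ignored anchors. Returns [num_anchors, 4] targets."""
  # append two zero rows so indices -1 and -2 gather [0, 0, 0, 0]
  padded = np.vstack([gt_boxes, np.zeros((2, 4))])
  return padded[matches]

gt = np.array([[0.1, 0.1, 0.5, 0.5], [0.4, 0.4, 0.9, 0.9]])
print(create_regression_targets(gt, np.array([0, -1, 1, -2])))
# rows 1 and 3 come out as [0. 0. 0. 0.]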
target_assigner.assign is called once per image in the batch; the code is in
research/object_detection/core/target_assigner.py, batch_assign_targets():
# gather each image's boxes, classes, match_list, and related information.
# match_list: a list of match.Match, recording for each anchor the index of its matched ground truth; unmatched entries are -1, ignored entries are -2
# batch_reg_targets: the localization regression targets; unmatched and ignored entries are [0,0,0,0]
# batch_cls_targets: the classification targets; unmatched and ignored entries are 0
for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
(cls_targets, cls_weights, reg_targets,reg_weights, match) = target_assigner.assign(
anchors, gt_boxes, gt_class_targets, gt_weights)
cls_targets_list.append(cls_targets)
cls_weights_list.append(cls_weights)
reg_targets_list.append(reg_targets)
reg_weights_list.append(reg_weights)
match_list.append(match)
batch_cls_targets = tf.stack(cls_targets_list)
batch_cls_weights = tf.stack(cls_weights_list)
batch_reg_targets = tf.stack(reg_targets_list)
batch_reg_weights = tf.stack(reg_weights_list)
When computing the loss, positive and negative examples are heavily imbalanced, so only a subset of the negatives is used. hard_example_miner implements exactly this.
hard_example_miner {
# maximum number of examples to keep
num_hard_examples: 3000
# NMS threshold
iou_threshold: 0.99
# ranking criterion
loss_type: CLASSIFICATION
# negative:positive ratio; at most 3 negatives per positive
max_negatives_per_positive: 3
# minimum number of negatives per image
min_negatives_per_image: 0
}
research/object_detection/core/losses.py
The code below is the per-image processing in the HardExampleMiner.__call__() function:
# NMS (non-maximum suppression): sort boxes by class loss (image_losses), then among boxes whose IOU exceeds the threshold, drop the ones with the smaller class loss.
# at most num_hard_examples boxes are kept.
selected_indices = tf.image.non_max_suppression(
box_locations, image_losses, num_hard_examples, self._iou_threshold)
# cap the number of negatives relative to positives via _max_negatives_per_positive
if self._max_negatives_per_positive is not None and match:
(selected_indices, num_positives,
num_negatives) = self._subsample_selection_to_desired_neg_pos_ratio(
selected_indices, match, self._max_negatives_per_positive,
self._min_negatives_per_image)
# record the number of positives and negatives in each image
num_positives_list.append(num_positives)
num_negatives_list.append(num_negatives)
# accumulate the losses of the selected (mined) examples
mined_location_losses.append(
tf.reduce_sum(tf.gather(location_losses[ind], selected_indices)))
mined_cls_losses.append(
tf.reduce_sum(tf.gather(cls_losses[ind], selected_indices)))
After every image has been processed, compute the loss:
location_loss = tf.reduce_sum(tf.stack(mined_location_losses))
cls_loss = tf.reduce_sum(tf.stack(mined_cls_losses))
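The ratio capping inside _subsample_selection_to_desired_neg_pos_ratio keeps every positive and at most max(min_negatives_per_image, max_negatives_per_positive * num_positives) negatives, scanning the loss-sorted indices in order. A hedged pure-Python illustration of that behavior:

def subsample_neg_pos(selected, is_positive, max_neg_per_pos=3, min_neg=0):
  """selected: anchor indices sorted by loss; is_positive: parallel flags."""
  num_pos = sum(is_positive)
  max_neg = max(min_neg, max_neg_per_pos * num_pos)
  kept, num_neg = [], 0
  for idx, pos in zip(selected, is_positive):
    if pos:
      kept.append(idx)              # positives are always kept
    elif num_neg < max_neg:
      kept.append(idx)              # negatives only up to the quota
      num_neg += 1
  return kept, num_pos, num_neg

print(subsample_neg_pos([7, 3, 9, 1, 5], [True, False, False, False, False]))
# ([7, 3, 9, 1], 1, 3) -- one positive, the three highest-loss negatives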
The code is in ssd_meta_arch.py, loss().
Loss configuration:
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 0
}
classification_weight: 1.0
localization_weight: 1.0
}
match_list: a list of matcher.Match, recording for each anchor the index of its matched ground truth; unmatched entries are -1, ignored entries are -2
batch_reg_targets: the localization regression targets; unmatched and ignored entries are [0,0,0,0]
batch_cls_targets: the classification targets; unmatched and ignored entries are 0
(batch_cls_targets, batch_cls_weights, batch_reg_targets, batch_reg_weights,
 match_list) = self._assign_targets(
self.groundtruth_lists(fields.BoxListFields.boxes),
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
# losses.WeightedSmoothL1LocalizationLoss, tf.losses.huber_loss
# __init__(), default: delta = 1
# __call__() --> _compute_loss()
# prediction_dict['box_encodings'] has shape [12, 1917, 4]
# location_losses has shape [12, 1917]
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
batch_reg_targets,
ignore_nan_targets=True,
weights=batch_reg_weights)
# losses.WeightedSigmoidClassificationLoss, sigmoid_cross_entropy_with_logits
# sigmoid is used rather than softmax; note a sigmoid saturates, so logits of 4 and 100 give nearly the same class probability
# __init__()
# __call__() --> _compute_loss()
# prediction_dict['class_predictions_with_background'] has shape [12, 1917, 6]
# cls_losses has shape [12, 1917]
cls_losses = ops.reduce_sum_trailing_dimensions(
self._classification_loss(
prediction_dict['class_predictions_with_background'],
batch_cls_targets,
weights=batch_cls_weights),
ndims=2)
(localization_loss, classification_loss) = self._apply_hard_mining(
location_losses, cls_losses, prediction_dict, match_list)
# normalize the losses
# _normalize_loss_by_num_matches = True (in config file)
# _normalize_loc_loss_by_codesize = False (default in ssd.proto)
normalizer = tf.constant(1.0, dtype=tf.float32)
if self._normalize_loss_by_num_matches:
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
1.0)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
localization_loss_normalizer),
localization_loss,
name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
normalizer), classification_loss,
name='classification_loss')
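As a quick numeric check of the sigmoid-vs-softmax comment above: a sigmoid saturates quickly, so a logit of 4 already scores almost the same as a logit of 100:

import math

for logit in (0.0, 4.0, 100.0):
  print(logit, 1.0 / (1.0 + math.exp(-logit)))
# 0.0 0.5
# 4.0 0.9820137900379085
# 100.0 1.0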
After predict, the resulting boxes still need to go through postprocessing to obtain the final detections.
Postprocess configuration:
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
Implementation:
# pull the predict results out of prediction_dict
preprocessed_images = prediction_dict['preprocessed_inputs']
box_encodings = prediction_dict['box_encodings']
class_predictions = prediction_dict['class_predictions_with_background']
detection_boxes, detection_keypoints = self._batch_decode(box_encodings)
detection_boxes = tf.expand_dims(detection_boxes, axis=2)
# sigmoid: tf.sigmoid(class_predictions / logit_scale) is used as the score conversion function
# logit_scale = 1 (default value in post_processing.proto)
detection_scores_with_background = self._score_conversion_fn(
class_predictions)
detection_scores = tf.slice(detection_scores_with_background, [0, 0, 1],
[-1, -1, -1])
# apply the NMS settings from post_processing to detection_scores and detection_boxes to obtain the final boxes and scores
# _non_max_suppression_fn = post_processing.batch_multiclass_non_max_suppression
(nmsed_boxes, nmsed_scores, nmsed_classes, _, nmsed_additional_fields,
num_detections) = self._non_max_suppression_fn(
detection_boxes,
detection_scores,
clip_window=self._compute_clip_window(
preprocessed_images, true_image_shapes),
additional_fields=additional_fields)