SSD in TensorFlow: Model and Source Code Walkthrough

Original article: https://blog.csdn.net/c20081052/article/details/80391627

This article focuses on the TensorFlow implementation of SSD: a walkthrough of the source code together with an explanation of the network model.

[Preface]

First, download the TensorFlow SSD repository from GitHub: https://github.com/balancap/SSD-Tensorflow

The paper itself (SSD: Single Shot MultiBox Detector) is available at https://arxiv.org/abs/1512.02325.

Unzip SSD-Tensorflow-master.zip into your working directory.


SSD produces its detections by applying convolutions directly to feature maps at several scales. For a feature map of shape $m \times n \times p$, a relatively small $3 \times 3 \times p$ convolution kernel is enough to produce the detection values.

SSD's detection outputs also differ from YOLO's. For every prior box of every cell, SSD outputs an independent set of detection values corresponding to one bounding box, split into two parts. The first part is the confidence (score) of each category. Note that SSD treats the background as an extra, special class: if there are $c$ object categories to detect, SSD actually predicts $c+1$ confidence values, the first of which is the score for "no object / background". From here on, when we speak of $c$ class confidences, remember that the background class is included, so there are only $c-1$ real object categories. At prediction time, the class with the highest confidence is the class assigned to the bounding box; in particular, when the first (background) confidence is the highest, the box contains no object. The second part is the bounding box location, made of 4 values $(cx, cy, w, h)$: the center coordinates, width and height of the box. However, what the network actually predicts is only the transformation of the bounding box relative to the prior box (the paper calls it an offset, but "transformation" fits better; see R-CNN). Denoting the prior box position by $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$ and its corresponding bounding box by $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$, the predicted value $l$ is the transformation of $b$ relative to $d$:

$l^{cx} = (b^{cx} - d^{cx}) / d^{w}, \quad l^{cy} = (b^{cy} - d^{cy}) / d^{h}, \quad l^{w} = \log(b^{w} / d^{w}), \quad l^{h} = \log(b^{h} / d^{h})$

Conventionally, the process above is called encoding the bounding box; at prediction time the reverse process, decoding, recovers the true box position $b$ from the predicted value $l$:

$b^{cx} = d^{w} l^{cx} + d^{cx}, \quad b^{cy} = d^{h} l^{cy} + d^{cy}, \quad b^{w} = d^{w} \exp(l^{w}), \quad b^{h} = d^{h} \exp(l^{h})$

The Caffe implementation of SSD adds one more trick: a variance hyper-parameter that rescales the regression targets. A boolean parameter variance_encoded_in_target switches between two modes. When it is True, the variance is considered to be contained in the predicted values, which is the case above. When it is False (the setting used most of the time; training is said to be easier), the variance hyper-parameters are set by hand and used to rescale the 4 components of $l$, and decoding becomes:

$b^{cx} = d^{w} (\mathrm{variance}[0] \cdot l^{cx}) + d^{cx}, \quad b^{cy} = d^{h} (\mathrm{variance}[1] \cdot l^{cy}) + d^{cy}$
$b^{w} = d^{w} \exp(\mathrm{variance}[2] \cdot l^{w}), \quad b^{h} = d^{h} \exp(\mathrm{variance}[3] \cdot l^{h})$
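
As a concrete illustration, here is a minimal NumPy sketch of this encode/decode pair (not the repository code; boxes are assumed to be given as (cx, cy, w, h) arrays, and the variance values (0.1, 0.1, 0.2, 0.2) match the prior_scaling defaults that appear later in ssd_vgg_300.py):

import numpy as np

def encode(box, prior, variance=(0.1, 0.1, 0.2, 0.2)):
    """Encode a ground-truth box w.r.t. a prior box (both as (cx, cy, w, h))."""
    l_cx = (box[0] - prior[0]) / prior[2] / variance[0]
    l_cy = (box[1] - prior[1]) / prior[3] / variance[1]
    l_w = np.log(box[2] / prior[2]) / variance[2]
    l_h = np.log(box[3] / prior[3]) / variance[3]
    return np.array([l_cx, l_cy, l_w, l_h])

def decode(loc, prior, variance=(0.1, 0.1, 0.2, 0.2)):
    """Recover the real box (cx, cy, w, h) from the predicted transformation."""
    cx = prior[2] * variance[0] * loc[0] + prior[0]
    cy = prior[3] * variance[1] * loc[1] + prior[1]
    w = prior[2] * np.exp(variance[2] * loc[2])
    h = prior[3] * np.exp(variance[3] * loc[3])
    return np.array([cx, cy, w, h])

prior = np.array([0.5, 0.5, 0.2, 0.3])       # a prior box
gt    = np.array([0.55, 0.48, 0.25, 0.28])   # a matched ground-truth box
assert np.allclose(decode(encode(gt, prior), prior), gt)  # decode inverts encode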

To summarize: for a feature map of size $m \times n$ there are $mn$ cells; if each cell is assigned $k$ prior boxes, then each cell needs $(c+4)k$ predicted values and the whole feature map needs $(c+4)kmn$ predicted values. Since SSD performs detection with convolutions, $(c+4)k$ convolution kernels are enough to process this feature map. For example, for conv4_3 ($38 \times 38$, $k=4$, $c=21$ classes including background) this is $(21+4) \times 4 = 100$ output channels.

 

The Conv4_3 layer of VGG16 is the first feature map used for detection. The conv4_3 feature map has size $38 \times 38$, but this layer sits relatively early in the network and its activations have a large norm, so an L2 Normalization layer is added after it (see ParseNet) to keep its scale comparable with the later detection layers. This is not the same as Batch Normalization: it only normalizes each pixel along the channel dimension, whereas Batch Normalization normalizes over the [batch_size, width, height] dimensions. After normalization, a trainable scaling variable gamma is usually added.
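
A minimal NumPy sketch of this per-pixel, channel-wise normalization (illustrative only; the real layer is l2_normalization in custom_layers.py, listed further below, and the shape and initial gamma used here are just examples):

import numpy as np

def l2_normalize_channels(feat, gamma, eps=1e-12):
    """feat: feature map of shape (H, W, C); gamma: trainable per-channel scale of shape (C,)."""
    norm = np.sqrt(np.sum(feat ** 2, axis=-1, keepdims=True)) + eps  # per-pixel L2 norm over channels
    return feat / norm * gamma                                       # normalize, then rescale

feat = np.random.randn(38, 38, 512).astype(np.float32)   # e.g. the conv4_3 output
gamma = np.ones(512, dtype=np.float32)                    # per-channel scale, initialized to 1 here
out = l2_normalize_channels(feat, gamma)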


By default, every feature map has one prior box with aspect ratio 1 and scale $s_k$; in addition, a prior box with scale $s'_k = \sqrt{s_k s_{k+1}}$ and aspect ratio 1 is also set, so each feature map gets two square prior boxes of different sizes. Note that the last feature map needs a virtual $s_{m+1} = 300 \times 105/100 = 315$ to compute $s'_m$. Each feature map therefore has 6 prior boxes per cell, $\{1, 2, 3, \frac{1}{2}, \frac{1}{3}, 1'\}$, except that in the implementation Conv4_3, Conv10_2 and Conv11_2 use only 4 prior boxes: they drop the aspect ratios $3$ and $\frac{1}{3}$. The center of each cell's prior boxes lies at the center of the cell, i.e. $(\frac{i+0.5}{|f_k|}, \frac{j+0.5}{|f_k|})$ with $i, j \in [0, |f_k|)$, where $|f_k|$ is the size of the feature map.
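
The per-layer prior box shapes can be sketched as follows; this mirrors the logic of ssd_anchor_one_layer shown later, using the block4 defaults sizes=(21., 45.) and ratios=[2, .5] from the code (the function name here is illustrative):

import math
import numpy as np

def anchor_sizes_one_layer(sizes, ratios, img_size=300):
    """Relative (h, w) of the prior boxes for one feature layer.
    sizes = (s_k, s_k+1) in pixels; ratios = extra aspect ratios besides 1."""
    hs, ws = [], []
    # aspect ratio 1 box with scale s_k
    hs.append(sizes[0] / img_size)
    ws.append(sizes[0] / img_size)
    # extra aspect ratio 1 box with scale sqrt(s_k * s_{k+1})
    hs.append(math.sqrt(sizes[0] * sizes[1]) / img_size)
    ws.append(math.sqrt(sizes[0] * sizes[1]) / img_size)
    # remaining aspect ratios
    for r in ratios:
        hs.append(sizes[0] / img_size / math.sqrt(r))
        ws.append(sizes[0] / img_size * math.sqrt(r))
    return np.array(hs), np.array(ws)

h, w = anchor_sizes_one_layer((21., 45.), [2, .5])   # block4: 4 prior boxes per cell
print(np.round(h, 3), np.round(w, 3))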

Training

(1) Prior box matching
During training, the first step is to decide which prior box each ground truth (real object) in the training image is matched to; the bounding box attached to the matched prior box is then responsible for predicting it. In YOLO, the cell that a ground truth's center falls into is responsible for it, and within that cell the bounding box with the largest IOU predicts it. SSD works quite differently: it matches prior boxes to ground truths according to two rules. First, for every ground truth in the image, the prior box with the largest IOU is matched to it, which guarantees that every ground truth is matched to some prior box. Prior boxes matched to a ground truth are called positive samples (strictly speaking it is the predicted box attached to the prior box, but since they are in one-to-one correspondence the terms are used interchangeably); a prior box matched to no ground truth is matched to the background and becomes a negative sample. An image contains very few ground truths while there are a great many prior boxes, so if only the first rule were applied, the vast majority of prior boxes would be negatives and the positive/negative samples would be extremely unbalanced; hence the second rule. Second rule: for the remaining unmatched prior boxes, if the IOU with some ground truth exceeds a threshold (usually 0.5), that prior box is also matched to this ground truth. This means one ground truth may be matched to several prior boxes, which is allowed. The reverse is not: a prior box can only be matched to a single ground truth, so if several ground truths have IOU above the threshold with the same prior box, the prior box is matched only to the ground truth with the largest IOU. The second rule must be applied after the first. Consider the corner case where the largest IOU of some ground truth is below the threshold, while its best-matching prior box has IOU above the threshold with another ground truth: which one should the prior box be matched to? The answer is the former, because we must first make sure that every ground truth gets at least one matched prior box. In practice, however, this case essentially never occurs: with so many prior boxes, the largest IOU of a ground truth is almost certainly above the threshold, so applying only the second rule may already be enough. The TensorFlow version discussed here implements only the second rule, whereas the PyTorch implementation applies both. Figure 8 shows a matching example: the green GT boxes are ground truths, the red boxes are prior boxes, FP marks negatives and TP marks positives. A minimal sketch of the matching logic is given after the figure caption.

                              Figure 8. Prior box matching illustration
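
A minimal NumPy sketch of the two matching rules (the match_priors name and the precomputed IOU matrix are illustrative, not part of the repository; the TensorFlow code below effectively applies only the second rule, via the gscores > match_threshold test in ssd_losses):

import numpy as np

def match_priors(iou, threshold=0.5):
    """iou: (num_priors, num_gt) IOU matrix. Returns the gt index per prior, -1 = background."""
    num_priors, num_gt = iou.shape
    matches = np.full(num_priors, -1, dtype=np.int64)
    # Rule 2: each prior takes its best ground truth if the IOU is above the threshold.
    best_gt = iou.argmax(axis=1)
    best_gt_iou = iou.max(axis=1)
    matches[best_gt_iou > threshold] = best_gt[best_gt_iou > threshold]
    # Rule 1: force-match the best prior of every ground truth.
    best_prior = iou.argmax(axis=0)
    matches[best_prior] = np.arange(num_gt)
    return matches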

Although one ground truth can be matched to several prior boxes, ground truths are still far fewer than prior boxes, so negatives greatly outnumber positives. To keep the two roughly balanced, SSD uses hard negative mining: the negatives are sub-sampled by sorting them in descending order of confidence loss (the smaller the predicted background confidence, the larger the loss) and keeping the top-k with the largest loss as training negatives, so that the negative-to-positive ratio stays close to 3:1.
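
A sketch of that selection, assuming background_prob holds the predicted background probability of every prior and is_positive marks the matched ones (this mirrors the top_k trick on -background_prob used in ssd_losses below):

import numpy as np

def hard_negative_mining(background_prob, is_positive, neg_ratio=3):
    """Pick the hardest negatives, roughly neg_ratio times the number of positives."""
    is_negative = ~is_positive
    n_pos = int(is_positive.sum())
    n_neg = min(neg_ratio * n_pos, int(is_negative.sum()))
    # The smaller the predicted background probability, the harder the negative.
    neg_scores = np.where(is_negative, background_prob, np.inf)
    hardest = np.argsort(neg_scores)[:n_neg]
    mask = np.zeros_like(is_positive)
    mask[hardest] = True
    return mask   # True for the negatives kept in the classification loss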

(2) Loss function
With the training samples determined, the next piece is the loss function. It is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

$L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)$

where $N$ is the number of positive prior boxes, and $x_{ij}^{p} \in \{0, 1\}$ is an indicator: $x_{ij}^{p} = 1$ means the $i$-th prior box is matched to the $j$-th ground truth, whose category is $p$. $c$ is the class confidence prediction, $l$ is the predicted location of the bounding box attached to the prior box, and $g$ is the ground-truth location. The localization loss is a Smooth L1 loss, defined as:

$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}(l_{i}^{m} - \hat{g}_{j}^{m})$

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$

Because of the indicator $x_{ij}^{p}$, the localization loss is computed over the positives only. Note that the ground truth $g$ must first be encoded into $\hat{g}$, since the prediction $l$ is itself an encoded value; in the hand-set variance mode described above (variance_encoded_in_target=False), the variance also enters the encoding:

$\hat{g}_{j}^{cx} = (g_{j}^{cx} - d_{i}^{cx}) / d_{i}^{w} / \mathrm{variance}[0], \quad \hat{g}_{j}^{cy} = (g_{j}^{cy} - d_{i}^{cy}) / d_{i}^{h} / \mathrm{variance}[1]$
$\hat{g}_{j}^{w} = \log(g_{j}^{w} / d_{i}^{w}) / \mathrm{variance}[2], \quad \hat{g}_{j}^{h} = \log(g_{j}^{h} / d_{i}^{h}) / \mathrm{variance}[3]$

The confidence loss is a softmax loss:

$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log(\hat{c}_{i}^{p}) - \sum_{i \in Neg} \log(\hat{c}_{i}^{0}), \qquad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{p} \exp(c_{i}^{p})}$

The weight $\alpha$ is set to 1 by cross-validation.
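
Putting the pieces together, a small NumPy sketch of the total loss (shapes and variable names are illustrative; the actual implementation is ssd_losses in ssd_vgg_300.py below, which additionally normalizes by batch size):

import numpy as np

def smooth_l1(x):
    absx = np.abs(x)
    return np.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ssd_loss(logits, loc_pred, gt_classes, gt_loc, pos_mask, neg_mask, alpha=1.0):
    """logits: (P, C); loc_pred/gt_loc: (P, 4) encoded; gt_classes: (P,); masks: (P,) bool."""
    n_pos = max(int(pos_mask.sum()), 1)
    probs = softmax(logits)
    # Confidence loss: positives against their class, selected hard negatives against background (class 0).
    l_conf = -np.log(probs[pos_mask, gt_classes[pos_mask]]).sum() \
             - np.log(probs[neg_mask, 0]).sum()
    # Localization loss: smooth L1 over the positives only.
    l_loc = smooth_l1(loc_pred[pos_mask] - gt_loc[pos_mask]).sum()
    return (l_conf + alpha * l_loc) / n_pos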

Prediction

Prediction is relatively simple. For every predicted box, first determine its category (the class with the highest confidence) and its confidence value, and discard boxes assigned to the background. Then filter out boxes whose confidence is below a threshold (e.g. 0.5). The remaining boxes are decoded: their real positions are recovered from the prior boxes (after decoding, the boxes are usually also clipped so they do not extend beyond the image). After decoding, the boxes are sorted by confidence in descending order and only the top-k (e.g. 400) are kept. Finally, NMS is applied to drop boxes that overlap too much; whatever survives is the detection result.
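
A compact NumPy sketch of this post-processing pipeline (illustrative; it uses a plain greedy, class-agnostic NMS on already decoded and clipped boxes, whereas the repository's detected_bboxes works per class with tf_extended helpers):

import numpy as np

def iou(box, boxes):
    """box: (4,), boxes: (N, 4), both as (ymin, xmin, ymax, xmax)."""
    ymin = np.maximum(box[0], boxes[:, 0]); xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2]); xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(ymax - ymin, 0) * np.maximum(xmax - xmin, 0)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, threshold=0.5, keep_top_k=200):
    order = np.argsort(-scores)
    keep = []
    while order.size and len(keep) < keep_top_k:
        i, order = order[0], order[1:]
        keep.append(i)
        order = order[iou(boxes[i], boxes[order]) < threshold]   # drop boxes overlapping the kept one
    return np.array(keep, dtype=np.int64)

def postprocess(scores, boxes, select_threshold=0.5, top_k=400):
    """scores: (N, C) softmax output with background at index 0; boxes: (N, 4) decoded and clipped."""
    cls = scores[:, 1:].argmax(axis=1) + 1                 # best non-background class per box
    conf = scores[np.arange(len(scores)), cls]
    keep = conf > select_threshold                         # drop low-confidence boxes
    cls, conf, boxes = cls[keep], conf[keep], boxes[keep]
    order = np.argsort(-conf)[:top_k]                      # keep the top-k by confidence
    cls, conf, boxes = cls[order], conf[order], boxes[order]
    keep = nms(boxes, conf)                                # class-agnostic NMS for brevity
    return cls[keep], conf[keep], boxes[keep]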

1. Source code walkthrough

The annotated ssd_vgg_300.py is as follows:

  1. # Copyright 2016 Paul Balanca. All Rights Reserved.
  2. #
  3. # Licensed under the Apache License, Version 2.0 (the "License");
  4. # you may not use this file except in compliance with the License.
  5. # You may obtain a copy of the License at
  6. #
  7. # http://www.apache.org/licenses/LICENSE-2.0
  8. #
  9. # Unless required by applicable law or agreed to in writing, software
  10. # distributed under the License is distributed on an "AS IS" BASIS,
  11. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  12. # See the License for the specific language governing permissions and
  13. # limitations under the License.
  14. # ==============================================================================
  15. """Definition of 300 VGG-based SSD network.
  16. This model was initially introduced in:
  17. SSD: Single Shot MultiBox Detector
  18. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
  19. Cheng-Yang Fu, Alexander C. Berg
  20. https://arxiv.org/abs/1512.02325
  21. Two variants of the model are defined: the 300x300 and 512x512 models, the
  22. latter obtaining a slightly better accuracy on Pascal VOC.
  23. Usage:
  24. with slim.arg_scope(ssd_vgg.ssd_vgg()):
  25. outputs, end_points = ssd_vgg.ssd_vgg(inputs)
  26. This network port of the original Caffe model. The padding in TF and Caffe
  27. is slightly different, and can lead to severe accuracy drop if not taken care
  28. in a correct way!
  29. In Caffe, the output size of convolution and pooling layers are computing as
  30. following: h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1
  31. Nevertheless, there is a subtle difference between both for stride > 1. In
  32. the case of convolution:
  33. top_size = floor((bottom_size + 2*pad - kernel_size) / stride) + 1
  34. whereas for pooling:
  35. top_size = ceil((bottom_size + 2*pad - kernel_size) / stride) + 1
  36. Hence implicitely allowing some additional padding even if pad = 0. This
  37. behaviour explains why pooling with stride and kernel of size 2 are behaving
  38. the same way in TensorFlow and Caffe.
  39. Nevertheless, this is not the case anymore for other kernel sizes, hence
  40. motivating the use of special padding layer for controlling these side-effects.
  41. @@ssd_vgg_300
  42. """
  43. import math
  44. from collections import namedtuple
  45. import numpy as np
  46. import tensorflow as tf
  47. import tf_extended as tfe
  48. from nets import custom_layers
  49. from nets import ssd_common
  50. slim = tf.contrib.slim
  51. # =========================================================================== #
  52. # SSD class definition.
  53. # =========================================================================== #
  54. #collections模块的namedtuple子类不仅可以使用item的index访问item,还可以通过item的name进行访问可以将namedtuple理解为c中的struct结构,其首先将各个item命名,然后对每个item赋予数据
  55. SSDParams = namedtuple( 'SSDParameters', [ 'img_shape', #输入图像大小
  56. 'num_classes', #分类类别数
  57. 'no_annotation_label', #无标注标签
  58. 'feat_layers', #特征层
  59. 'feat_shapes', #特征层形状大小
  60. 'anchor_size_bounds', #锚点框大小上下边界,是与原图相比得到的小数值
  61. 'anchor_sizes', #初始锚点框尺寸
  62. 'anchor_ratios', #锚点框长宽比
  63. 'anchor_steps', #特征图相对原始图像的缩放
  64. 'anchor_offset', #锚点框中心的偏移
  65. 'normalizations', #是否正则化
  66. 'prior_scaling' #是对特征图参考框向gtbox做回归时用到的尺度缩放(0.1,0.1,0.2,0.2)
  67. ])
  68. class SSDNet(object):
  69. """Implementation of the SSD VGG-based 300 network.
  70. The default features layers with 300x300 image input are:
  71. conv4 ==> 38 x 38
  72. conv7 ==> 19 x 19
  73. conv8 ==> 10 x 10
  74. conv9 ==> 5 x 5
  75. conv10 ==> 3 x 3
  76. conv11 ==> 1 x 1
  77. The default image size used to train this network is 300x300. #训练输入图像尺寸默认为300x300
  78. """
  79. default_params = SSDParams( #默认参数
  80. img_shape=( 300, 300),
  81. num_classes= 21, #包含背景在内,共21类目标类别
  82. no_annotation_label= 21,
  83. feat_layers=[ 'block4', 'block7', 'block8', 'block9', 'block10', 'block11'], #特征层名字
  84. feat_shapes=[( 38, 38), ( 19, 19), ( 10, 10), ( 5, 5), ( 3, 3), ( 1, 1)], #特征层尺寸
  85. anchor_size_bounds=[ 0.15, 0.90],
  86. # anchor_size_bounds=[0.20, 0.90], #论文中初始预测框大小为0.2x300~0.9x300;实际代码是[45,270]
  87. anchor_sizes=[( 21., 45.), #直接给出的每个特征图上起初的锚点框大小;如第一个特征层框大小是h:21;w:45; 共6个特征图用于回归
  88. ( 45., 99.), #越小的框能够得到原图上更多的局部信息,反之得到更多的全局信息;
  89. ( 99., 153.),
  90. ( 153., 207.),
  91. ( 207., 261.),
  92. ( 261., 315.)],
  93. # anchor_sizes=[(30., 60.),
  94. # (60., 111.),
  95. # (111., 162.),
  96. # (162., 213.),
  97. # (213., 264.),
  98. # (264., 315.)],
  99. anchor_ratios=[[ 2, .5], #每个特征层上的每个特征点预测的box长宽比及数量;如:block4: def_boxes:4
  100. [ 2, .5, 3, 1./ 3], #block7: def_boxes:6 (ratios中的4个+默认的1:1+额外增加的一个=6)
  101. [ 2, .5, 3, 1./ 3], #block8: def_boxes:6
  102. [ 2, .5, 3, 1./ 3], #block9: def_boxes:6
  103. [ 2, .5], #block10: def_boxes:4
  104. [ 2, .5]], #block11: def_boxes:4 #备注:实际上略去了默认的ratio=1以及多加了一个sqrt(初始框宽*初始框高),后面代码有
  105. anchor_steps=[ 8, 16, 32, 64, 100, 300], #特征图锚点框放大到原始图的缩放比例;
  106. anchor_offset= 0.5, #每个锚点框中心点在该特征图cell中心,因此offset=0.5
  107. normalizations=[ 20, -1, -1, -1, -1, -1], #是否归一化,大于0则进行,否则不做归一化;目前看来只对block_4进行正则化,因为该层比较靠前,其norm较大,需做L2正则化(仅仅对每个像素在channel维度做归一化)以保证和后面检测层差异不是很大;
  108. prior_scaling=[ 0.1, 0.1, 0.2, 0.2] #特征图上每个目标与参考框间的尺寸缩放(y,x,h,w)解码时用到
  109. )
  110. def __init__(self, params=None): #网络参数的初始化
  111. """Init the SSD net with some parameters. Use the default ones
  112. if none provided.
  113. """
  114. if isinstance(params, SSDParams): #是否有参数输入,是则用输入的,否则使用默认的
  115. self.params = params #isinstance是python的內建函数,如果参数1与参数2的类型相同则返回true;
  116. else:
  117. self.params = SSDNet.default_params
  118. # ======================================================================= #
  119. def net(self, inputs, #定义网络模型
  120. is_training=True, #是否训练
  121. update_feat_shapes=True, #是否更新特征层的尺寸
  122. dropout_keep_prob=0.5, #dropout=0.5
  123. prediction_fn=slim.softmax, #采用softmax预测结果
  124. reuse=None,
  125. scope='ssd_300_vgg'): #网络名:ssd_300_vgg (基础网络时VGG,输入训练图像size是300x300)
  126. """SSD network definition.
  127. """
  128. r = ssd_net(inputs, #网络输入参数r
  129. num_classes=self.params.num_classes,
  130. feat_layers=self.params.feat_layers,
  131. anchor_sizes=self.params.anchor_sizes,
  132. anchor_ratios=self.params.anchor_ratios,
  133. normalizations=self.params.normalizations,
  134. is_training=is_training,
  135. dropout_keep_prob=dropout_keep_prob,
  136. prediction_fn=prediction_fn,
  137. reuse=reuse,
  138. scope=scope)
  139. # Update feature shapes (try at least!) #下面这步我的理解就是让读者自行更改特征层的输入,未必论文中介绍的那几个block
  140. if update_feat_shapes: #是否更新特征层图像尺寸?
  141. shapes = ssd_feat_shapes_from_net(r[ 0], self.params.feat_shapes) #输入特征层图像尺寸以及inputs(应该是预测的特征尺寸),输出更新后的特征图尺寸列表
  142. self.params = self.params._replace(feat_shapes=shapes) #将更新的特征图尺寸shapes替换当前的特征图尺寸
  143. return r #更新网络输入参数r
  144. def arg_scope(self, weight_decay=0.0005, data_format='NHWC'): #定义权重衰减=0.0005,L2正则化项系数;数据类型是NHWC
  145. """Network arg_scope.
  146. """
  147. return ssd_arg_scope(weight_decay, data_format=data_format)
  148. def arg_scope_caffe(self, caffe_scope):
  149. """Caffe arg_scope used for weights importing.
  150. """
  151. return ssd_arg_scope_caffe(caffe_scope)
  152. # ======================================================================= #
  153. def update_feature_shapes(self, predictions): #更新特征形状尺寸(来自预测结果)
  154. """Update feature shapes from predictions collection (Tensor or Numpy
  155. array).
  156. """
  157. shapes = ssd_feat_shapes_from_net(predictions, self.params.feat_shapes)
  158. self.params = self.params._replace(feat_shapes=shapes)
  159. def anchors(self, img_shape, dtype=np.float32): #输入原始图像尺寸;返回每个特征层每个参考锚点框的位置及尺寸信息(x,y,h,w)
  160. """Compute the default anchor boxes, given an image shape.
  161. """
  162. return ssd_anchors_all_layers(img_shape, #这是个关键函数;检测所有特征层中的参考锚点框位置和尺寸信息
  163. self.params.feat_shapes,
  164. self.params.anchor_sizes,
  165. self.params.anchor_ratios,
  166. self.params.anchor_steps,
  167. self.params.anchor_offset,
  168. dtype)
  169. def bboxes_encode(self, labels, bboxes, anchors, #编码,用于将标签信息,真实目标信息和锚点框信息编码在一起;得到预测真实框到参考框的转换值
  170. scope=None):
  171. """Encode labels and bounding boxes.
  172. """
  173. return ssd_common.tf_ssd_bboxes_encode(
  174. labels, bboxes, anchors,
  175. self.params.num_classes,
  176. self.params.no_annotation_label, #未标注的标签(应该代表背景)
  177. ignore_threshold= 0.5, #IOU筛选阈值
  178. prior_scaling=self.params.prior_scaling, #特征图目标与参考框间的尺寸缩放(0.1,0.1,0.2,0.2)
  179. scope=scope)
  180. def bboxes_decode(self, feat_localizations, anchors, #解码,用锚点框信息,锚点框与预测真实框间的转换值,得到真是的预测框(ymin,xmin,ymax,xmax)
  181. scope='ssd_bboxes_decode'):
  182. """Encode labels and bounding boxes.
  183. """
  184. return ssd_common.tf_ssd_bboxes_decode(
  185. feat_localizations, anchors,
  186. prior_scaling=self.params.prior_scaling,
  187. scope=scope)
  188. def detected_bboxes(self, predictions, localisations, #通过SSD网络,得到检测到的bbox
  189. select_threshold=None, nms_threshold=0.5,
  190. clipping_bbox=None, top_k=400, keep_top_k=200):
  191. """Get the detected bounding boxes from the SSD network output.
  192. """
  193. # Select top_k bboxes from predictions, and clip #选取top_k=400个框,并对框做修建(超出原图尺寸范围的切掉)
  194. rscores, rbboxes = \ #得到对应某个类别的得分值以及bbox
  195. ssd_common.tf_ssd_bboxes_select(predictions, localisations,
  196. select_threshold=select_threshold,
  197. num_classes=self.params.num_classes)
  198. rscores, rbboxes = \ #按照得分高低,筛选出400个bbox和对应得分
  199. tfe.bboxes_sort(rscores, rbboxes, top_k=top_k)
  200. # Apply NMS algorithm. #应用非极大值抑制,筛选掉与得分最高bbox重叠率大于0.5的,保留200个
  201. rscores, rbboxes = \
  202. tfe.bboxes_nms_batch(rscores, rbboxes,
  203. nms_threshold=nms_threshold,
  204. keep_top_k=keep_top_k)
  205. if clipping_bbox is not None:
  206. rbboxes = tfe.bboxes_clip(clipping_bbox, rbboxes)
  207. return rscores, rbboxes #返回裁剪好的bbox和对应得分
  208. #尽管一个ground truth可以与多个先验框匹配,但是ground truth相对先验框还是太少了,
  209. #所以负样本相对正样本会很多。为了保证正负样本尽量平衡,SSD采用了hard negative mining,
  210. #就是对负样本进行抽样,抽样时按照置信度误差(预测背景的置信度越小,误差越大)进行降序排列,
  211. #选取误差的较大的top-k作为训练的负样本,以保证正负样本比例接近1:3
  212. def losses(self, logits, localisations,
  213. gclasses, glocalisations, gscores,
  214. match_threshold=0.5,
  215. negative_ratio=3.,
  216. alpha=1.,
  217. label_smoothing=0.,
  218. scope='ssd_losses'):
  219. """Define the SSD network losses.
  220. """
  221. return ssd_losses(logits, localisations,
  222. gclasses, glocalisations, gscores,
  223. match_threshold=match_threshold,
  224. negative_ratio=negative_ratio,
  225. alpha=alpha,
  226. label_smoothing=label_smoothing,
  227. scope=scope)
  228. # =========================================================================== #
  229. # SSD tools...
  230. # =========================================================================== #
  231. def ssd_size_bounds_to_values(size_bounds,
  232. n_feat_layers,
  233. img_shape=(300, 300)):
  234. """Compute the reference sizes of the anchor boxes from relative bounds.
  235. The absolute values are measured in pixels, based on the network
  236. default size (300 pixels).
  237. This function follows the computation performed in the original
  238. implementation of SSD in Caffe.
  239. Return:
  240. list of list containing the absolute sizes at each scale. For each scale,
  241. the ratios only apply to the first value.
  242. """
  243. assert img_shape[ 0] == img_shape[ 1]
  244. img_size = img_shape[ 0]
  245. min_ratio = int(size_bounds[ 0] * 100)
  246. max_ratio = int(size_bounds[ 1] * 100)
  247. step = int(math.floor((max_ratio - min_ratio) / (n_feat_layers - 2)))
  248. # Start with the following smallest sizes.
  249. sizes = [[img_size * size_bounds[ 0] / 2, img_size * size_bounds[ 0]]]
  250. for ratio in range(min_ratio, max_ratio + 1, step):
  251. sizes.append((img_size * ratio / 100.,
  252. img_size * (ratio + step) / 100.))
  253. return sizes
  254. def ssd_feat_shapes_from_net(predictions, default_shapes=None):
  255. """Try to obtain the feature shapes from the prediction layers. The latter
  256. can be either a Tensor or Numpy ndarray.
  257. Return:
  258. list of feature shapes. Default values if predictions shape not fully
  259. determined.
  260. """
  261. feat_shapes = []
  262. for l in predictions: #l:是预测的特征形状
  263. # Get the shape, from either a np array or a tensor.
  264. if isinstance(l, np.ndarray): #如果l是np.ndarray类型,则将l的形状赋给shape;否则将shape作为list
  265. shape = l.shape
  266. else:
  267. shape = l.get_shape().as_list()
  268. shape = shape[ 1: 4]
  269. # Problem: undetermined shape... #如果预测的特征尺寸未定,则使用默认的形状;否则将shape中的值赋给特征形状列表中
  270. if None in shape:
  271. return default_shapes
  272. else:
  273. feat_shapes.append(shape)
  274. return feat_shapes #返回更新后的特征尺寸list
  275. def ssd_anchor_one_layer(img_shape, #检测单个特征图中所有锚点的坐标和尺寸信息(未与原图做除法)
  276. feat_shape,
  277. sizes,
  278. ratios,
  279. step,
  280. offset=0.5,
  281. dtype=np.float32):
  282. """Computer SSD default anchor boxes for one feature layer.
  283. Determine the relative position grid of the centers, and the relative
  284. width and height.
  285. Arguments:
  286. feat_shape: Feature shape, used for computing relative position grids;
  287. size: Absolute reference sizes;
  288. ratios: Ratios to use on these features;
  289. img_shape: Image shape, used for computing height, width relatively to the
  290. former;
  291. offset: Grid offset.
  292. Return:
  293. y, x, h, w: Relative x and y grids, and height and width.
  294. """
  295. # Compute the position grid: simple way.
  296. # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
  297. # y = (y.astype(dtype) + offset) / feat_shape[0]
  298. # x = (x.astype(dtype) + offset) / feat_shape[1]
  299. # Weird SSD-Caffe computation using steps values... #归一化到原图的锚点中心坐标(x,y);其坐标值域为(0,1)
  300. y, x = np.mgrid[ 0:feat_shape[ 0], 0:feat_shape[ 1]] #对于第一个特征图(block4:38x38);y=[[0,0,……0],[1,1,……1],……[37,37,……,37]];而x=[[0,1,2……,37],[0,1,2……,37],……[0,1,2……,37]]
  301. y = (y.astype(dtype) + offset) * step / img_shape[ 0] #将38个cell对应锚点框的y坐标偏移至每个cell中心,然后乘以相对原图缩放的比例,再除以原图
  302. x = (x.astype(dtype) + offset) * step / img_shape[ 1] #可以得到在原图上,相对原图比例大小的每个锚点中心坐标x,y
  303. # Expand dims to support easy broadcasting. #将锚点中心坐标扩大维度
  304. y = np.expand_dims(y, axis= -1) #对于第一个特征图,y的shape=38x38x1;x的shape=38x38x1
  305. x = np.expand_dims(x, axis= -1)
  306. # Compute relative height and width.
  307. # Tries to follow the original implementation of SSD for the order.
  308. num_anchors = len(sizes) + len(ratios) #该特征图上每个点对应的锚点框数量;如:对于第一个特征图每个点预测4个锚点框(block4:38x38),2+2=4
  309. h = np.zeros((num_anchors, ), dtype=dtype) #对于第一个特征图,h的shape=4x;w的shape=4x
  310. w = np.zeros((num_anchors, ), dtype=dtype)
  311. # Add first anchor boxes with ratio=1.
  312. h[ 0] = sizes[ 0] / img_shape[ 0] #第一个锚点框的高h[0]=起始锚点的高/原图大小的高;例如:h[0]=21/300
  313. w[ 0] = sizes[ 0] / img_shape[ 1] #第一个锚点框的宽w[0]=起始锚点的宽/原图大小的宽;例如:w[0]=21/300
  314. di = 1 #锚点宽个数偏移
  315. if len(sizes) > 1:
  316. h[ 1] = math.sqrt(sizes[ 0] * sizes[ 1]) / img_shape[ 0] #第二个锚点框的高h[1]=sqrt(起始锚点的高*起始锚点的宽)/原图大小的高;例如:h[1]=sqrt(21*45)/300
  317. w[ 1] = math.sqrt(sizes[ 0] * sizes[ 1]) / img_shape[ 1] #第二个锚点框的高w[1]=sqrt(起始锚点的高*起始锚点的宽)/原图大小的宽;例如:w[1]=sqrt(21*45)/300
  318. di += 1 #di=2
  319. for i, r in enumerate(ratios): #遍历长宽比例,第一个特征图,r只有两个,2和0.5;共四个锚点宽size(h[0]~h[3])
  320. h[i+di] = sizes[ 0] / img_shape[ 0] / math.sqrt(r) #例如:对于第一个特征图,h[0+2]=h[2]=21/300/sqrt(2);w[0+2]=w[2]=45/300*sqrt(2)
  321. w[i+di] = sizes[ 0] / img_shape[ 1] * math.sqrt(r) #例如:对于第一个特征图,h[1+2]=h[3]=21/300/sqrt(0.5);w[1+2]=w[3]=45/300*sqrt(0.5)
  322. return y, x, h, w #返回没有归一化前的锚点坐标和尺寸
  323. def ssd_anchors_all_layers(img_shape, #检测所有特征图中锚点框的四个坐标信息; 输入原始图大小
  324. layers_shape, #每个特征层形状尺寸
  325. anchor_sizes, #起始特征图中框的长宽size
  326. anchor_ratios, #锚点框长宽比列表
  327. anchor_steps, #锚点框相对原图缩放比例
  328. offset=0.5, #锚点中心在每个特征图cell中的偏移
  329. dtype=np.float32):
  330. """Compute anchor boxes for all feature layers.
  331. """
  332. layers_anchors = [] #用于存放所有特征图中锚点框位置尺寸信息
  333. for i, s in enumerate(layers_shape): #6个特征图尺寸;如:第0个是38x38
  334. anchor_bboxes = ssd_anchor_one_layer(img_shape, s, #分别计算每个特征图中锚点框的位置尺寸信息;
  335. anchor_sizes[i], #输入:第i个特征图中起始锚点框大小;如第0个是(21., 45.)
  336. anchor_ratios[i], #输入:第i个特征图中锚点框长宽比列表;如第0个是[2, .5]
  337. anchor_steps[i], #输入:第i个特征图中锚点框相对原始图的缩放比;如第0个是8
  338. offset=offset, dtype=dtype) #输入:锚点中心在每个特征图cell中的偏移
  339. layers_anchors.append(anchor_bboxes) #将6个特征图中每个特征图上的点对应的锚点框(6个或4个)保存
  340. return layers_anchors
  341. # =========================================================================== #
  342. # Functional definition of VGG-based SSD 300.
  343. # =========================================================================== #
  344. def tensor_shape(x, rank=3):
  345. """Returns the dimensions of a tensor.
  346. Args:
  347. image: A N-D Tensor of shape.
  348. Returns:
  349. A list of dimensions. Dimensions that are statically known are python
  350. integers,otherwise they are integer scalar tensors.
  351. """
  352. if x.get_shape().is_fully_defined():
  353. return x.get_shape().as_list()
  354. else:
  355. static_shape = x.get_shape().with_rank(rank).as_list()
  356. dynamic_shape = tf.unstack(tf.shape(x), rank)
  357. return [s if s is not None else d
  358. for s, d in zip(static_shape, dynamic_shape)]
  359. def ssd_multibox_layer(inputs, #输入特征层
  360. num_classes, #类别数
  361. sizes, #参考先验框的尺度
  362. ratios=[1], #默认的先验框长宽比为1
  363. normalization=-1, #默认不做正则化
  364. bn_normalization=False):
  365. """Construct a multibox layer, return a class and localization predictions.
  366. """
  367. net = inputs
  368. if normalization > 0: #如果输入整数,则进行L2正则化
  369. net = custom_layers.l2_normalization(net, scaling= True) #对通道所在维度进行正则化,随后乘以gamma缩放系数
  370. # Number of anchors.
  371. num_anchors = len(sizes) + len(ratios) #每层特征图参考先验框的个数[4,6,6,6,4,4]
  372. # Location. #每个先验框对应4个坐标信息
  373. num_loc_pred = num_anchors * 4 #特征图上每个单元预测的坐标所需维度=锚点框数*4
  374. loc_pred = slim.conv2d(net, num_loc_pred, [ 3, 3], activation_fn= None, #通过对特征图进行3x3卷积得到位置信息和类别权重信息
  375. scope= 'conv_loc') #该部分是定位信息,输出维度为[特征图h,特征图w,每个单元所有锚点框坐标]
  376. loc_pred = custom_layers.channel_to_last(loc_pred)
  377. loc_pred = tf.reshape(loc_pred, #最后整个特征图所有锚点框预测目标位置 tensor为[h*w*每个cell先验框数,4]
  378. tensor_shape(loc_pred, 4)[: -1]+[num_anchors, 4])
  379. # Class prediction. #类别预测
  380. num_cls_pred = num_anchors * num_classes #特征图上每个单元预测的类别所需维度=锚点框数*种类数
  381. cls_pred = slim.conv2d(net, num_cls_pred, [ 3, 3], activation_fn= None, #该部分是类别信息,输出维度为[特征图h,特征图w,每个单元所有锚点框对应类别信息]
  382. scope= 'conv_cls')
  383. cls_pred = custom_layers.channel_to_last(cls_pred)
  384. cls_pred = tf.reshape(cls_pred,
  385. tensor_shape(cls_pred, 4)[: -1]+[num_anchors, num_classes]) #最后整个特征图所有锚点框预测类别 tensor为[h*w*每个cell先验框数,种类数]
  386. return cls_pred, loc_pred #返回预测得到的类别和box位置 tensor
  387. def ssd_net(inputs, #定义ssd网络结构
  388. num_classes=SSDNet.default_params.num_classes, #分类数
  389. feat_layers=SSDNet.default_params.feat_layers, #特征层
  390. anchor_sizes=SSDNet.default_params.anchor_sizes,
  391. anchor_ratios=SSDNet.default_params.anchor_ratios,
  392. normalizations=SSDNet.default_params.normalizations, #正则化
  393. is_training=True,
  394. dropout_keep_prob=0.5,
  395. prediction_fn=slim.softmax,
  396. reuse=None,
  397. scope='ssd_300_vgg'):
  398. """SSD net definition.
  399. """
  400. # if data_format == 'NCHW':
  401. # inputs = tf.transpose(inputs, perm=(0, 3, 1, 2))
  402. # End_points collect relevant activations for external use.
  403. end_points = {} #用于收集每一层输出结果
  404. with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
  405. # Original VGG-16 blocks.
  406. net = slim.repeat(inputs, 2, slim.conv2d, 64, [ 3, 3], scope= 'conv1') #VGG16网络的第一个conv,重复2次卷积,核为3x3,64个特征
  407. end_points[ 'block1'] = net #conv1_2结果存入end_points,name='block1'
  408. net = slim.max_pool2d(net, [ 2, 2], scope= 'pool1')
  409. # Block 2.
  410. net = slim.repeat(net, 2, slim.conv2d, 128, [ 3, 3], scope= 'conv2') #重复2次卷积,核为3x3,128个特征
  411. end_points[ 'block2'] = net #conv2_2结果存入end_points,name='block2'
  412. net = slim.max_pool2d(net, [ 2, 2], scope= 'pool2')
  413. # Block 3.
  414. net = slim.repeat(net, 3, slim.conv2d, 256, [ 3, 3], scope= 'conv3') #重复3次卷积,核为3x3,256个特征
  415. end_points[ 'block3'] = net #conv3_3结果存入end_points,name='block3'
  416. net = slim.max_pool2d(net, [ 2, 2], scope= 'pool3')
  417. # Block 4.
  418. net = slim.repeat(net, 3, slim.conv2d, 512, [ 3, 3], scope= 'conv4') #重复3次卷积,核为3x3,512个特征
  419. end_points[ 'block4'] = net #conv4_3结果存入end_points,name='block4'
  420. net = slim.max_pool2d(net, [ 2, 2], scope= 'pool4')
  421. # Block 5.
  422. net = slim.repeat(net, 3, slim.conv2d, 512, [ 3, 3], scope= 'conv5') #重复3次卷积,核为3x3,512个特征
  423. end_points[ 'block5'] = net #conv5_3结果存入end_points,name='block5'
  424. net = slim.max_pool2d(net, [ 3, 3], stride= 1, scope= 'pool5')
  425. # Additional SSD blocks. #去掉了VGG的全连接层
  426. # Block 6: let's dilate the hell out of it!
  427. net = slim.conv2d(net, 1024, [ 3, 3], rate= 6, scope= 'conv6') #将VGG基础网络最后的池化层结果做扩展卷积(带孔卷积);
  428. end_points[ 'block6'] = net #conv6结果存入end_points,name='block6'
  429. net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training) #dropout层
  430. # Block 7: 1x1 conv. Because the fuck.
  431. net = slim.conv2d(net, 1024, [ 1, 1], scope= 'conv7') #将dropout后的网络做1x1卷积,输出1024特征,name='block7'
  432. end_points[ 'block7'] = net
  433. net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training) #将卷积后的网络继续做dropout
  434. # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
  435. end_point = 'block8'
  436. with tf.variable_scope(end_point):
  437. net = slim.conv2d(net, 256, [ 1, 1], scope= 'conv1x1') #对上述dropout的网络做1x1卷积,然后做3x3卷积,,输出512特征图,name=‘block8’
  438. net = custom_layers.pad2d(net, pad=( 1, 1))
  439. net = slim.conv2d(net, 512, [ 3, 3], stride= 2, scope= 'conv3x3', padding= 'VALID')
  440. end_points[end_point] = net
  441. end_point = 'block9'
  442. with tf.variable_scope(end_point):
  443. net = slim.conv2d(net, 128, [ 1, 1], scope= 'conv1x1') #对上述网络做1x1卷积,然后做3x3卷积,输出256特征图,name=‘block9’
  444. net = custom_layers.pad2d(net, pad=( 1, 1))
  445. net = slim.conv2d(net, 256, [ 3, 3], stride= 2, scope= 'conv3x3', padding= 'VALID')
  446. end_points[end_point] = net
  447. end_point = 'block10'
  448. with tf.variable_scope(end_point):
  449. net = slim.conv2d(net, 128, [ 1, 1], scope= 'conv1x1') #对上述网络做1x1卷积,然后做3x3卷积,输出256特征图,name=‘block10’
  450. net = slim.conv2d(net, 256, [ 3, 3], scope= 'conv3x3', padding= 'VALID')
  451. end_points[end_point] = net
  452. end_point = 'block11'
  453. with tf.variable_scope(end_point):
  454. net = slim.conv2d(net, 128, [ 1, 1], scope= 'conv1x1') #对上述网络做1x1卷积,然后做3x3卷积,输出256特征图,name=‘block11’
  455. net = slim.conv2d(net, 256, [ 3, 3], scope= 'conv3x3', padding= 'VALID')
  456. end_points[end_point] = net
  457. # Prediction and localisations layers. #预测和定位
  458. predictions = []
  459. logits = []
  460. localisations = []
  461. for i, layer in enumerate(feat_layers): #遍历特征层
  462. with tf.variable_scope(layer + '_box'): #起个命名范围
  463. p, l = ssd_multibox_layer(end_points[layer], #做多尺度大小box预测的特征层,返回每个cell中每个先验框预测的类别p和预测的位置l
  464. num_classes, #种类数
  465. anchor_sizes[i], #先验框尺度(同一特征图上的先验框尺度和长宽比一致)
  466. anchor_ratios[i], #先验框长宽比
  467. normalizations[i]) #每个特征正则化信息,目前是只对第一个特征图做归一化操作;
  468. #把每一层的预测收集
  469. predictions.append(prediction_fn(p)) #prediction_fn为softmax,预测类别
  470. logits.append(p) #把每个cell每个先验框预测的类别的概率值存在logits中
  471. localisations.append(l) #预测位置信息
  472. return predictions, localisations, logits, end_points #返回类别预测结果,位置预测结果,所属某个类别的概率值,以及特征层
  473. ssd_net.default_image_size = 300
  474. def ssd_arg_scope(weight_decay=0.0005, data_format='NHWC'): #权重衰减系数=0.0005;其是L2正则化项的系数
  475. """Defines the VGG arg scope.
  476. Args:
  477. weight_decay: The l2 regularization coefficient.
  478. Returns:
  479. An arg_scope.
  480. """
  481. with slim.arg_scope([slim.conv2d, slim.fully_connected],
  482. activation_fn=tf.nn.relu,
  483. weights_regularizer=slim.l2_regularizer(weight_decay),
  484. weights_initializer=tf.contrib.layers.xavier_initializer(),
  485. biases_initializer=tf.zeros_initializer()):
  486. with slim.arg_scope([slim.conv2d, slim.max_pool2d],
  487. padding= 'SAME',
  488. data_format=data_format):
  489. with slim.arg_scope([custom_layers.pad2d,
  490. custom_layers.l2_normalization,
  491. custom_layers.channel_to_last],
  492. data_format=data_format) as sc:
  493. return sc
  494. # =========================================================================== #
  495. # Caffe scope: importing weights at initialization.
  496. # =========================================================================== #
  497. def ssd_arg_scope_caffe(caffe_scope):
  498. """Caffe scope definition.
  499. Args:
  500. caffe_scope: Caffe scope object with loaded weights.
  501. Returns:
  502. An arg_scope.
  503. """
  504. # Default network arg scope.
  505. with slim.arg_scope([slim.conv2d],
  506. activation_fn=tf.nn.relu,
  507. weights_initializer=caffe_scope.conv_weights_init(),
  508. biases_initializer=caffe_scope.conv_biases_init()):
  509. with slim.arg_scope([slim.fully_connected],
  510. activation_fn=tf.nn.relu):
  511. with slim.arg_scope([custom_layers.l2_normalization],
  512. scale_initializer=caffe_scope.l2_norm_scale_init()):
  513. with slim.arg_scope([slim.conv2d, slim.max_pool2d],
  514. padding= 'SAME') as sc:
  515. return sc
  516. # =========================================================================== #
  517. # SSD loss function.
  518. # =========================================================================== #
  519. def ssd_losses(logits, localisations, #损失函数定义为位置误差和置信度误差的加权和;
  520. gclasses, glocalisations, gscores,
  521. match_threshold=0.5,
  522. negative_ratio=3.,
  523. alpha=1., #位置误差权重系数
  524. label_smoothing=0.,
  525. device='/cpu:0',
  526. scope=None):
  527. with tf.name_scope(scope, 'ssd_losses'):
  528. lshape = tfe.get_shape(logits[ 0], 5)
  529. num_classes = lshape[ -1]
  530. batch_size = lshape[ 0]
  531. # Flatten out all vectors!
  532. flogits = []
  533. fgclasses = []
  534. fgscores = []
  535. flocalisations = []
  536. fglocalisations = []
  537. for i in range(len(logits)):
  538. flogits.append(tf.reshape(logits[i], [ -1, num_classes])) #将类别的概率值reshape成(-1,21)
  539. fgclasses.append(tf.reshape(gclasses[i], [ -1])) #真实类别
  540. fgscores.append(tf.reshape(gscores[i], [ -1])) #预测真实目标的得分
  541. flocalisations.append(tf.reshape(localisations[i], [ -1, 4])) #预测真实目标边框坐标(编码形式的值)
  542. fglocalisations.append(tf.reshape(glocalisations[i], [ -1, 4])) #用于将真实目标gt的坐标进行编码存储
  543. # And concat the crap!
  544. logits = tf.concat(flogits, axis= 0)
  545. gclasses = tf.concat(fgclasses, axis= 0)
  546. gscores = tf.concat(fgscores, axis= 0)
  547. localisations = tf.concat(flocalisations, axis= 0)
  548. glocalisations = tf.concat(fglocalisations, axis= 0)
  549. dtype = logits.dtype
  550. # Compute positive matching mask...
  551. pmask = gscores > match_threshold #预测框与真实框IOU>0.5则将这个先验作为正样本
  552. fpmask = tf.cast(pmask, dtype)
  553. n_positives = tf.reduce_sum(fpmask) #求正样本数量N
  554. # Hard negative mining... 为了保证正负样本尽量平衡,SSD采用了hard negative mining,就是对负样本进行抽样,抽样时按照置信度误差(预测背景的置信度越小,误差越大)进行降序排列,选取误差的较大的top-k作为训练的负样本,以保证正负样本比例接近1:3
  555. no_classes = tf.cast(pmask, tf.int32)
  556. predictions = slim.softmax(logits) #类别预测
  557. nmask = tf.logical_and(tf.logical_not(pmask),
  558. gscores > -0.5)
  559. fnmask = tf.cast(nmask, dtype)
  560. nvalues = tf.where(nmask,
  561. predictions[:, 0],
  562. 1. - fnmask)
  563. nvalues_flat = tf.reshape(nvalues, [ -1])
  564. # Number of negative entries to select.
  565. max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
  566. n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size #负样本数量,保证是正样本3倍
  567. n_neg = tf.minimum(n_neg, max_neg_entries)
  568. val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg) #抽样时按照置信度误差(预测背景的置信度越小,误差越大)进行降序排列,选取误差的较大的top-k作为训练的负样本
  569. max_hard_pred = -val[ -1]
  570. # Final negative mask.
  571. nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
  572. fnmask = tf.cast(nmask, dtype)
  573. # Add cross-entropy loss. #交叉熵
  574. with tf.name_scope( 'cross_entropy_pos'):
  575. loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, #类别置信度误差
  576. labels=gclasses)
  577. loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name= 'value') #将置信度误差除以正样本数后除以batch-size
  578. tf.losses.add_loss(loss)
  579. with tf.name_scope( 'cross_entropy_neg'):
  580. loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
  581. labels=no_classes)
  582. loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name= 'value')
  583. tf.losses.add_loss(loss)
  584. # Add localization loss: smooth L1, L2, ...
  585. with tf.name_scope( 'localization'):
  586. # Weights Tensor: positive mask + random negative.
  587. weights = tf.expand_dims(alpha * fpmask, axis= -1)
  588. loss = custom_layers.abs_smooth(localisations - glocalisations) #先验框对应边界的位置预测值-真实位置;然后做Smooth L1 loss
  589. loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name= 'value') #将上面的loss*权重(=alpha/正样本数)求和后除以batch-size
  590. tf.losses.add_loss(loss) #获得置信度误差和位置误差的加权和
  591. def ssd_losses_old(logits, localisations,
  592. gclasses, glocalisations, gscores,
  593. match_threshold=0.5,
  594. negative_ratio=3.,
  595. alpha=1.,
  596. label_smoothing=0.,
  597. device='/cpu:0',
  598. scope=None):
  599. """Loss functions for training the SSD 300 VGG network.
  600. This function defines the different loss components of the SSD, and
  601. adds them to the TF loss collection.
  602. Arguments:
  603. logits: (list of) predictions logits Tensors;
  604. localisations: (list of) localisations Tensors;
  605. gclasses: (list of) groundtruth labels Tensors;
  606. glocalisations: (list of) groundtruth localisations Tensors;
  607. gscores: (list of) groundtruth score Tensors;
  608. """
  609. with tf.device(device):
  610. with tf.name_scope(scope, 'ssd_losses'):
  611. l_cross_pos = []
  612. l_cross_neg = []
  613. l_loc = []
  614. for i in range(len(logits)):
  615. dtype = logits[i].dtype
  616. with tf.name_scope( 'block_%i' % i):
  617. # Sizing weight...
  618. wsize = tfe.get_shape(logits[i], rank= 5)
  619. wsize = wsize[ 1] * wsize[ 2] * wsize[ 3]
  620. # Positive mask.
  621. pmask = gscores[i] > match_threshold
  622. fpmask = tf.cast(pmask, dtype)
  623. n_positives = tf.reduce_sum(fpmask)
  624. # Select some random negative entries.
  625. # n_entries = np.prod(gclasses[i].get_shape().as_list())
  626. # r_positive = n_positives / n_entries
  627. # r_negative = negative_ratio * n_positives / (n_entries - n_positives)
  628. # Negative mask.
  629. no_classes = tf.cast(pmask, tf.int32)
  630. predictions = slim.softmax(logits[i])
  631. nmask = tf.logical_and(tf.logical_not(pmask),
  632. gscores[i] > -0.5)
  633. fnmask = tf.cast(nmask, dtype)
  634. nvalues = tf.where(nmask,
  635. predictions[:, :, :, :, 0],
  636. 1. - fnmask)
  637. nvalues_flat = tf.reshape(nvalues, [ -1])
  638. # Number of negative entries to select.
  639. n_neg = tf.cast(negative_ratio * n_positives, tf.int32)
  640. n_neg = tf.maximum(n_neg, tf.size(nvalues_flat) // 8)
  641. n_neg = tf.maximum(n_neg, tf.shape(nvalues)[ 0] * 4)
  642. max_neg_entries = 1 + tf.cast(tf.reduce_sum(fnmask), tf.int32)
  643. n_neg = tf.minimum(n_neg, max_neg_entries)
  644. val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
  645. max_hard_pred = -val[ -1]
  646. # Final negative mask.
  647. nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
  648. fnmask = tf.cast(nmask, dtype)
  649. # Add cross-entropy loss.
  650. with tf.name_scope( 'cross_entropy_pos'):
  651. fpmask = wsize * fpmask
  652. loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
  653. labels=gclasses[i])
  654. loss = tf.losses.compute_weighted_loss(loss, fpmask)
  655. l_cross_pos.append(loss)
  656. with tf.name_scope( 'cross_entropy_neg'):
  657. fnmask = wsize * fnmask
  658. loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
  659. labels=no_classes)
  660. loss = tf.losses.compute_weighted_loss(loss, fnmask)
  661. l_cross_neg.append(loss)
  662. # Add localization loss: smooth L1, L2, ...
  663. with tf.name_scope( 'localization'):
  664. # Weights Tensor: positive mask + random negative.
  665. weights = tf.expand_dims(alpha * fpmask, axis= -1)
  666. loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
  667. loss = tf.losses.compute_weighted_loss(loss, weights)
  668. l_loc.append(loss)
  669. # Additional total losses...
  670. with tf.name_scope( 'total'):
  671. total_cross_pos = tf.add_n(l_cross_pos, 'cross_entropy_pos')
  672. total_cross_neg = tf.add_n(l_cross_neg, 'cross_entropy_neg')
  673. total_cross = tf.add(total_cross_pos, total_cross_neg, 'cross_entropy')
  674. total_loc = tf.add_n(l_loc, 'localization')
  675. # Add to EXTRA LOSSES TF.collection
  676. tf.add_to_collection( 'EXTRA_LOSSES', total_cross_pos)
  677. tf.add_to_collection( 'EXTRA_LOSSES', total_cross_neg)
  678. tf.add_to_collection( 'EXTRA_LOSSES', total_cross)
  679. tf.add_to_collection( 'EXTRA_LOSSES', total_loc)
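
To tie the class methods above together, here is a hedged usage sketch of SSDNet (TF 1.x graph construction only; the placeholder and shapes are illustrative, and the training-time calls are indicated in comments):

import tensorflow as tf
slim = tf.contrib.slim

from nets import ssd_vgg_300

ssd_net = ssd_vgg_300.SSDNet()                      # default 300x300 / 21-class parameters
image = tf.placeholder(tf.float32, [None, 300, 300, 3])

with slim.arg_scope(ssd_net.arg_scope(weight_decay=0.0005)):
    predictions, localisations, logits, end_points = ssd_net.net(image, is_training=False)

# Reference anchor boxes for every feature layer, computed once with NumPy.
ssd_anchors = ssd_net.anchors((300, 300))

# At training time, the ground truth would be encoded against these anchors, e.g.:
# gclasses, glocalisations, gscores = ssd_net.bboxes_encode(labels, bboxes, ssd_anchors)
# ssd_net.losses(logits, localisations, gclasses, glocalisations, gscores)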

 

The annotated custom_layers.py is as follows:

# Copyright 2015 Paul Balanca. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Implement some custom layers, not provided by TensorFlow.
Trying to follow as much as possible the style/standards used in
tf.contrib.layers
"""
import tensorflow as tf

from tensorflow.contrib.framework.python.ops import add_arg_scope
from tensorflow.contrib.layers.python.layers import initializers
from tensorflow.contrib.framework.python.ops import variables
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.ops import nn
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import variable_scope


def abs_smooth(x):
    """Smoothed absolute function. Useful to compute an L1 smooth error.
    When the prediction is far from the target, the gradient of a plain loss
    easily explodes; the smooth L1 loss is therefore more robust to outliers.
    Define as:
        x^2 / 2        if abs(x) < 1
        abs(x) - 0.5   if abs(x) > 1
    We use here a differentiable definition using min(x) and abs(x). Clearly
    not optimal, but good enough for our purpose!
    """
    absx = tf.abs(x)
    minx = tf.minimum(absx, 1)
    r = 0.5 * ((absx - 1) * minx + absx)    # evaluates to the smooth L1 value above
    return r


@add_arg_scope
def l2_normalization(            # L2 normalization along the channel dimension
        inputs,                  # input feature map, [batch_size, h, w, c]
        scaling=False,           # whether to add a trainable scaling variable gamma after normalization
        scale_initializer=init_ops.ones_initializer(),   # gamma initialized to 1
        reuse=None,
        variables_collections=None,
        outputs_collections=None,
        data_format='NHWC',
        trainable=True,
        scope=None):
    """Implement L2 normalization on every feature (i.e. spatial normalization).
    Should be extended in some near future to other dimensions, providing a more
    flexible normalization framework.
    Args:
      inputs: a 4-D tensor with dimensions [batch_size, height, width, channels].
      scaling: whether or not to add a post scaling operation along the dimensions
        which have been normalized.
      scale_initializer: An initializer for the weights.
      reuse: whether or not the layer and its variables should be reused. To be
        able to reuse the layer scope must be given.
      variables_collections: optional list of collections for all the variables or
        a dictionary containing a different list of collection per variable.
      outputs_collections: collection to add the outputs.
      data_format: NHWC or NCHW data format.
      trainable: If `True` also add variables to the graph collection
        `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
      scope: Optional scope for `variable_scope`.
    Returns:
      A `Tensor` representing the output of the operation.
    """
    with variable_scope.variable_scope(
            scope, 'L2Normalization', [inputs], reuse=reuse) as sc:
        inputs_shape = inputs.get_shape()   # static shape of the input feature map
        inputs_rank = inputs_shape.ndims    # rank (4 for an image batch)
        dtype = inputs.dtype.base_dtype     # data type
        if data_format == 'NHWC':
            # norm_dim = tf.range(1, inputs_rank-1)
            norm_dim = tf.range(inputs_rank - 1, inputs_rank)   # normalize over the last (channel) dimension
            params_shape = inputs_shape[-1:]                    # number of channels
        elif data_format == 'NCHW':
            # norm_dim = tf.range(2, inputs_rank)
            norm_dim = tf.range(1, 2)                           # channel dimension is dim 1
            params_shape = (inputs_shape[1])                    # number of channels
        # Normalize along spatial dimensions.
        outputs = nn.l2_normalize(inputs, norm_dim, epsilon=1e-12)   # channel-wise L2 norm; epsilon avoids division by zero
        # Additional scaling.
        if scaling:                          # optionally rescale the normalized output with a trainable gamma
            scale_collections = utils.get_variable_collections(
                variables_collections, 'scale')
            scale = variables.model_variable('gamma',
                                             shape=params_shape,
                                             dtype=dtype,
                                             initializer=scale_initializer,
                                             collections=scale_collections,
                                             trainable=trainable)
            if data_format == 'NHWC':
                outputs = tf.multiply(outputs, scale)
            elif data_format == 'NCHW':
                scale = tf.expand_dims(scale, axis=-1)
                scale = tf.expand_dims(scale, axis=-1)
                outputs = tf.multiply(outputs, scale)
                # outputs = tf.transpose(outputs, perm=(0, 2, 3, 1))
        # Return L2_norm * gamma, registered in the outputs collection.
        return utils.collect_named_outputs(outputs_collections,
                                           sc.original_name_scope, outputs)
