duanyajun987

【深度学习SSD】——深刻解读SSD tensorflow及源码详解

本文主要针对SSD的tensorflow框架下的实现的源码解读即对网络模型的理解。

【前言】

首先在github上下载tensorflow版的SSD repository：https://github.com/balancap/SSD-Tensorflow

同时附上论文地址：SSD 论文下载

解压SSD-Tensorflow-master.zip 到自己工作目录下。

SSD直接采用卷积对不同的特征图来进行提取检测结果。对于形状为的特征图，只需要采用这样比较小的卷积核得到检测值；

SSD的检测值也与Yolo不太一样。对于每个单元的每个先验框，其都输出一套独立的检测值，对应一个边界框，主要分为两个部分。第一部分是各个类别的置信度或者评分，值得注意的是SSD将背景也当做了一个特殊的类别，如果检测目标共有个类别，SSD其实需要预测个置信度值，其中第一个置信度指的是不含目标或者属于背景的评分。后面当我们说个类别置信度时，请记住里面包含背景那个特殊的类别，即真实的检测类别只有个。在预测过程中，置信度最高的那个类别就是边界框所属的类别，特别地，当第一个置信度值最高时，表示边界框中并不包含目标。第二部分就是边界框的location，包含4个值，分别表示边界框的中心坐标以及宽高。但是真实预测值其实只是边界框相对于先验框的转换值(paper里面说是offset，但是觉得transformation更合适，参见R-CNN)。先验框位置用表示，其对应边界框用 $表示，那么边界框的预测值其实是相对于的转换值：

习惯上，我们称上面这个过程为边界框的编码（encode），预测时，你需要反向这个过程，即进行解码（decode），从预测值中得到边界框的真实位置：

然而，在SSD的Caffe源码实现中还有trick，那就是设置variance超参数来调整检测值，通过bool参数variance_encoded_in_target来控制两种模式，当其为True时，表示variance被包含在预测值中，就是上面那种情况。但是如果是False（大部分采用这种方式，训练更容易？），就需要手动设置超参数variance，用来对的4个值进行放缩，此时边界框需要这样解码：

综上所述，对于一个大小的特征图，共有个单元，每个单元设置的先验框数目记为，那么每个单元共需要个预测值，所有的单元共需要个预测值，由于SSD采用卷积做检测，所以就需要个卷积核完成这个特征图的检测过程。

VGG16中的Conv4_3层将作为用于检测的第一个特征图。conv4_3层特征图大小是，但是该层比较靠前，其norm较大，所以在其后面增加了一个L2 Normalization层（参见ParseNet），以保证和后面的检测层差异不是很大，这个和Batch Normalization层不太一样，其仅仅是对每个像素点在channle维度做归一化，而Batch Normalization层是在[batch_size, width, height]三个维度上做归一化。归一化后一般设置一个可训练的放缩变量gamma。

默认情况下，每个特征图会有一个且尺度为的先验框，除此之外，还会设置一个尺度为且的先验框，这样每个特征图都设置了两个长宽比为1但大小不同的正方形先验框。注意最后一个特征图需要参考一个虚拟来计算。因此，每个特征图一共有个先验框，但是在实现时，Conv4_3，Conv10_2和Conv11_2层仅使用4个先验框，它们不使用长宽比为的先验框。每个单元的先验框的中心点分布在各个单元的中心，即，其中为特征图的大小。

训练过程

（1）先验框匹配
在训练过程中，首先要确定训练图片中的ground truth（真实目标）与哪个先验框来进行匹配，与之匹配的先验框所对应的边界框将负责预测它。在Yolo中，ground truth的中心落在哪个单元格，该单元格中与其IOU最大的边界框负责预测它。但是在SSD中却完全不一样，SSD的先验框与ground truth的匹配原则主要有两点。首先，对于图片中每个ground truth，找到与其IOU最大的先验框，该先验框与其匹配，这样，可以保证每个ground truth一定与某个先验框匹配。通常称与ground truth匹配的先验框为正样本（其实应该是先验框对应的预测box，不过由于是一一对应的就这样称呼了），反之，若一个先验框没有与任何ground truth进行匹配，那么该先验框只能与背景匹配，就是负样本。一个图片中ground truth是非常少的，而先验框却很多，如果仅按第一个原则匹配，很多先验框会是负样本，正负样本极其不平衡，所以需要第二个原则。第二个原则是：对于剩余的未匹配先验框，若某个ground truth的大于某个阈值（一般是0.5），那么该先验框也与这个ground truth进行匹配。这意味着某个ground truth可能与多个先验框匹配，这是可以的。但是反过来却不可以，因为一个先验框只能匹配一个ground truth，如果多个ground truth与某个先验框大于阈值，那么先验框只与IOU最大的那个先验框进行匹配。第二个原则一定在第一个原则之后进行，仔细考虑一下这种情况，如果某个ground truth所对应最大小于阈值，并且所匹配的先验框却与另外一个ground truth的大于阈值，那么该先验框应该匹配谁，答案应该是前者，首先要确保某个ground truth一定有一个先验框与之匹配。但是，这种情况我觉得基本上是不存在的。由于先验框很多，某个ground truth的最大肯定大于阈值，所以可能只实施第二个原则既可以了，这里的TensorFlow版本就是只实施了第二个原则，但是这里的Pytorch两个原则都实施了。图8为一个匹配示意图，其中绿色的GT是ground truth，红色为先验框，FP表示负样本，TP表示正样本。

图8 先验框匹配示意图

尽管一个ground truth可以与多个先验框匹配，但是ground truth相对先验框还是太少了，所以负样本相对正样本会很多。为了保证正负样本尽量平衡，SSD采用了hard negative mining，就是对负样本进行抽样，抽样时按照置信度误差（预测背景的置信度越小，误差越大）进行降序排列，选取误差的较大的top-k作为训练的负样本，以保证正负样本比例接近1:3。

（2）损失函数
训练样本确定了，然后就是损失函数了。损失函数定义为位置误差（locatization loss， loc）与置信度误差（confidence loss, conf）的加权和：

其中是先验框的正样本数量。这里为一个指示参数，当时表示第个先验框与第个ground truth匹配，并且ground truth的类别为。为类别置信度预测值。为先验框的所对应边界框的位置预测值，而是ground truth的位置参数。对于位置误差，其采用Smooth L1 loss，定义如下：

由于的存在，所以位置误差仅针对正样本进行计算。值得注意的是，要先对ground truth的进行编码得到，因为预测值也是编码值，若设置variance_encoded_in_target=True，编码时要加上variance：

对于置信度误差，其采用softmax loss:

权重系数通过交叉验证设置为1。

预测过程

预测过程比较简单，对于每个预测框，首先根据类别置信度确定其类别（置信度最大者）与置信度值，并过滤掉属于背景的预测框。然后根据置信度阈值（如0.5）过滤掉阈值较低的预测框。对于留下的预测框进行解码，根据先验框得到其真实的位置参数（解码后一般还需要做clip，防止预测框位置超出图片）。解码之后，一般需要根据置信度进行降序排列，然后仅保留top-k（如400）个预测框。最后就是进行NMS算法，过滤掉那些重叠度较大的预测框。最后剩余的预测框就是检测结果了。

一. 源码解读；

其中ssd_vgg_300.py代码解析如下：

[python] view plain copy

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Definition of 300 VGG-based SSD network.
This model was initially introduced in:
SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, Alexander C. Berg
https://arxiv.org/abs/1512.02325
Two variants of the model are defined: the 300x300 and 512x512 models, the
latter obtaining a slightly better accuracy on Pascal VOC.
Usage:
with slim.arg_scope(ssd_vgg.ssd_vgg()):
outputs, end_points = ssd_vgg.ssd_vgg(inputs)
This network port of the original Caffe model. The padding in TF and Caffe
is slightly different, and can lead to severe accuracy drop if not taken care
in a correct way!
In Caffe, the output size of convolution and pooling layers are computing as
following: h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1
Nevertheless, there is a subtle difference between both for stride > 1. In
the case of convolution:
top_size = floor((bottom_size + 2*pad - kernel_size) / stride) + 1
whereas for pooling:
top_size = ceil((bottom_size + 2*pad - kernel_size) / stride) + 1
Hence implicitely allowing some additional padding even if pad = 0. This
behaviour explains why pooling with stride and kernel of size 2 are behaving
the same way in TensorFlow and Caffe.
Nevertheless, this is not the case anymore for other kernel sizes, hence
motivating the use of special padding layer for controlling these side-effects.
@@ssd_vgg_300
"""
import math
from collections import namedtuple
import numpy as np
import tensorflow as tf
import tf_extended as tfe
from nets import custom_layers
from nets import ssd_common
slim = tf.contrib.slim
# =========================================================================== #
# SSD class definition.
# =========================================================================== #
#collections模块的namedtuple子类不仅可以使用item的index访问item，还可以通过item的name进行访问可以将namedtuple理解为c中的struct结构，其首先将各个item命名，然后对每个item赋予数据
SSDParams = namedtuple('SSDParameters', ['img_shape', #输入图像大小
'num_classes', #分类类别数
'no_annotation_label', #无标注标签
'feat_layers', #特征层
'feat_shapes', #特征层形状大小
'anchor_size_bounds', #锚点框大小上下边界，是与原图相比得到的小数值
'anchor_sizes', #初始锚点框尺寸
'anchor_ratios', #锚点框长宽比
'anchor_steps', #特征图相对原始图像的缩放
'anchor_offset', #锚点框中心的偏移
'normalizations', #是否正则化
'prior_scaling' #是对特征图参考框向gtbox做回归时用到的尺度缩放（0.1,0.1,0.2,0.2）
])
class SSDNet(object):
"""Implementation of the SSD VGG-based 300 network.
The default features layers with 300x300 image input are:
conv4 ==> 38 x 38
conv7 ==> 19 x 19
conv8 ==> 10 x 10
conv9 ==> 5 x 5
conv10 ==> 3 x 3
conv11 ==> 1 x 1
The default image size used to train this network is 300x300. #训练输入图像尺寸默认为300x300
"""
default_params = SSDParams( #默认参数
img_shape=(300, 300),
num_classes=21, #包含背景在内，共21类目标类别
no_annotation_label=21,
feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'], #特征层名字
feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)], #特征层尺寸
anchor_size_bounds=[0.15, 0.90],
# anchor_size_bounds=[0.20, 0.90], #论文中初始预测框大小为0.2x300~0.9x300；实际代码是[45,270]
anchor_sizes=[(21., 45.), #直接给出的每个特征图上起初的锚点框大小；如第一个特征层框大小是h:21;w:45; 共6个特征图用于回归
(45., 99.), #越小的框能够得到原图上更多的局部信息，反之得到更多的全局信息；
(99., 153.),
(153., 207.),
(207., 261.),
(261., 315.)],
# anchor_sizes=[(30., 60.),
# (60., 111.),
# (111., 162.),
# (162., 213.),
# (213., 264.),
# (264., 315.)],
anchor_ratios=[[2, .5], #每个特征层上的每个特征点预测的box长宽比及数量；如：block4: def_boxes:4
[2, .5, 3, 1./3], #block7: def_boxes:6 （ratios中的4个+默认的1:1+额外增加的一个=6）
[2, .5, 3, 1./3], #block8: def_boxes:6
[2, .5, 3, 1./3], #block9: def_boxes:6
[2, .5], #block10: def_boxes:4
[2, .5]], #block11: def_boxes:4 #备注：实际上略去了默认的ratio=1以及多加了一个sqrt(初始框宽*初始框高)，后面代码有
anchor_steps=[8, 16, 32, 64, 100, 300], #特征图锚点框放大到原始图的缩放比例；
anchor_offset=0.5, #每个锚点框中心点在该特征图cell中心，因此offset=0.5
normalizations=[20, -1, -1, -1, -1, -1], #是否归一化，大于0则进行，否则不做归一化；目前看来只对block_4进行正则化，因为该层比较靠前，其norm较大，需做L2正则化（仅仅对每个像素在channel维度做归一化）以保证和后面检测层差异不是很大；
prior_scaling=[0.1, 0.1, 0.2, 0.2] #特征图上每个目标与参考框间的尺寸缩放（y,x,h,w）解码时用到
)
def __init__(self, params=None): #网络参数的初始化
"""Init the SSD net with some parameters. Use the default ones
if none provided.
"""
if isinstance(params, SSDParams): #是否有参数输入，是则用输入的，否则使用默认的
self.params = params #isinstance是python的內建函数，如果参数1与参数2的类型相同则返回true；
else:
self.params = SSDNet.default_params
# ======================================================================= #
def net(self, inputs, #定义网络模型
is_training=True, #是否训练
update_feat_shapes=True, #是否更新特征层的尺寸
dropout_keep_prob=0.5, #dropout=0.5
prediction_fn=slim.softmax, #采用softmax预测结果
reuse=None,
scope='ssd_300_vgg'): #网络名：ssd_300_vgg （基础网络时VGG，输入训练图像size是300x300）
"""SSD network definition.
"""
r = ssd_net(inputs, #网络输入参数r
num_classes=self.params.num_classes,
feat_layers=self.params.feat_layers,
anchor_sizes=self.params.anchor_sizes,
anchor_ratios=self.params.anchor_ratios,
normalizations=self.params.normalizations,
is_training=is_training,
dropout_keep_prob=dropout_keep_prob,
prediction_fn=prediction_fn,
reuse=reuse,
scope=scope)
# Update feature shapes (try at least!) #下面这步我的理解就是让读者自行更改特征层的输入，未必论文中介绍的那几个block
if update_feat_shapes: #是否更新特征层图像尺寸？
shapes = ssd_feat_shapes_from_net(r[0], self.params.feat_shapes) #输入特征层图像尺寸以及inputs（应该是预测的特征尺寸），输出更新后的特征图尺寸列表
self.params = self.params._replace(feat_shapes=shapes) #将更新的特征图尺寸shapes替换当前的特征图尺寸
return r #更新网络输入参数r
def arg_scope(self, weight_decay=0.0005, data_format='NHWC'): #定义权重衰减=0.0005，L2正则化项系数；数据类型是NHWC
"""Network arg_scope.
"""
return ssd_arg_scope(weight_decay, data_format=data_format)
def arg_scope_caffe(self, caffe_scope):
"""Caffe arg_scope used for weights importing.
"""
return ssd_arg_scope_caffe(caffe_scope)
# ======================================================================= #
def update_feature_shapes(self, predictions): #更新特征形状尺寸（来自预测结果）
"""Update feature shapes from predictions collection (Tensor or Numpy
array).
"""
shapes = ssd_feat_shapes_from_net(predictions, self.params.feat_shapes)
self.params = self.params._replace(feat_shapes=shapes)
def anchors(self, img_shape, dtype=np.float32): #输入原始图像尺寸；返回每个特征层每个参考锚点框的位置及尺寸信息（x,y,h,w）
"""Compute the default anchor boxes, given an image shape.
"""
return ssd_anchors_all_layers(img_shape, #这是个关键函数；检测所有特征层中的参考锚点框位置和尺寸信息
self.params.feat_shapes,
self.params.anchor_sizes,
self.params.anchor_ratios,
self.params.anchor_steps,
self.params.anchor_offset,
dtype)
def bboxes_encode(self, labels, bboxes, anchors, #编码，用于将标签信息，真实目标信息和锚点框信息编码在一起；得到预测真实框到参考框的转换值
scope=None):
"""Encode labels and bounding boxes.
"""
return ssd_common.tf_ssd_bboxes_encode(
labels, bboxes, anchors,
self.params.num_classes,
self.params.no_annotation_label, #未标注的标签（应该代表背景）
ignore_threshold=0.5, #IOU筛选阈值
prior_scaling=self.params.prior_scaling, #特征图目标与参考框间的尺寸缩放（0.1,0.1,0.2,0.2）
scope=scope)
def bboxes_decode(self, feat_localizations, anchors, #解码，用锚点框信息，锚点框与预测真实框间的转换值，得到真是的预测框（ymin,xmin,ymax,xmax）
scope='ssd_bboxes_decode'):
"""Encode labels and bounding boxes.
"""
return ssd_common.tf_ssd_bboxes_decode(
feat_localizations, anchors,
prior_scaling=self.params.prior_scaling,
scope=scope)
def detected_bboxes(self, predictions, localisations, #通过SSD网络，得到检测到的bbox
select_threshold=None, nms_threshold=0.5,
clipping_bbox=None, top_k=400, keep_top_k=200):
"""Get the detected bounding boxes from the SSD network output.
"""
# Select top_k bboxes from predictions, and clip #选取top_k=400个框，并对框做修建（超出原图尺寸范围的切掉）
rscores, rbboxes = \ #得到对应某个类别的得分值以及bbox
ssd_common.tf_ssd_bboxes_select(predictions, localisations,
select_threshold=select_threshold,
num_classes=self.params.num_classes)
rscores, rbboxes = \ #按照得分高低，筛选出400个bbox和对应得分
tfe.bboxes_sort(rscores, rbboxes, top_k=top_k)
# Apply NMS algorithm. #应用非极大值抑制，筛选掉与得分最高bbox重叠率大于0.5的，保留200个
rscores, rbboxes = \
tfe.bboxes_nms_batch(rscores, rbboxes,
nms_threshold=nms_threshold,
keep_top_k=keep_top_k)
if clipping_bbox is not None:
rbboxes = tfe.bboxes_clip(clipping_bbox, rbboxes)
return rscores, rbboxes #返回裁剪好的bbox和对应得分
#尽管一个ground truth可以与多个先验框匹配，但是ground truth相对先验框还是太少了，
#所以负样本相对正样本会很多。为了保证正负样本尽量平衡，SSD采用了hard negative mining，
#就是对负样本进行抽样，抽样时按照置信度误差（预测背景的置信度越小，误差越大）进行降序排列，
#选取误差的较大的top-k作为训练的负样本，以保证正负样本比例接近1:3
def losses(self, logits, localisations,
gclasses, glocalisations, gscores,
match_threshold=0.5,
negative_ratio=3.,
alpha=1.,
label_smoothing=0.,
scope='ssd_losses'):
"""Define the SSD network losses.
"""
return ssd_losses(logits, localisations,
gclasses, glocalisations, gscores,
match_threshold=match_threshold,
negative_ratio=negative_ratio,
alpha=alpha,
label_smoothing=label_smoothing,
scope=scope)
# =========================================================================== #
# SSD tools...
# =========================================================================== #
def ssd_size_bounds_to_values(size_bounds,
n_feat_layers,
img_shape=(300, 300)):
"""Compute the reference sizes of the anchor boxes from relative bounds.
The absolute values are measured in pixels, based on the network
default size (300 pixels).
This function follows the computation performed in the original
implementation of SSD in Caffe.
Return:
list of list containing the absolute sizes at each scale. For each scale,
the ratios only apply to the first value.
"""
assert img_shape[0] == img_shape[1]
img_size = img_shape[0]
min_ratio = int(size_bounds[0] * 100)
max_ratio = int(size_bounds[1] * 100)
step = int(math.floor((max_ratio - min_ratio) / (n_feat_layers - 2)))
# Start with the following smallest sizes.
sizes = [[img_size * size_bounds[0] / 2, img_size * size_bounds[0]]]
for ratio in range(min_ratio, max_ratio + 1, step):
sizes.append((img_size * ratio / 100.,
img_size * (ratio + step) / 100.))
return sizes
def ssd_feat_shapes_from_net(predictions, default_shapes=None):
"""Try to obtain the feature shapes from the prediction layers. The latter
can be either a Tensor or Numpy ndarray.
Return:
list of feature shapes. Default values if predictions shape not fully
determined.
"""
feat_shapes = []
for l in predictions: #l:是预测的特征形状
# Get the shape, from either a np array or a tensor.
if isinstance(l, np.ndarray): #如果l是np.ndarray类型，则将l的形状赋给shape；否则将shape作为list
shape = l.shape
else:
shape = l.get_shape().as_list()
shape = shape[1:4]
# Problem: undetermined shape... #如果预测的特征尺寸未定，则使用默认的形状；否则将shape中的值赋给特征形状列表中
if None in shape:
return default_shapes
else:
feat_shapes.append(shape)
return feat_shapes #返回更新后的特征尺寸list
def ssd_anchor_one_layer(img_shape, #检测单个特征图中所有锚点的坐标和尺寸信息（未与原图做除法）
feat_shape,
sizes,
ratios,
step,
offset=0.5,
dtype=np.float32):
"""Computer SSD default anchor boxes for one feature layer.
Determine the relative position grid of the centers, and the relative
width and height.
Arguments:
feat_shape: Feature shape, used for computing relative position grids;
size: Absolute reference sizes;
ratios: Ratios to use on these features;
img_shape: Image shape, used for computing height, width relatively to the
former;
offset: Grid offset.
Return:
y, x, h, w: Relative x and y grids, and height and width.
"""
# Compute the position grid: simple way.
# y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
# y = (y.astype(dtype) + offset) / feat_shape[0]
# x = (x.astype(dtype) + offset) / feat_shape[1]
# Weird SSD-Caffe computation using steps values... #归一化到原图的锚点中心坐标（x,y）;其坐标值域为(0,1)
y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]] #对于第一个特征图（block4：38x38）；y=[[0,0,……0],[1,1,……1]，……[37,37,……，37]]；而x=[[0,1,2……，37]，[0,1,2……，37],……[0,1,2……，37]]
y = (y.astype(dtype) + offset) * step / img_shape[0] #将38个cell对应锚点框的y坐标偏移至每个cell中心，然后乘以相对原图缩放的比例，再除以原图
x = (x.astype(dtype) + offset) * step / img_shape[1] #可以得到在原图上，相对原图比例大小的每个锚点中心坐标x,y
# Expand dims to support easy broadcasting. #将锚点中心坐标扩大维度
y = np.expand_dims(y, axis=-1) #对于第一个特征图，y的shape=38x38x1；x的shape=38x38x1
x = np.expand_dims(x, axis=-1)
# Compute relative height and width.
# Tries to follow the original implementation of SSD for the order.
num_anchors = len(sizes) + len(ratios) #该特征图上每个点对应的锚点框数量;如：对于第一个特征图每个点预测4个锚点框（block4：38x38），2+2=4
h = np.zeros((num_anchors, ), dtype=dtype) #对于第一个特征图，h的shape=4x；w的shape=4x
w = np.zeros((num_anchors, ), dtype=dtype)
# Add first anchor boxes with ratio=1.
h[0] = sizes[0] / img_shape[0] #第一个锚点框的高h[0]=起始锚点的高/原图大小的高；例如：h[0]=21/300
w[0] = sizes[0] / img_shape[1] #第一个锚点框的宽w[0]=起始锚点的宽/原图大小的宽；例如：h[0]=45/300
di = 1 #锚点宽个数偏移
if len(sizes) > 1:
h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0] #第二个锚点框的高h[1]=sqrt（起始锚点的高*起始锚点的宽）/原图大小的高；例如：h[1]=sqrt(21*45)/300
w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1] #第二个锚点框的高w[1]=sqrt（起始锚点的高*起始锚点的宽）/原图大小的宽；例如：w[1]=sqrt(21*45)/300
di += 1 #di=2
for i, r in enumerate(ratios): #遍历长宽比例，第一个特征图，r只有两个，2和0.5；共四个锚点宽size（h[0]~h[3]）
h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r) #例如：对于第一个特征图，h[0+2]=h[2]=21/300/sqrt(2);w[0+2]=w[2]=45/300*sqrt(2)
w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r) #例如：对于第一个特征图，h[1+2]=h[3]=21/300/sqrt(0.5);w[1+2]=w[3]=45/300*sqrt(0.5)
return y, x, h, w #返回没有归一化前的锚点坐标和尺寸
def ssd_anchors_all_layers(img_shape, #检测所有特征图中锚点框的四个坐标信息；输入原始图大小
layers_shape, #每个特征层形状尺寸
anchor_sizes, #起始特征图中框的长宽size
anchor_ratios, #锚点框长宽比列表
anchor_steps, #锚点框相对原图缩放比例
offset=0.5, #锚点中心在每个特征图cell中的偏移
dtype=np.float32):
"""Compute anchor boxes for all feature layers.
"""
layers_anchors = [] #用于存放所有特征图中锚点框位置尺寸信息
for i, s in enumerate(layers_shape): #6个特征图尺寸；如：第0个是38x38
anchor_bboxes = ssd_anchor_one_layer(img_shape, s, #分别计算每个特征图中锚点框的位置尺寸信息；
anchor_sizes[i], #输入：第i个特征图中起始锚点框大小；如第0个是(21., 45.)
anchor_ratios[i], #输入：第i个特征图中锚点框长宽比列表；如第0个是[2, .5]
anchor_steps[i], #输入：第i个特征图中锚点框相对原始图的缩放比；如第0个是8
offset=offset, dtype=dtype) #输入：锚点中心在每个特征图cell中的偏移
layers_anchors.append(anchor_bboxes) #将6个特征图中每个特征图上的点对应的锚点框（6个或4个）保存
return layers_anchors
# =========================================================================== #
# Functional definition of VGG-based SSD 300.
# =========================================================================== #
def tensor_shape(x, rank=3):
"""Returns the dimensions of a tensor.
Args:
image: A N-D Tensor of shape.
Returns:
A list of dimensions. Dimensions that are statically known are python
integers,otherwise they are integer scalar tensors.
"""
if x.get_shape().is_fully_defined():
return x.get_shape().as_list()
else:
static_shape = x.get_shape().with_rank(rank).as_list()
dynamic_shape = tf.unstack(tf.shape(x), rank)
return [s if s is not None else d
for s, d in zip(static_shape, dynamic_shape)]
def ssd_multibox_layer(inputs, #输入特征层
num_classes, #类别数
sizes, #参考先验框的尺度
ratios=[1], #默认的先验框长宽比为1
normalization=-1, #默认不做正则化
bn_normalization=False):
"""Construct a multibox layer, return a class and localization predictions.
"""
net = inputs
if normalization > 0: #如果输入整数，则进行L2正则化
net = custom_layers.l2_normalization(net, scaling=True) #对通道所在维度进行正则化，随后乘以gamma缩放系数
# Number of anchors.
num_anchors = len(sizes) + len(ratios) #每层特征图参考先验框的个数[4,6,6,6,4,4]
# Location. #每个先验框对应4个坐标信息
num_loc_pred = num_anchors * 4 #特征图上每个单元预测的坐标所需维度=锚点框数*4
loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None, #通过对特征图进行3x3卷积得到位置信息和类别权重信息
scope='conv_loc') #该部分是定位信息，输出维度为[特征图h,特征图w,每个单元所有锚点框坐标]
loc_pred = custom_layers.channel_to_last(loc_pred)
loc_pred = tf.reshape(loc_pred, #最后整个特征图所有锚点框预测目标位置 tensor为[h*w*每个cell先验框数，4]
tensor_shape(loc_pred, 4)[:-1]+[num_anchors, 4])
# Class prediction. #类别预测
num_cls_pred = num_anchors * num_classes #特征图上每个单元预测的类别所需维度=锚点框数*种类数
cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None, #该部分是类别信息，输出维度为[特征图h,特征图w,每个单元所有锚点框对应类别信息]
scope='conv_cls')
cls_pred = custom_layers.channel_to_last(cls_pred)
cls_pred = tf.reshape(cls_pred,
tensor_shape(cls_pred, 4)[:-1]+[num_anchors, num_classes]) #最后整个特征图所有锚点框预测类别 tensor为[h*w*每个cell先验框数，种类数]
return cls_pred, loc_pred #返回预测得到的类别和box位置 tensor
def ssd_net(inputs, #定义ssd网络结构
num_classes=SSDNet.default_params.num_classes, #分类数
feat_layers=SSDNet.default_params.feat_layers, #特征层
anchor_sizes=SSDNet.default_params.anchor_sizes,
anchor_ratios=SSDNet.default_params.anchor_ratios,
normalizations=SSDNet.default_params.normalizations, #正则化
is_training=True,
dropout_keep_prob=0.5,
prediction_fn=slim.softmax,
reuse=None,
scope='ssd_300_vgg'):
"""SSD net definition.
"""
# if data_format == 'NCHW':
# inputs = tf.transpose(inputs, perm=(0, 3, 1, 2))
# End_points collect relevant activations for external use.
end_points = {} #用于收集每一层输出结果
with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
# Original VGG-16 blocks.
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1') #VGG16网络的第一个conv，重复2次卷积，核为3x3,64个特征
end_points['block1'] = net #conv1_2结果存入end_points，name='block1'
net = slim.max_pool2d(net, [2, 2], scope='pool1')
# Block 2.
net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2') #重复2次卷积，核为3x3,128个特征
end_points['block2'] = net #conv2_2结果存入end_points，name='block2'
net = slim.max_pool2d(net, [2, 2], scope='pool2')
# Block 3.
net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3') #重复3次卷积，核为3x3,256个特征
end_points['block3'] = net #conv3_3结果存入end_points，name='block3'
net = slim.max_pool2d(net, [2, 2], scope='pool3')
# Block 4.
net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4') #重复3次卷积，核为3x3,512个特征
end_points['block4'] = net #conv4_3结果存入end_points，name='block4'
net = slim.max_pool2d(net, [2, 2], scope='pool4')
# Block 5.
net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5') #重复3次卷积，核为3x3,512个特征
end_points['block5'] = net #conv5_3结果存入end_points，name='block5'
net = slim.max_pool2d(net, [3, 3], stride=1, scope='pool5')
# Additional SSD blocks. #去掉了VGG的全连接层
# Block 6: let's dilate the hell out of it!
net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6') #将VGG基础网络最后的池化层结果做扩展卷积（带孔卷积）；
end_points['block6'] = net #conv6结果存入end_points，name='block6'
net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training) #dropout层
# Block 7: 1x1 conv. Because the fuck.
net = slim.conv2d(net, 1024, [1, 1], scope='conv7') #将dropout后的网络做1x1卷积，输出1024特征，name='block7'
end_points['block7'] = net
net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training) #将卷积后的网络继续做dropout
# Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
end_point = 'block8'
with tf.variable_scope(end_point):
net = slim.conv2d(net, 256, [1, 1], scope='conv1x1') #对上述dropout的网络做1x1卷积，然后做3x3卷积，,输出512特征图，name=‘block8’
net = custom_layers.pad2d(net, pad=(1, 1))
net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
end_points[end_point] = net
end_point = 'block9'
with tf.variable_scope(end_point):
net = slim.conv2d(net, 128, [1, 1], scope='conv1x1') #对上述网络做1x1卷积，然后做3x3卷积，输出256特征图，name=‘block9’
net = custom_layers.pad2d(net, pad=(1, 1))
net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
end_points[end_point] = net
end_point = 'block10'
with tf.variable_scope(end_point):
net = slim.conv2d(net, 128, [1, 1], scope='conv1x1') #对上述网络做1x1卷积，然后做3x3卷积，输出256特征图，name=‘block10’
net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
end_points[end_point] = net
end_point = 'block11'
with tf.variable_scope(end_point):
net = slim.conv2d(net, 128, [1, 1], scope='conv1x1') #对上述网络做1x1卷积，然后做3x3卷积，输出256特征图，name=‘block11’
net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
end_points[end_point] = net
# Prediction and localisations layers. #预测和定位
predictions = []
logits = []
localisations = []
for i, layer in enumerate(feat_layers): #遍历特征层
with tf.variable_scope(layer + '_box'): #起个命名范围
p, l = ssd_multibox_layer(end_points[layer], #做多尺度大小box预测的特征层，返回每个cell中每个先验框预测的类别p和预测的位置l
num_classes, #种类数
anchor_sizes[i], #先验框尺度（同一特征图上的先验框尺度和长宽比一致）
anchor_ratios[i], #先验框长宽比
normalizations[i]) #每个特征正则化信息，目前是只对第一个特征图做归一化操作；
#把每一层的预测收集
predictions.append(prediction_fn(p)) #prediction_fn为softmax，预测类别
logits.append(p) #把每个cell每个先验框预测的类别的概率值存在logits中
localisations.append(l) #预测位置信息
return predictions, localisations, logits, end_points #返回类别预测结果，位置预测结果，所属某个类别的概率值，以及特征层
ssd_net.default_image_size = 300
def ssd_arg_scope(weight_decay=0.0005, data_format='NHWC'): #权重衰减系数=0.0005；其是L2正则化项的系数
"""Defines the VGG arg scope.
Args:
weight_decay: The l2 regularization coefficient.
Returns:
An arg_scope.
"""
with slim.arg_scope([slim.conv2d, slim.fully_connected],
activation_fn=tf.nn.relu,
weights_regularizer=slim.l2_regularizer(weight_decay),
weights_initializer=tf.contrib.layers.xavier_initializer(),
biases_initializer=tf.zeros_initializer()):
with slim.arg_scope([slim.conv2d, slim.max_pool2d],
padding='SAME',
data_format=data_format):
with slim.arg_scope([custom_layers.pad2d,
custom_layers.l2_normalization,
custom_layers.channel_to_last],
data_format=data_format) as sc:
return sc
# =========================================================================== #
# Caffe scope: importing weights at initialization.
# =========================================================================== #
def ssd_arg_scope_caffe(caffe_scope):
"""Caffe scope definition.
Args:
caffe_scope: Caffe scope object with loaded weights.
Returns:
An arg_scope.
"""
# Default network arg scope.
with slim.arg_scope([slim.conv2d],
activation_fn=tf.nn.relu,
weights_initializer=caffe_scope.conv_weights_init(),
biases_initializer=caffe_scope.conv_biases_init()):
with slim.arg_scope([slim.fully_connected],
activation_fn=tf.nn.relu):
with slim.arg_scope([custom_layers.l2_normalization],
scale_initializer=caffe_scope.l2_norm_scale_init()):
with slim.arg_scope([slim.conv2d, slim.max_pool2d],
padding='SAME') as sc:
return sc
# =========================================================================== #
# SSD loss function.
# =========================================================================== #
def ssd_losses(logits, localisations, #损失函数定义为位置误差和置信度误差的加权和；
gclasses, glocalisations, gscores,
match_threshold=0.5,
negative_ratio=3.,
alpha=1., #位置误差权重系数
label_smoothing=0.,
device='/cpu:0',
scope=None):
with tf.name_scope(scope, 'ssd_losses'):
lshape = tfe.get_shape(logits[0], 5)
num_classes = lshape[-1]
batch_size = lshape[0]
# Flatten out all vectors!
flogits = []
fgclasses = []
fgscores = []
flocalisations = []
fglocalisations = []
for i in range(len(logits)):
flogits.append(tf.reshape(logits[i], [-1, num_classes])) #将类别的概率值reshape成（-1,21）
fgclasses.append(tf.reshape(gclasses[i], [-1])) #真实类别
fgscores.append(tf.reshape(gscores[i], [-1])) #预测真实目标的得分
flocalisations.append(tf.reshape(localisations[i], [-1, 4])) #预测真实目标边框坐标（编码形式的值）
fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4])) #用于将真实目标gt的坐标进行编码存储
# And concat the crap!
logits = tf.concat(flogits, axis=0)
gclasses = tf.concat(fgclasses, axis=0)
gscores = tf.concat(fgscores, axis=0)
localisations = tf.concat(flocalisations, axis=0)
glocalisations = tf.concat(fglocalisations, axis=0)
dtype = logits.dtype
# Compute positive matching mask...
pmask = gscores > match_threshold #预测框与真实框IOU>0.5则将这个先验作为正样本
fpmask = tf.cast(pmask, dtype)
n_positives = tf.reduce_sum(fpmask) #求正样本数量N
# Hard negative mining... 为了保证正负样本尽量平衡，SSD采用了hard negative mining，就是对负样本进行抽样，抽样时按照置信度误差（预测背景的置信度越小，误差越大）进行降序排列，选取误差的较大的top-k作为训练的负样本，以保证正负样本比例接近1:3
no_classes = tf.cast(pmask, tf.int32)
predictions = slim.softmax(logits) #类别预测
nmask = tf.logical_and(tf.logical_not(pmask),
gscores > -0.5)
fnmask = tf.cast(nmask, dtype)
nvalues = tf.where(nmask,
predictions[:, 0],
1. - fnmask)
nvalues_flat = tf.reshape(nvalues, [-1])
# Number of negative entries to select.
max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size #负样本数量，保证是正样本3倍
n_neg = tf.minimum(n_neg, max_neg_entries)
val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg) #抽样时按照置信度误差（预测背景的置信度越小，误差越大）进行降序排列，选取误差的较大的top-k作为训练的负样本
max_hard_pred = -val[-1]
# Final negative mask.
nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
fnmask = tf.cast(nmask, dtype)
# Add cross-entropy loss. #交叉熵
with tf.name_scope('cross_entropy_pos'):
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, #类别置信度误差
labels=gclasses)
loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value') #将置信度误差除以正样本数后除以batch-size
tf.losses.add_loss(loss)
with tf.name_scope('cross_entropy_neg'):
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
labels=no_classes)
loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
tf.losses.add_loss(loss)
# Add localization loss: smooth L1, L2, ...
with tf.name_scope('localization'):
# Weights Tensor: positive mask + random negative.
weights = tf.expand_dims(alpha * fpmask, axis=-1)
loss = custom_layers.abs_smooth(localisations - glocalisations) #先验框对应边界的位置预测值-真实位置；然后做Smooth L1 loss
loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value') #将上面的loss*权重（=alpha/正样本数）求和后除以batch-size
tf.losses.add_loss(loss) #获得置信度误差和位置误差的加权和
def ssd_losses_old(logits, localisations,
gclasses, glocalisations, gscores,
match_threshold=0.5,
negative_ratio=3.,
alpha=1.,
label_smoothing=0.,
device='/cpu:0',
scope=None):
"""Loss functions for training the SSD 300 VGG network.
This function defines the different loss components of the SSD, and
adds them to the TF loss collection.
Arguments:
logits: (list of) predictions logits Tensors;
localisations: (list of) localisations Tensors;
gclasses: (list of) groundtruth labels Tensors;
glocalisations: (list of) groundtruth localisations Tensors;
gscores: (list of) groundtruth score Tensors;
"""
with tf.device(device):
with tf.name_scope(scope, 'ssd_losses'):
l_cross_pos = []
l_cross_neg = []
l_loc = []
for i in range(len(logits)):
dtype = logits[i].dtype
with tf.name_scope('block_%i' % i):
# Sizing weight...
wsize = tfe.get_shape(logits[i], rank=5)
wsize = wsize[1] * wsize[2] * wsize[3]
# Positive mask.
pmask = gscores[i] > match_threshold
fpmask = tf.cast(pmask, dtype)
n_positives = tf.reduce_sum(fpmask)
# Select some random negative entries.
# n_entries = np.prod(gclasses[i].get_shape().as_list())
# r_positive = n_positives / n_entries
# r_negative = negative_ratio * n_positives / (n_entries - n_positives)
# Negative mask.
no_classes = tf.cast(pmask, tf.int32)
predictions = slim.softmax(logits[i])
nmask = tf.logical_and(tf.logical_not(pmask),
gscores[i] > -0.5)
fnmask = tf.cast(nmask, dtype)
nvalues = tf.where(nmask,
predictions[:, :, :, :, 0],
1. - fnmask)
nvalues_flat = tf.reshape(nvalues, [-1])
# Number of negative entries to select.
n_neg = tf.cast(negative_ratio * n_positives, tf.int32)
n_neg = tf.maximum(n_neg, tf.size(nvalues_flat) // 8)
n_neg = tf.maximum(n_neg, tf.shape(nvalues)[0] * 4)
max_neg_entries = 1 + tf.cast(tf.reduce_sum(fnmask), tf.int32)
n_neg = tf.minimum(n_neg, max_neg_entries)
val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
max_hard_pred = -val[-1]
# Final negative mask.
nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
fnmask = tf.cast(nmask, dtype)
# Add cross-entropy loss.
with tf.name_scope('cross_entropy_pos'):
fpmask = wsize * fpmask
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
labels=gclasses[i])
loss = tf.losses.compute_weighted_loss(loss, fpmask)
l_cross_pos.append(loss)
with tf.name_scope('cross_entropy_neg'):
fnmask = wsize * fnmask
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
labels=no_classes)
loss = tf.losses.compute_weighted_loss(loss, fnmask)
l_cross_neg.append(loss)
# Add localization loss: smooth L1, L2, ...
with tf.name_scope('localization'):
# Weights Tensor: positive mask + random negative.
weights = tf.expand_dims(alpha * fpmask, axis=-1)
loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
loss = tf.losses.compute_weighted_loss(loss, weights)
l_loc.append(loss)
# Additional total losses...
with tf.name_scope('total'):
total_cross_pos = tf.add_n(l_cross_pos, 'cross_entropy_pos')
total_cross_neg = tf.add_n(l_cross_neg, 'cross_entropy_neg')
total_cross = tf.add(total_cross_pos, total_cross_neg, 'cross_entropy')
total_loc = tf.add_n(l_loc, 'localization')
# Add to EXTRA LOSSES TF.collection
tf.add_to_collection('EXTRA_LOSSES', total_cross_pos)
tf.add_to_collection('EXTRA_LOSSES', total_cross_neg)
tf.add_to_collection('EXTRA_LOSSES', total_cross)
tf.add_to_collection('EXTRA_LOSSES', total_loc)

其中custom_layers.py的代码解析如下：

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Implement some custom layers, not provided by TensorFlow.
Trying to follow as much as possible the style/standards used in
tf.contrib.layers
"""
import tensorflow as tf
from tensorflow.contrib.framework.python.ops import add_arg_scope
from tensorflow.contrib.layers.python.layers import initializers
from tensorflow.contrib.framework.python.ops import variables
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.ops import nn
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import variable_scope
def abs_smooth(x):
"""Smoothed absolute function. Useful to compute an L1 smooth error. 当预测值与目标值相差很大时, 梯度容易爆炸,因此L1 loss对噪声（outliers）更鲁棒
Define as:
x^2 / 2 if abs(x) < 1
abs(x) - 0.5 if abs(x) > 1
We use here a differentiable definition using min(x) and abs(x). Clearly
not optimal, but good enough for our purpose!
"""
absx = tf.abs(x)
minx = tf.minimum(absx, 1)
r = 0.5 * ((absx - 1) * minx + absx) #计算得到L1 smooth loss
return r
@add_arg_scope
def l2_normalization( #L2正则化：稀疏正则化操作
inputs, #输入特征层，[batch_size,h,w,c]
scaling=False, #默认归一化后是否设置缩放变量gamma
scale_initializer=init_ops.ones_initializer(), #scale初始化为1
reuse=None,
variables_collections=None,
outputs_collections=None,
data_format='NHWC',
trainable=True,
scope=None):
"""Implement L2 normalization on every feature (i.e. spatial normalization).
Should be extended in some near future to other dimensions, providing a more
flexible normalization framework.
Args:
inputs: a 4-D tensor with dimensions [batch_size, height, width, channels].
scaling: whether or not to add a post scaling operation along the dimensions
which have been normalized.
scale_initializer: An initializer for the weights.
reuse: whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: optional list of collections for all the variables or
a dictionary containing a different list of collection per variable.
outputs_collections: collection to add the outputs.
data_format: NHWC or NCHW data format.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
scope: Optional scope for `variable_scope`.
Returns:
A `Tensor` representing the output of the operation.
"""
with variable_scope.variable_scope(
scope, 'L2Normalization', [inputs], reuse=reuse) as sc:
inputs_shape = inputs.get_shape() #得到输入特征层的维度信息
inputs_rank = inputs_shape.ndims #维度数=4
dtype = inputs.dtype.base_dtype #数据类型
if data_format == 'NHWC':
# norm_dim = tf.range(1, inputs_rank-1)
norm_dim = tf.range(inputs_rank-1, inputs_rank) #需要正则化的维度是4-1=3即channel这个维度
params_shape = inputs_shape[-1:] #通道数
elif data_format == 'NCHW':
# norm_dim = tf.range(2, inputs_rank)
norm_dim = tf.range(1, 2) #需要正则化的维度是第1维，即channel这个维度
params_shape = (inputs_shape[1]) #通道数
# Normalize along spatial dimensions.
outputs = nn.l2_normalize(inputs, norm_dim, epsilon=1e-12) #对通道所在维度进行正则化，其中epsilon是避免除0风险
# Additional scaling.
if scaling: #判断是否对正则化后设置缩放变量
scale_collections = utils.get_variable_collections(
variables_collections, 'scale')
scale = variables.model_variable('gamma',
shape=params_shape,
dtype=dtype,
initializer=scale_initializer,
collections=scale_collections,
trainable=trainable)
if data_format == 'NHWC':
outputs = tf.multiply(outputs, scale)
elif data_format == 'NCHW':
scale = tf.expand_dims(scale, axis=-1)
scale = tf.expand_dims(scale, axis=-1)
outputs = tf.multiply(outputs, scale)
# outputs = tf.transpose(outputs, perm=(0, 2, 3, 1))
return utils.collect_named_outputs(outputs_collections, #即返回L2_norm*gamma
sc.original_name_scope, outputs)
@add_arg_scope
def pad2d(inputs,
pad=(0, 0),
mode='CONSTANT',
data_format='NHWC',
trainable=True,
scope=None):
"""2D Padding layer, adding a symmetric padding to H and W dimensions.
Aims to mimic padding in Caffe and MXNet, helping the port of models to
TensorFlow. Tries to follow the naming convention of `tf.contrib.layers`.
Args:
inputs: 4D input Tensor;
pad: 2-Tuple with padding values for H and W dimensions;
mode: Padding mode. C.f. `tf.pad`
data_format: NHWC or NCHW data format.
"""
with tf.name_scope(scope, 'pad2d', [inputs]):
# Padding shape.
if data_format == 'NHWC':
paddings = [[0, 0], [pad[0], pad[0]], [pad[1], pad[1]], [0, 0]]
elif data_format == 'NCHW':
paddings = [[0, 0], [0, 0], [pad[0], pad[0]], [pad[1], pad[1]]]
net = tf.pad(inputs, paddings, mode=mode)
return net
@add_arg_scope
def channel_to_last(inputs, #作用，将输入的特征图网络的通道维度放在最后，返回变形后的网络
data_format='NHWC',
scope=None):
"""Move the channel axis to the last dimension. Allows to
provide a single output format whatever the input data format.
Args:
inputs: Input Tensor;
data_format: NHWC or NCHW.
Return:
Input in NHWC format.
"""
with tf.name_scope(scope, 'channel_to_last', [inputs]):
if data_format == 'NHWC':
net = inputs
elif data_format == 'NCHW':
net = tf.transpose(inputs, perm=(0, 2, 3, 1))
return net

其中ssd_common.py的代码解析如下：

[python] view plain copy

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Shared function between different SSD implementations.
"""
import numpy as np
import tensorflow as tf
import tf_extended as tfe
# =========================================================================== #
# TensorFlow implementation of boxes SSD encoding / decoding.
# =========================================================================== #
def tf_ssd_bboxes_encode_layer(labels, #gt标签，1D的tensor
bboxes, #Nx4的Tensor(float)，真实的bbox
anchors_layer, #参考锚点list
num_classes, #分类类别数
no_annotation_label,
ignore_threshold=0.5, #gt和锚点框间的匹配阈值，大于该值则为正样本
prior_scaling=[0.1, 0.1, 0.2, 0.2], #真实值到预测值转换中用到的缩放
dtype=tf.float32):
"""Encode groundtruth labels and bounding boxes using SSD anchors from
one layer.
Arguments:
labels: 1D Tensor(int64) containing groundtruth labels;
bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
anchors_layer: Numpy array with layer anchors;
matching_threshold: Threshold for positive match with groundtruth bboxes;
prior_scaling: Scaling of encoded coordinates.
Return:
(target_labels, target_localizations, target_scores): Target Tensors. 返回：包含目标标签类别，目标位置，目标置信度的tesndor
"""
# Anchors coordinates and volume.
yref, xref, href, wref = anchors_layer #此前每个特征图上点对应生成的锚点框作为参考框
ymin = yref - href / 2. #求参考框的左上角点（xmin,ymin）和右下角点(xmax,ymax)
xmin = xref - wref / 2. #yref和xref的shape为（38,38,1）；href和wref的shape为（4，）
ymax = yref + href / 2.
xmax = xref + wref / 2.
vol_anchors = (xmax - xmin) * (ymax - ymin) #求参考框面积vol_anchors
# Initialize tensors... #shape表示每个特征图上总锚点数
shape = (yref.shape[0], yref.shape[1], href.size) #对于第一个特征图，shape=(38,38,4)；第二个特征图的shape=(19,19,6)
feat_labels = tf.zeros(shape, dtype=tf.int64) #初始化每个特征图上的点对应的各个box所属标签维度如：38x38x4
feat_scores = tf.zeros(shape, dtype=dtype) #初始化每个特征图上的点对应的各个box所属标目标的得分值维度如：38x38x4
feat_ymin = tf.zeros(shape, dtype=dtype) #预测每个特征图每个点所属目标的坐标；如38x38x4;初始化为全0
feat_xmin = tf.zeros(shape, dtype=dtype)
feat_ymax = tf.ones(shape, dtype=dtype)
feat_xmax = tf.ones(shape, dtype=dtype)
def jaccard_with_anchors(bbox): #计算gt的框和参考锚点框的重合度
"""Compute jaccard score between a box and the anchors.
"""
int_ymin = tf.maximum(ymin, bbox[0]) #计算重叠区域的坐标
int_xmin = tf.maximum(xmin, bbox[1])
int_ymax = tf.minimum(ymax, bbox[2])
int_xmax = tf.minimum(xmax, bbox[3])
h = tf.maximum(int_ymax - int_ymin, 0.) #计算重叠区域的长与宽
w = tf.maximum(int_xmax - int_xmin, 0.)
# Volumes.
inter_vol = h * w #重叠区域的面积
union_vol = vol_anchors - inter_vol \ #计算bbox和参考框的并集区域
+ (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
jaccard = tf.div(inter_vol, union_vol) #计算IOU并返回该值
return jaccard
def intersection_with_anchors(bbox): #计算某个参考框包含真实框的得分情况
"""Compute intersection between score a box and the anchors.
"""
int_ymin = tf.maximum(ymin, bbox[0]) #计算bbox和锚点框重叠区域的坐标和长宽
int_xmin = tf.maximum(xmin, bbox[1])
int_ymax = tf.minimum(ymax, bbox[2])
int_xmax = tf.minimum(xmax, bbox[3])
h = tf.maximum(int_ymax - int_ymin, 0.)
w = tf.maximum(int_xmax - int_xmin, 0.)
inter_vol = h * w #重叠区域面积
scores = tf.div(inter_vol, vol_anchors) #将重叠区域面积除以参考框面积作为该参考框得分值；
return scores
def condition(i, feat_labels, feat_scores,
feat_ymin, feat_xmin, feat_ymax, feat_xmax):
"""Condition: check label index.
"""
r = tf.less(i, tf.shape(labels)) # 逐元素比较大小,遍历labels，因为i在body返回的时候加1了
return r[0]
def body(i, feat_labels, feat_scores, #该函数大致意思是选择与gt box IOU最大的锚点框负责回归任务，并预测对应的边界框，如此循环
feat_ymin, feat_xmin, feat_ymax, feat_xmax):
"""Body: update feature labels, scores and bboxes.
Follow the original SSD paper for that purpose:
- assign values when jaccard > 0.5;
- only update if beat the score of other bboxes.
"""
# Jaccard score. #计算bbox与参考框的IOU值
label = labels[i]
bbox = bboxes[i]
jaccard = jaccard_with_anchors(bbox)
# Mask: check threshold + scores + no annotations + num_classes.
mask = tf.greater(jaccard, feat_scores) #当IOU大于feat_scores时，对应的mask至1，做筛选
# mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
mask = tf.logical_and(mask, feat_scores > -0.5)
mask = tf.logical_and(mask, label < num_classes) #label满足<21
imask = tf.cast(mask, tf.int64) #将mask转换数据类型int型
fmask = tf.cast(mask, dtype) #将mask转换数据类型float型
# Update values using mask.
feat_labels = imask * label + (1 - imask) * feat_labels #当mask=1，则feat_labels=1；否则为0，即背景
feat_scores = tf.where(mask, jaccard, feat_scores) #tf.where表示如果mask为真则jaccard，否则为feat_scores
feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin #选择与GT bbox IOU最大的框作为GT bbox，然后循环
feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
# Check no annotation label: ignore these anchors... #对没有标注标签的锚点框做忽视，应该是背景
# interscts = intersection_with_anchors(bbox)
# mask = tf.logical_and(interscts > ignore_threshold,
# label == no_annotation_label)
# # Replace scores by -1.
# feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)
return [i+1, feat_labels, feat_scores,
feat_ymin, feat_xmin, feat_ymax, feat_xmax]
# Main loop definition.
i = 0
[i, feat_labels, feat_scores,
feat_ymin, feat_xmin,
feat_ymax, feat_xmax] = tf.while_loop(condition, body,
[i, feat_labels, feat_scores,
feat_ymin, feat_xmin,
feat_ymax, feat_xmax])
# Transform to center / size. #转换为中心及长宽形式（计算补偿后的中心）
feat_cy = (feat_ymax + feat_ymin) / 2. #真实预测值其实是边界框相对于先验框的转换值，encode就是为了求这个转换值
feat_cx = (feat_xmax + feat_xmin) / 2.
feat_h = feat_ymax - feat_ymin
feat_w = feat_xmax - feat_xmin
# Encode features.
feat_cy = (feat_cy - yref) / href / prior_scaling[0] #(预测真实边界框中心y-参考框中心y)/参考框高/缩放尺度
feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
feat_h = tf.log(feat_h / href) / prior_scaling[2] #log(预测真实边界框高h/参考框高h)/缩放尺度
feat_w = tf.log(feat_w / wref) / prior_scaling[3]
# Use SSD ordering: x / y / w / h instead of ours.
feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1) #返回（cx转换值,cy转换值,w转换值,h转换值）形式的边界框的预测值（其实是预测框相对于参考框的转换）
return feat_labels, feat_localizations, feat_scores #返回目标标签，目标预测值（位置转换值），目标置信度
#经过我们回归得到的变换，经过变换得到真实框，所以这个地方损失函数其实是我们预测的是变换，我们实际的框和anchor之间的变换和我们预测的变换之间的loss。我们回归的是一种变换。并不是直接预测框，这个和YOLO是不一样的。和Faster RCNN是一样的
def tf_ssd_bboxes_encode(labels, #1D的tensor 包含gt标签
bboxes, #Nx4的tensor包含真实框的相对坐标
anchors, #参考锚点框信息（y,x,h,w) 其中y,x是中心坐标
num_classes,
no_annotation_label,
ignore_threshold=0.5,
prior_scaling=[0.1, 0.1, 0.2, 0.2],
dtype=tf.float32,
scope='ssd_bboxes_encode'):
"""Encode groundtruth labels and bounding boxes using SSD net anchors.
Encoding boxes for all feature layers.
Arguments:
labels: 1D Tensor(int64) containing groundtruth labels;
bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
anchors: List of Numpy array with layer anchors;
matching_threshold: Threshold for positive match with groundtruth bboxes;
prior_scaling: Scaling of encoded coordinates.
Return:
(target_labels, target_localizations, target_scores): #返回：目标标签，目标位置，目标得分值（都是list形式）
Each element is a list of target Tensors.
"""
with tf.name_scope(scope):
target_labels = [] #目标标签
target_localizations = [] #目标位置
target_scores = [] #目标得分
for i, anchors_layer in enumerate(anchors): #对所有特征图中的参考框做遍历
with tf.name_scope('bboxes_encode_block_%i' % i):
t_labels, t_loc, t_scores = \
tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer, #输入真实标签，gt位置大小，参考框位置大小……得到预测真实标签，参考框到真实框的转换以及得分
num_classes, no_annotation_label,
ignore_threshold,
prior_scaling, dtype)
target_labels.append(t_labels)
target_localizations.append(t_loc)
target_scores.append(t_scores)
return target_labels, target_localizations, target_scores
def tf_ssd_bboxes_decode_layer(feat_localizations, #解码，在预测时用到，根据之前得到的预测值相对于参考框的转换值后，反推出真实位置（该位置包括真实的x,y,w,h）
anchors_layer, #需要输入：预测框和参考框的转换feat_localizations，参考框位置尺度信息anchors_layer，以及转换时用到的缩放
prior_scaling=[0.1, 0.1, 0.2, 0.2]): #输出真实预测框的ymin,xmin，ymax,xmax
"""Compute the relative bounding boxes from the layer features and
reference anchor bounding boxes.
Arguments:
feat_localizations: Tensor containing localization features.
anchors: List of numpy array containing anchor boxes.
Return:
Tensor Nx4: ymin, xmin, ymax, xmax
"""
yref, xref, href, wref = anchors_layer #锚点框的参考中心点以及长宽
# Compute center, height and width
cx = feat_localizations[:, :, :, :, 0] * wref * prior_scaling[0] + xref
cy = feat_localizations[:, :, :, :, 1] * href * prior_scaling[1] + yref
w = wref * tf.exp(feat_localizations[:, :, :, :, 2] * prior_scaling[2])
h = href * tf.exp(feat_localizations[:, :, :, :, 3] * prior_scaling[3])
# Boxes coordinates.
ymin = cy - h / 2.
xmin = cx - w / 2.
ymax = cy + h / 2.
xmax = cx + w / 2.
bboxes = tf.stack([ymin, xmin, ymax, xmax], axis=-1)
return bboxes #预测真实框的坐标信息（两点式的框）
def tf_ssd_bboxes_decode(feat_localizations,
anchors,
prior_scaling=[0.1, 0.1, 0.2, 0.2],
scope='ssd_bboxes_decode'):
"""Compute the relative bounding boxes from the SSD net features and
reference anchors bounding boxes.
Arguments:
feat_localizations: List of Tensors containing localization features.
anchors: List of numpy array containing anchor boxes.
Return:
List of Tensors Nx4: ymin, xmin, ymax, xmax
"""
with tf.name_scope(scope):
bboxes = []
for i, anchors_layer in enumerate(anchors):
bboxes.append(
tf_ssd_bboxes_decode_layer(feat_localizations[i],
anchors_layer,
prior_scaling))
return bboxes
# =========================================================================== #
# SSD boxes selection.
# =========================================================================== #
def tf_ssd_bboxes_select_layer(predictions_layer, localizations_layer, #输入预测得到的类别和位置做筛选
select_threshold=None,
num_classes=21,
ignore_class=0,
scope=None):
"""Extract classes, scores and bounding boxes from features in one layer.
Batch-compatible: inputs are supposed to have batch-type shapes.
Args:
predictions_layer: A SSD prediction layer;
localizations_layer: A SSD localization layer;
select_threshold: Classification threshold for selecting a box. All boxes
under the threshold are set to 'zero'. If None, no threshold applied.
Return:
d_scores, d_bboxes: Dictionary of scores and bboxes Tensors of
size Batches X N x 1 | 4. Each key corresponding to a class.
"""
select_threshold = 0.0 if select_threshold is None else select_threshold
with tf.name_scope(scope, 'ssd_bboxes_select_layer',
[predictions_layer, localizations_layer]):
# Reshape features: Batches x N x N_labels | 4
p_shape = tfe.get_shape(predictions_layer)
predictions_layer = tf.reshape(predictions_layer,
tf.stack([p_shape[0], -1, p_shape[-1]]))
l_shape = tfe.get_shape(localizations_layer)
localizations_layer = tf.reshape(localizations_layer,
tf.stack([l_shape[0], -1, l_shape[-1]]))
d_scores = {}
d_bboxes = {}
for c in range(0, num_classes):
if c != ignore_class: #如果不是背景类别
# Remove boxes under the threshold. #去掉低于阈值的box
scores = predictions_layer[:, :, c] #预测为第c类别的得分值
fmask = tf.cast(tf.greater_equal(scores, select_threshold), scores.dtype)
scores = scores * fmask #保留得分值大于阈值的得分
bboxes = localizations_layer * tf.expand_dims(fmask, axis=-1)
# Append to dictionary.
d_scores[c] = scores
d_bboxes[c] = bboxes
return d_scores, d_bboxes #返回字典，每个字典里是对应某类的预测权重和框位置信息；
def tf_ssd_bboxes_select(predictions_net, localizations_net, #输入：SSD网络输出的预测层list；定位层list；类别选择框阈值（None表示都选）
select_threshold=None, #返回一个字典，key为类别，值为得分和bbox坐标
num_classes=21, #包含了背景类别
ignore_class=0, #第0类是背景
scope=None):
"""Extract classes, scores and bounding boxes from network output layers.
Batch-compatible: inputs are supposed to have batch-type shapes.
Args:
predictions_net: List of SSD prediction layers;
localizations_net: List of localization layers;
select_threshold: Classification threshold for selecting a box. All boxes
under the threshold are set to 'zero'. If None, no threshold applied.
Return:
d_scores, d_bboxes: Dictionary of scores and bboxes Tensors of #返回一个字典，其中key是对应类别，值对应得分值和坐标信息
size Batches X N x 1 | 4. Each key corresponding to a class.
"""
with tf.name_scope(scope, 'ssd_bboxes_select',
[predictions_net, localizations_net]):
l_scores = []
l_bboxes = []
for i in range(len(predictions_net)):
scores, bboxes = tf_ssd_bboxes_select_layer(predictions_net[i],
localizations_net[i],
select_threshold,
num_classes,
ignore_class)
l_scores.append(scores) #对应某个类别的得分
l_bboxes.append(bboxes) #对应某个类别的box坐标信息
# Concat results.
d_scores = {}
d_bboxes = {}
for c in l_scores[0].keys():
ls = [s[c] for s in l_scores]
lb = [b[c] for b in l_bboxes]
d_scores[c] = tf.concat(ls, axis=1)
d_bboxes[c] = tf.concat(lb, axis=1)
return d_scores, d_bboxes
def tf_ssd_bboxes_select_layer_all_classes(predictions_layer, localizations_layer,
select_threshold=None):
"""Extract classes, scores and bounding boxes from features in one layer.
Batch-compatible: inputs are supposed to have batch-type shapes.
Args:
predictions_layer: A SSD prediction layer;
localizations_layer: A SSD localization layer;
select_threshold: Classification threshold for selecting a box. If None,
select boxes whose classification score is higher than 'no class'.
Return:
classes, scores, bboxes: Input Tensors. #输出：类别，得分，框
"""
# Reshape features: Batches x N x N_labels | 4
p_shape = tfe.get_shape(predictions_layer)
predictions_layer = tf.reshape(predictions_layer,
tf.stack([p_shape[0], -1, p_shape[-1]]))
l_shape = tfe.get_shape(localizations_layer)
localizations_layer = tf.reshape(localizations_layer,
tf.stack([l_shape[0], -1, l_shape[-1]]))
# Boxes selection: use threshold or score > no-label criteria.
if select_threshold is None or select_threshold == 0:
# Class prediction and scores: assign 0. to 0-class
classes = tf.argmax(predictions_layer, axis=2)
scores = tf.reduce_max(predictions_layer, axis=2)
scores = scores * tf.cast(classes > 0, scores.dtype)
else:
sub_predictions = predictions_layer[:, :, 1:]
classes = tf.argmax(sub_predictions, axis=2) + 1
scores = tf.reduce_max(sub_predictions, axis=2)
# Only keep predictions higher than threshold.
mask = tf.greater(scores, select_threshold)
classes = classes * tf.cast(mask, classes.dtype)
scores = scores * tf.cast(mask, scores.dtype)
# Assume localization layer already decoded.
bboxes = localizations_layer
return classes, scores, bboxes #寻找当前特征图中类别，得分，bbox
def tf_ssd_bboxes_select_all_classes(predictions_net, localizations_net,
select_threshold=None,
scope=None):
"""Extract classes, scores and bounding boxes from network output layers.
Batch-compatible: inputs are supposed to have batch-type shapes.
Args:
predictions_net: List of SSD prediction layers;
localizations_net: List of localization layers;
select_threshold: Classification threshold for selecting a box. If None,
select boxes whose classification score is higher than 'no class'.
Return:
classes, scores, bboxes: Tensors.
"""
with tf.name_scope(scope, 'ssd_bboxes_select',
[predictions_net, localizations_net]):
l_classes = []
l_scores = []
l_bboxes = []
for i in range(len(predictions_net)):
classes, scores, bboxes = \
tf_ssd_bboxes_select_layer_all_classes(predictions_net[i],
localizations_net[i],
select_threshold)
l_classes.append(classes)
l_scores.append(scores)
l_bboxes.append(bboxes)
classes = tf.concat(l_classes, axis=1)
scores = tf.concat(l_scores, axis=1)
bboxes = tf.concat(l_bboxes, axis=1)
return classes, scores, bboxes #返回所有特征图综合得出的类别，得分，bbox

参考文章：

https://zhuanlan.zhihu.com/p/33544892

https://zhuanlan.zhihu.com/p/25100992

https://blog.csdn.net/qq1483661204/article/details/79776065

你可能感兴趣的:(神经网络,TensorFlow学习)

动手学深度学习-卷积神经网络-3填充和步幅像污秽一样动手学深度学习深度学习 cnn 人工智能神经网络
目录填充步幅小结在上一节的例子（下图）中，输入的高度和宽度都为3，卷积核的高度和宽度都为2，生成的输出表征的维数为2×2。正如我们在上一节中所概括的那样，假设输入形状为nh×nw，卷积核形状为kh×kw，那么输出形状将是(nh−kh+1)×(nw−kw+1)。因此，卷积的输出形状取决于输入形状和卷积核的形状。还有什么因素会影响输出的大小呢？本节我们将介绍填充（padding）和步幅（stride）
神经网络及其架构和模型的关系爱吃瓜的猹z 大模型神经网络架构人工智能
模型、架构、神经网络之间的关系可以理解为不同层次上的概念，它们分别涵盖了机器学习系统的不同方面。具体来说：1.神经网络神经网络是一种模型类型，基于生物神经系统的启发，用于模拟人脑的学习过程。它由**多个神经元（节点）**和连接权重组成，这些神经元组织成不同的层，通过输入数据进行学习和预测。神经网络的特点：基本组成单位：神经网络的基本单位是“神经元”（或节点），每个神经元接收输入，进行加权和激活，然
chatGPT底层原理是什么，为什么chatGPT效果这么好？三万字长文深度剖析-下会写代码的孙悟空大模型从入门到放弃 chatgpt 算法人工智能深度学习机器学习
导航chatGPT底层原理是什么，为什么chatGPT效果这么好？三万字长文深度剖析-上chatGPT底层原理是什么，为什么chatGPT效果这么好？三万字长文深度剖析-中chatGPT底层原理是什么，为什么chatGPT效果这么好？三万字长文深度剖析-下到chatGPT内部一探究竟好的，现在我们终于可以讨论ChatGPT的内部结构了。最终它是一个巨大的神经网络——目前是一个所谓的GPT-3网络版
AI人工智能深度学习算法：在生物信息学中的应用 AI大模型应用之禅 AI大模型与大数据计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
AI人工智能深度学习算法：在生物信息学中的应用关键词：人工智能、深度学习、生物信息学、基因组学、蛋白质结构预测、药物发现、个性化医疗文章目录AI人工智能深度学习算法：在生物信息学中的应用1.背景介绍2.核心概念与联系2.1人工智能（AI）2.2机器学习（ML）2.3深度学习（DL）2.4生物信息学2.5应用领域3.核心算法原理&具体操作步骤3.1算法原理概述3.1.1卷积神经网络（CNN）3.1.
Python从0到100（四十）：Web开发简介-从前端到后端（文末免费送书）是Dream呀 python 前端开发语言
前言：零基础学Python：Python从0到100最新最全教程。想做这件事情很久了，这次我更新了自己所写过的所有博客，汇集成了Python从0到100，共一百节课，帮助大家一个月时间里从零基础到学习Python基础语法、Python爬虫、Web开发、计算机视觉、机器学习、神经网络以及人工智能相关知识，成为学习学习和学业的先行者！欢迎大家订阅专栏：零基础学Python：Python从0到100最新
【TVM 教程】线性和递归核
ApacheTVM是一个端到端的深度学习编译框架，适用于CPU、GPU和各种机器学习加速芯片。更多TVM中文文档可访问→https://tvm.hyper.ai/作者：TianqiChen下面介绍如何在TVM中进行递归计算（神经网络中的典型模式）。from__future__importabsolute_import,print_functionimporttvmimporttvm.testing
iPaaS丨企业应用及数据集成的重要性和挑战谷云科技RestCloud iPaaS 混合集成平台数字化转型应用集成数据集成
在激烈的市场竞争中，企业服务总线和数据总线扮演着企业神经网络的角色，它们将不同的业务部门、系统以及数据紧密相连，保障信息流通无阻，实现资源的高效分配。这样的集成不仅提高了企业的运营效率，还增强了企业的适应性和创新力，使企业能够在竞争中保持领先。然而，企业在集成过程中面临着不少挑战，集成工具的选择便是其中之一。开发人员在整合不同系统时，需要面对数据格式、数据量、通信协议和架构差异等问题，选择合适的集
2-17-神经网络架构搜索 NANCYGOODENOUGH ~~~杂~~~
http://www.cjig.cn/jig/ch/reader/view_abstract.aspx?file_no=20210202&flag=1神经网络结构搜索(neuralarchitecturesearch)主要由搜索空间，搜索策略与性能评估3部分组成。在搜索空间设计上，出于计算量的考虑，通常不会搜索整个网络结构，而是先将网络分成几块，然后搜索块中的结构。根据实际情况的不同，可以共享不同
神经架构搜索：自动化设计神经网络的方法君君学姐架构自动化神经网络
神经架构搜索：自动化设计神经网络的方法一、引言在深度学习领域，神经网络架构的设计对模型的性能具有至关重要的影响。传统的神经网络设计依赖于专家经验和大量实验，这一过程繁琐且耗时。为了解决这一问题，神经架构搜索（NeuralArchitectureSearch,NAS）应运而生。NAS是一种自动化设计神经网络架构的方法，旨在通过搜索最优的神经网络结构来提高模型性能。本文将详细介绍神经架构搜索的定义、产
计算机视觉目标检测-DETR网络 next_travel 计算机视觉目标检测人工智能
目录摘要abstractDETR目标检测网络详解二分图匹配和损失函数DETR总结总结摘要DETR（DEtectionTRansformer）是由FacebookAI提出的一种基于Transformer架构的端到端目标检测方法。它通过将目标检测建模为集合预测问题，摒弃了锚框设计和非极大值抑制（NMS）等复杂后处理步骤。DETR使用卷积神经网络提取图像特征，并将其通过位置编码转换为输入序列，送入Tra
使用 Pyro 和 PyTorch 的贝叶斯神经网络无水先生人工智能综合 Pytorch和项目实践 pytorch 人工智能 python
一、说明构建图像分类器已成为新的“helloworld”。还记得当你第一次接触Python时，你的打印“helloworld”感觉很神奇吗？几个月前，当我按照PyTorch官方教程并为自己构建了一个运行良好的简单分类器时，我也有同样的感觉。
第八章：AI大模型的未来发展趋势8.3 新兴应用领域8.3.2 生成对抗网络的应用 AI天才研究院 AI大模型企业级应用开发实战大数据人工智能语言模型 AI LLM Java Python 架构设计 Agent RPA
1.背景介绍1.背景介绍生成对抗网络（GenerativeAdversarialNetworks，GANs）是一种深度学习技术，由伊玛·古德姆（IanGoodfellow）于2014年提出。GANs由两个相互对抗的神经网络组成：生成器（Generator）和判别器（Discriminator）。生成器生成假数据，判别器试图区分假数据和真实数据。这种对抗训练方法使得GANs能够学习数据分布并生成高质
Python从0到100（六十一）：机器学习实战-实现客户细分是Dream呀 python 机器学习开发语言
前言：零基础学Python：Python从0到100最新最全教程。想做这件事情很久了，这次我更新了自己所写过的所有博客，汇集成了Python从0到100，共一百节课，帮助大家一个月时间里从零基础到学习Python基础语法、Python爬虫、Web开发、计算机视觉、机器学习、神经网络以及人工智能相关知识，成为学习学习和学业的先行者！欢迎大家订阅专栏：零基础学Python：Python从0到100最新
简要说一下关于实现整个深度学习项目的流程懒大王12138 机器学习深度学习神经网络人工智能算法
我们以识别生物信号为例子，其他类似与图像、文本和目标/故障检测的同样适用1.信号预处理；首先要将得到的生物信号进去噪音去除，另外所有的生物信号由于采样时间不同可能长度并不一样，这时候你需要统一长度。2.特征工程；你需要对所有的经过预处理并且将要输入神经网络的信号提取特征，比如信号的频谱图、时间-频率图或者是一些非线性的动力学特征，比如相空间这些。最重要的是提取的特征数据形状必须一致。3.搭建深度学
【TCN回归预测】蜣螂算法优化时间卷积神经网络DBO-TCN负荷数据回归预测【含Matlab源码 6222期】 Matlab领域 matlab
欢迎来到海神之光博客之家✅博主简介：热爱科研的Matlab仿真开发者，修心和技术同步精进，完整代码论文复现程序定制期刊写作科研合作扫描文章底部QQ二维码。个人主页：海神之光代码获取方式：海神之光Matlab王者学习之路—代码获取方式⛳️座右铭：行百里者，半于九十。更多Matlab智能算法神经网络预测与分类仿真内容点击①Matlab神经网络预测与分类（进阶版）②付费专栏Matlab智能算法神经网络预
python中!ls -r_光学现象的Python实现 weixin_39838798 python中!ls -r
“Youwillseelightinthedarkness。Youwillmakesomesenseofthis.”“你终将于黑暗中触摸白昼，它将如影般随行。”如果说20世纪是电子的世界，那么21世纪就是光学的舞台。光学和光子学无处不在：智能手机和计算设备上的显示方式，互联网中承载信息的光纤，先进的精密制造，大量的生物医学应用终端，全光衍射神经网络等。对光学的深入理解为每一个学习物理和工程的同学带
【安装cudnn】 Eternal-Student linux linux
官网下载并安装如果打算使用深度学习框架，如TensorFlow或PyTorch，并且需要GPU加速，可能还需要安装NVIDIA的cuDNN库，它是一个GPU加速的深度神经网络库。officialweb:https://developer.nvidia.com/cudnn下载具体：cuDNN9.5.0Downloads历史版本下载：https://developer.nvidia.com/rdp/c
【ELM回归预测】蜣螂算法优化极限学习机DBO-ELM数据回归预测【含Matlab源码 3566期】 Matlab仿真科研站 matlab
欢迎来到Matlab仿真科研站博客之家✅博主简介：热爱科研的Matlab仿真开发者，修心和技术同步精进，Matlab项目合作可私信。个人主页：Matlab仿真科研站博客之家代码获取方式：扫描文章底部QQ二维码⛳️座右铭：行百里者，半于九十；路漫漫其修远兮，吾将上下而求索。⛄更多Matlab神经网络预测与分类（仿真科研站版）仿真内容点击Matlab神经网络预测与分类（仿真科研站版）⛄一、蜣螂算法优化
【电力负荷预测】蜣螂算法优化回声神经网络DBO-ESN电力负荷预测（多输入单输出）【含Matlab源码 5346期】 Matlab武动乾坤 matlab
Matlab武动乾坤博客之家
【BP回归预测】蜣螂算法优化BP神经网络DBO-BP光伏数据预测（多输入单输出）【含Matlab源码 5175期】 Matlab武动乾坤 matlab
Matlab武动乾坤博客之家
【CNN回归预测】蜣螂算法优化卷积神经网络DBO-CNN风电数据预测（多输入单输出）【含Matlab源码 5289期】 Matlab武动乾坤 matlab
Matlab武动乾坤博客之家
QRCNN-BiLSTM卷积神经网络-双向长短期记忆神经网络分位数回归区间预测附Matlab完整源码天天酷科研分位数回归区间预测（QR）QRCNN-BiLSTM 卷积双向长短期记忆神经网络分位数回归区间预测
效果模型描述QRCNN-BiLSTM卷积神经网络-双向长短期记忆神经网络分位数回归区间预测附Matlab完整源码QRCNN-BiLSTM（QuantileRegressionConvolutionalNeuralNetwork-BidirectionalLongShort-TermMemory）是一种结合了卷积神经网络（CNN）和双向长短期记忆神经网络（BiLSTM）的分位数回归模型，用于区间预测
神经网络入门推荐知识,神经网络入门书籍推荐快乐的小肥熊 ai智能写作神经网络 matlab 人工智能 python
适合初学者的神经网络和遗传算法资料遗传算法（GeneticAlgorithm）是模拟达尔文生物进化论的自然选择和遗传学机理的生物进化过程的计算模型，是一种通过模拟自然进化过程搜索最优解的方法。遗传算法是从代表问题可能潜在的解集的一个种群（population）开始的，而一个种群则由经过基因（gene）编码的一定数目的个体(individual)组成。每个个体实际上是染色体(chromosome)带
基于CNN+Transformer混合模型实现交通流量时序预测(PyTorch版) 矩阵猫咪 cnn transformer pytorch 卷积神经网络深度学习
前言系列专栏:【深度学习：算法项目实战】✨︎涉及医疗健康、财经金融、商业零售、食品饮料、运动健身、交通运输、环境科学、社交媒体以及文本和图像处理等诸多领域，讨论了各种复杂的深度神经网络思想，如卷积神经网络、循环神经网络、生成对抗网络、门控循环单元、长短期记忆、自然语言处理、深度强化学习、大型语言模型和迁移学习。随着城市化进程的加速，交通流量预测成为城市交通管理与规划中的关键任务。准确的交通流量预测
一、深度学习的基本介绍关关钧深度学习深度学习人工智能神经网络
机器学习的基本步骤：前馈运算、反向传播计算梯度、根据梯度更新参数值。一、定义及基本概念深度学习，就是一种利用深度人工神经网络来进行自动分类、预测和学习的技术。它可以从海量的数据中自动学习，找寻数据中的特征。所以说，它的本质就是自动提取特征的能力。可以说，深度学习就等于深度人工神经网络。一般认为超过三层的神经网络就可以叫做深度神经网络。深度学习属于一种特殊的人工智能技术。反向传播算法：此算法是人工神
神经网络的通俗介绍 courniche 神经网络人工智能算法
人工神经网络，是一种模仿人类大脑工作原理的数学模型。人类的大脑是由无数的小“工作站”组成的，每个工作站叫做“神经元”。这些神经元通过“电线”互相连接，负责接收、处理和传递信息。一、人类大脑神经网络人类大脑的神经网络大概长这个样子：人类大脑的神经网络包括神经元和连接神经元的突触组成，大脑神经电信号在网络中传递实现信息的处理和分析。二、人工神经网络人工神经网络（简称：神经网络），是一种模仿人类大脑工作
《剖析Transformer架构：自然语言处理飞跃的幕后英雄》人工智能深度学习
在人工智能的迅猛发展进程中，自然语言处理（NLP）领域取得了令人瞩目的突破，而Transformer架构无疑是这场变革的核心驱动力。自从2017年在论文《AttentionIsAllYouNeed》中被提出，Transformer便在NLP领域引发了一场革命，彻底改变了模型处理和理解人类语言的方式。打破传统枷锁，开创并行计算新时代在Transformer出现之前，循环神经网络（RNN）及其变体，如
点云从入门到精通技术详解100篇-基于卷积和注意力机制的3D点云特征提取格图素书 3d
目录知识储备点云获取技术分类一、图像衍生点云二、LiDAR三、RGB-D深度图像传感器基于3D激光slam的点云特征提取为什么要进行点云特征提取特征提取理论与代码编写点云特征提取主体类sample_and_groupfarthest_point_samplequery_ball_pointindex_points前言国内外研究现状卷积神经网络三维卷积神经网络稀疏卷积[21]基于3D点云数据的目标分
深度学习｜表示学习｜卷积神经网络｜由参数共享引出的特征图｜08 漂亮_大男孩表示学习深度学习学习 cnn
如是我闻：FeatureMap（特征图）的概念与ParameterSharing（参数共享）密切相关。换句话说，参数共享是生成FeatureMap的基础。FeatureMap是卷积操作的核心产物，而卷积操作的高效性正是由参数共享带来的。下面我们详细看一下FeatureMap和ParameterSharing之间的关系：1.什么是FeatureMap？定义：FeatureMap是卷积操作生成的输出结
AlphaFold2的思路总结（十五） xiaofengzihhh 蛋白质结构预测深度学习人工智能神经网络
2021SC@SDUSC这学期的代码分析工作接近尾声了，我想简单总结一下AlphaFold2的总体思路具体来看，AlphaFold2主要利用多序列比对（MSA），把蛋白质的结构和生物信息整合到了深度学习算法中。它主要包括两个部分：神经网络EvoFormer和结构模块（Structuremodule）。一、EvoFormer 在EvoFormer中，主要是将图网络（Graphnetworks）
Hadoop(一) 朱辉辉33 hadoop linux
今天在诺基亚第一天开始培训大数据，因为之前没接触过Linux，所以这次一起学了，任务量还是蛮大的。首先下载安装了Xshell软件，然后公司给了账号密码连接上了河南郑州那边的服务器，接下来开始按照给的资料学习，全英文的，头也不讲解，说锻炼我们的学习能力，然后就开始跌跌撞撞的自学。这里写部分已经运行成功的代码吧. 在hdfs下，运行hadoop fs -mkdir /u
maven An error occurred while filtering resources blackproof maven 报错
转：http://stackoverflow.com/questions/18145774/eclipse-an-error-occurred-while-filtering-resources maven报错： maven An error occurred while filtering resources Maven -> Update Proje
jdk常用故障排查命令 daysinsun jvm
linux下常见定位命令： 1、jps 输出Java进程 -q 只输出进程ID的名称，省略主类的名称； -m 输出进程启动时传递给main函数的参数； &nb
java 位移运算与乘法运算周凡杨 java 位移运算乘法
对于 JAVA 编程中，适当的采用位移运算，会减少代码的运行时间，提高项目的运行效率。这个可以从一道面试题说起：问题：用最有效率的方法算出2 乘以8 等於几?” 答案：2 << 3 由此就引发了我的思考，为什么位移运算会比乘法运算更快呢？其实简单的想想，计算机的内存是用由 0 和 1 组成的二
java中的枚举(enmu) g21121 java
从jdk1.5开始，java增加了enum(枚举)这个类型，但是大家在平时运用中还是比较少用到枚举的，而且很多人和我一样对枚举一知半解，下面就跟大家一起学习下enmu枚举。先看一个最简单的枚举类型，一个返回类型的枚举： public enum ResultType { /** * 成功 */ SUCCESS, /** * 失败 */ FAIL,
MQ初级学习 510888780 activemq
1.下载ActiveMQ 去官方网站下载：http://activemq.apache.org/ 2.运行ActiveMQ 解压缩apache-activemq-5.9.0-bin.zip到C盘，然后双击apache-activemq-5.9.0-\bin\activemq-admin.bat运行ActiveMQ程序。启动ActiveMQ以后，登陆：http://localhos
Spring_Transactional_Propagation 布衣凌宇 spring transactional
//事务传播属性 @Transactional(propagation=Propagation.REQUIRED)//如果有事务，那么加入事务，没有的话新创建一个 @Transactional(propagation=Propagation.NOT_SUPPORTED)//这个方法不开启事务 @Transactional(propagation=Propagation.REQUIREDS_N
我的spring学习笔记12-idref与ref的区别 aijuans spring
idref用来将容器内其他bean的id传给<constructor-arg>/<property>元素，同时提供错误验证功能。例如： <bean id ="theTargetBean" class="..." /> <bean id ="theClientBean" class=&quo
Jqplot之折线图 antlove js jquery Web timeseries jqplot
timeseriesChart.html <script type="text/javascript" src="jslib/jquery.min.js"></script> <script type="text/javascript" src="jslib/excanvas.min.js&
JDBC中事务处理应用百合不是茶 java JDBC编程事务控制语句
解释事务的概念; 事务控制是sql语句中的核心之一;事务控制的作用就是保证数据的正常执行与异常之后可以恢复事务常用命令: Commit提交
[转]ConcurrentHashMap Collections.synchronizedMap和Hashtable讨论 bijian1013 java 多线程线程安全 HashMap
在Java类库中出现的第一个关联的集合类是Hashtable，它是JDK1.0的一部分。 Hashtable提供了一种易于使用的、线程安全的、关联的map功能，这当然也是方便的。然而，线程安全性是凭代价换来的――Hashtable的所有方法都是同步的。此时，无竞争的同步会导致可观的性能代价。Hashtable的后继者HashMap是作为JDK1.2中的集合框架的一部分出现的，它通过提供一个不同步的
ng-if与ng-show、ng-hide指令的区别和注意事项 bijian1013 JavaScript AngularJS
angularJS中的ng-show、ng-hide、ng-if指令都可以用来控制dom元素的显示或隐藏。ng-show和ng-hide根据所给表达式的值来显示或隐藏HTML元素。当赋值给ng-show指令的值为false时元素会被隐藏，值为true时元素会显示。ng-hide功能类似，使用方式相反。元素的显示或
【持久化框架MyBatis3七】MyBatis3定义typeHandler bit1129 TypeHandler
什么是typeHandler? typeHandler用于将某个类型的数据映射到表的某一列上，以完成MyBatis列跟某个属性的映射内置typeHandler MyBatis内置了很多typeHandler，这写typeHandler通过org.apache.ibatis.type.TypeHandlerRegistry进行注册，比如对于日期型数据的typeHandler，
上传下载文件rz,sz命令 bitcarter linux命令rz
刚开始使用rz上传和sz下载命令：因为我们是通过secureCRT终端工具进行使用的所以会有上传下载这样的需求：我遇到的问题： sz下载A文件10M左右，没有问题但是将这个文件A再传到另一天服务器上时就出现传不上去，甚至出现乱码，死掉现象，具体问题解决方法：上传命令改为;rz -ybe 下载命令改为：sz -be filename 如果还是有问题：那就是文
通过ngx-lua来统计nginx上的虚拟主机性能数据 ronin47 ngx-lua　统计解禁ip
介绍以前我们为nginx做统计,都是通过对日志的分析来完成.比较麻烦,现在基于ngx_lua插件,开发了实时统计站点状态的脚本,解放生产力.项目主页: https://github.com/skyeydemon/ngx-lua-stats 功能支持分不同虚拟主机统计, 同一个虚拟主机下可以分不同的location统计. 可以统计与query-times request-time
java-68-把数组排成最小的数。一个正整数数组，将它们连接起来排成一个数，输出能排出的所有数字中最小的。例如输入数组{32, 321}，则输出32132 bylijinnan java
import java.util.Arrays; import java.util.Comparator; public class MinNumFromIntArray { /** * Q68输入一个正整数数组，将它们连接起来排成一个数，输出能排出的所有数字中最小的一个。 * 例如输入数组{32, 321}，则输出这两个能排成的最小数字32132。请给出解决问题
Oracle基本操作 ccii Oracle SQL总结 Oracle SQL语法 Oracle基本操作 Oracle SQL
一、表操作 1. 常用数据类型 NUMBER(p,s)：可变长度的数字。p表示整数加小数的最大位数，s为最大小数位数。支持最大精度为38位 NVARCHAR2(size)：变长字符串，最大长度为4000字节（以字符数为单位） VARCHAR2(size)：变长字符串，最大长度为4000字节（以字节数为单位） CHAR(size)：定长字符串，最大长度为2000字节，最小为1字节，默认
[强人工智能]实现强人工智能的路线图 comsci 人工智能
1：创建一个用于记录拓扑网络连接的矩阵数据表 2:自动构造或者人工复制一个包含10万个连接(1000*1000)的流程图 3：将这个流程图导入到矩阵数据表中 4：在矩阵的每个有意义的节点中嵌入一段简单的
给Tomcat，Apache配置gzip压缩(HTTP压缩)功能 cwqcwqmax9 apache
背景： HTTP 压缩可以大大提高浏览网站的速度，它的原理是，在客户端请求网页后，从服务器端将网页文件压缩，再下载到客户端，由客户端的浏览器负责解压缩并浏览。相对于普通的浏览过程HTML ,CSS,Javascript , Text ，它可以节省40%左右的流量。更为重要的是，它可以对动态生成的，包括CGI、PHP , JSP , ASP , Servlet,SHTML等输出的网页也能进行压缩，
SpringMVC and Struts2 dashuaifu struts2 springMVC
SpringMVC VS Struts2 1: spring3开发效率高于struts 2: spring3 mvc可以认为已经100%零配置 3: struts2是类级别的拦截，一个类对应一个request上下文， springmvc是方法级别的拦截，一个方法对应一个request上下文，而方法同时又跟一个url对应所以说从架构本身上 spring3 mvc就容易实现r
windows常用命令行命令 dcj3sjt126com windows cmd command
在windows系统中，点击开始－运行，可以直接输入命令行，快速打开一些原本需要多次点击图标才能打开的界面，如常用的输入cmd打开dos命令行，输入taskmgr打开任务管理器。此处列出了网上搜集到的一些常用命令。winver 检查windows版本 wmimgmt.msc 打开windows管理体系结构(wmi) wupdmgr windows更新程序 wscrip
再看知名应用背后的第三方开源项目 dcj3sjt126com ios
知名应用程序的设计和技术一直都是开发者需要学习的，同样这些应用所使用的开源框架也是不可忽视的一部分。此前《 iOS第三方开源库的吐槽和备忘》中作者ibireme列举了国内多款知名应用所使用的开源框架，并对其中一些框架进行了分析，同样国外开发者 @iOSCowboy也在博客中给我们列出了国外多款知名应用使用的开源框架。另外txx's blog中详细介绍了 Facebook Paper使用的第三
Objective-c单例模式的正确写法 jsntghf 单例 ios iPhone
一般情况下，可能我们写的单例模式是这样的： #import <Foundation/Foundation.h> @interface Downloader : NSObject + (instancetype)sharedDownloader; @end #import "Downloader.h" @implementation
jquery easyui datagrid 加载成功，选中某一行 hae jquery easyui datagrid 数据加载
1.首先你需要设置datagrid的onLoadSuccess $( '#dg' ).datagrid({onLoadSuccess : function (data){ $( '#dg' ).datagrid( 'selectRow' ,3); }}); 2.onL
jQuery用户数字打分评价效果 ini JavaScript html jquery Web css
效果体验：http://hovertree.com/texiao/jquery/5.htmHTML文件代码： <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>jQuery用户数字打分评分代码 - HoverTree</
mybatis的paramType kerryg DAO sql
MyBatis传多个参数： 1、采用#{0},#{1}获得参数： Dao层函数方法： public User selectUser(String name,String area); 对应的Mapper.xml <select id="selectUser" result
centos 7安装mysql5.5 MrLee23 centos
首先centos7 已经不支持mysql，因为收费了你懂得，所以内部集成了mariadb，而安装mysql的话会和mariadb的文件冲突，所以需要先卸载掉mariadb，以下为卸载mariadb，安装mysql的步骤。 #列出所有被安装的rpm package rpm -qa | grep mariadb #卸载 rpm -e mariadb-libs-5.
利用thrift来实现消息群发 qifeifei thrift
Thrift项目一般用来做内部项目接偶用的，还有能跨不同语言的功能，非常方便，一般前端系统和后台server线上都是3个节点，然后前端通过获取client来访问后台server，那么如果是多太server，就是有一个负载均衡的方法，然后最后访问其中一个节点。那么换个思路，能不能发送给所有节点的server呢，如果能就
实现一个sizeof获取Java对象大小 teasp java HotSpot 内存对象大小 sizeof
由于Java的设计者不想让程序员管理和了解内存的使用，我们想要知道一个对象在内存中的大小变得比较困难了。本文提供了可以获取对象的大小的方法，但是由于各个虚拟机在内存使用上可能存在不同，因此该方法不能在各虚拟机上都适用，而是仅在hotspot 32位虚拟机上，或者其它内存管理方式与hotspot 32位虚拟机相同的虚拟机上适用。
SVN错误及处理 xiangqian0505 SVN提交文件时服务器强行关闭
在SVN服务控制台打开资源库“SVN无法读取current” ---摘自网络写道 SVN无法读取current修复方法 Can't read file : End of file found 文件：repository/db/txn_current、repository/db/current 其中current记录当前最新版本号，txn_current记录版本库中版本