Neural Network Implementation: SSD

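The most detailed explanation of the SSD object detection algorithm
Object detection in deep learning: understanding SSD and analyzing its details
The SSD algorithm: the model structure in detail and an analysis of the Python source
A detailed reading of the SSD framework (Part 1)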

The SSD implementation here follows https://github.com/balancap/SDC-Vehicle-Detection, which explains the implementation details as follows.
The SSD network uses the concept of anchor boxes for object detection. The image below illustrates the concept: boxes of different sizes and aspect ratios are pre-defined at several scales. For each of these anchor boxes, the goal of the SSD convolutional network is to detect whether an object lies inside the box (or close to it) and to compute the offset between the object's bounding box and the fixed anchor box.
[Figure 1]
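The offset between an object's bounding box and an anchor box can be written down in several ways. The minimal sketch below assumes the center-size parameterization used in the SSD paper: center shifts are normalized by the anchor size, and width and height are expressed as log-ratios. The function names `encode_box` and `decode_box` are hypothetical, and the variance scaling that many implementations apply on top is left out.

```python
import math

def encode_box(box, anchor):
    """Encode a box relative to an anchor. Both are (cx, cy, w, h)."""
    cx, cy, w, h = box
    acx, acy, aw, ah = anchor
    return (
        (cx - acx) / aw,      # horizontal center shift, in anchor widths
        (cy - acy) / ah,      # vertical center shift, in anchor heights
        math.log(w / aw),     # log width ratio
        math.log(h / ah),     # log height ratio
    )

def decode_box(offset, anchor):
    """Invert encode_box: recover (cx, cy, w, h) from an offset."""
    tx, ty, tw, th = offset
    acx, acy, aw, ah = anchor
    return (acx + tx * aw, acy + ty * ah, aw * math.exp(tw), ah * math.exp(th))

# Illustrative values in normalized image coordinates (assumed, not from the source).
anchor = (0.5, 0.5, 0.2, 0.2)
truth = (0.55, 0.48, 0.25, 0.18)

offset = encode_box(truth, anchor)
restored = decode_box(offset, anchor)
```

Decoding the encoded offset returns the original box up to floating-point error, which is what lets the network predict offsets during training and boxes at inference time.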
In the case of the SSD network, we use VGG16 as the base architecture: it provides high-quality features at different scales, which are then used as inputs to the multibox modules in charge of computing the object type and coordinates for each anchor box. The architecture of the network we use is illustrated in the following TensorBoard graph. It follows the original SSD paper:

  • Convolutional blocks 1 to 7 are exactly VGG modules. Hence, their weights can be imported from pretrained VGG weights, which greatly shortens training time;
  • Blocks 8 to 11 are additional feature blocks. They consist of two convolutional layers each: a 1x1 convolution followed by a 3x3 convolution;
  • Yellow blocks are multibox modules: they take VGG-type features as inputs and output two components: a softmax Tensor, which gives for every anchor box the probability of an object being detected, and an offset Tensor, which describes the offset between the object bounding box and the anchor box. These two Tensors are the results of two different 3x3 convolutions of the input Tensor.

For instance, consider the 8x8 feature block described in the image above. At every coordinate in the grid, it defines 4 anchor boxes of different dimensions. The multibox module taking this feature Tensor as input will thus provide two output Tensors: a classification Tensor of shape 8x8x4xNClasses and an offset Tensor of shape 8x8x4x4, where in the latter, the last dimension stands for the 4 coordinates of every bounding box.
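The shape bookkeeping in this example can be sketched as follows. The helper `multibox_output_shapes` is hypothetical, and the choice of 21 classes (20 Pascal VOC classes plus background) is an assumption made only to get concrete numbers. The sketch also assumes that each 3x3 convolution packs its per-anchor values into the channel dimension before a reshape, which is a common way to implement such a module.

```python
def multibox_output_shapes(height, width, num_anchors, num_classes):
    """Return the shapes produced by a multibox module on one feature map."""
    cls_channels = num_anchors * num_classes  # channels of the classification conv
    loc_channels = num_anchors * 4            # channels of the offset conv
    return {
        # Raw convolution outputs: everything per anchor lives in the channels.
        "cls_conv": (height, width, cls_channels),
        "loc_conv": (height, width, loc_channels),
        # After reshaping: one score vector and one 4-value offset per anchor.
        "cls": (height, width, num_anchors, num_classes),
        "loc": (height, width, num_anchors, 4),
    }

shapes = multibox_output_shapes(8, 8, num_anchors=4, num_classes=21)
for name, shape in shapes.items():
    print(name, shape)
```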

As a result, the full SSD network provides a classification score and an offset for a total of 8732 anchor boxes. During training, we therefore try to minimize both errors: the classification error on every anchor box and the localization error whenever there is a positive match with a ground-truth bounding box. We refer to the original SSD paper for the precise equations defining the loss function.
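The figure of 8732 can be checked with a few lines of arithmetic. The sketch below assumes the SSD300 configuration from the SSD paper: six square feature maps of sizes 38, 19, 10, 5, 3, and 1, with 4 or 6 anchors per cell. The names `SSD300_FEATURE_MAPS` and `count_anchors` are hypothetical.

```python
# (feature map size, anchors per cell), assumed from the SSD300 configuration.
SSD300_FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def count_anchors(feature_maps):
    """Sum size * size * anchors_per_cell over all square feature maps."""
    return sum(size * size * per_cell for size, per_cell in feature_maps)

total = count_anchors(SSD300_FEATURE_MAPS)
print(total)  # 8732
```

The 38x38 map alone contributes 5776 boxes, so most anchors come from the highest-resolution feature map, which handles the smallest objects.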
[Figure 2]
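As a rough illustration of the two error terms, the sketch below combines a softmax cross-entropy classification loss with a smooth L1 localization loss applied to positive matches only, normalized by the number of positives. This is a simplification: the function names are hypothetical, the weighting `alpha=1.0` is an assumption, and the hard negative mining described in the SSD paper is omitted, so every negative anchor contributes to the classification term.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def softmax_cross_entropy(logits, label):
    """Numerically stable -log(softmax(logits)[label])."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def ssd_loss(cls_logits, labels, loc_preds, loc_targets, alpha=1.0):
    """labels[i] == 0 means background; labels[i] > 0 is a positive match."""
    num_pos = sum(1 for y in labels if y > 0)
    if num_pos == 0:
        return 0.0
    # Classification error on every anchor box (no hard negative mining here).
    conf = sum(softmax_cross_entropy(l, y) for l, y in zip(cls_logits, labels))
    # Localization error only where an anchor matches a ground-truth box.
    loc = sum(
        smooth_l1(p - t)
        for pred, tgt, y in zip(loc_preds, loc_targets, labels)
        if y > 0
        for p, t in zip(pred, tgt)
    )
    return (conf + alpha * loc) / num_pos

# Two anchors, two classes: the first is a positive match, the second is background.
loss = ssd_loss(
    cls_logits=[[0.0, 0.0], [0.0, 0.0]],
    labels=[1, 0],
    loc_preds=[[0.5, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],
    loc_targets=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
)
```

The large offsets predicted for the background anchor do not affect the result, since the localization term ignores anchors without a ground-truth match.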

SSD model structure compared with YOLO

For the full model diagram, see:
https://github.com/bjzhao143/objectiveDetect/blob/master/models/ssd300_model.png

[Figure 3]
[Figure 4]

Diagram of the anchor computation

[Figure 5]
[Figure 6]
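The anchor computation can be sketched in code as well. The example below assumes the default-box rule from the SSD paper: one box per aspect ratio with width `scale * sqrt(ratio)` and height `scale / sqrt(ratio)`, plus one extra square box of size `sqrt(scale * next_scale)`, all centered on each feature-map cell. The function name `default_boxes` and the scale values 0.2 and 0.37 are hypothetical choices for illustration.

```python
import math

def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) anchors in normalized image coordinates."""
    boxes = []
    for i in range(fmap_size):          # row index
        for j in range(fmap_size):      # column index
            cx = (j + 0.5) / fmap_size  # cell center, x
            cy = (i + 0.5) / fmap_size  # cell center, y
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
            # Extra square box at an intermediate scale.
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes

boxes = default_boxes(8, scale=0.2, next_scale=0.37)
print(len(boxes))  # 256
```

With three aspect ratios and the extra square box, each cell of an 8x8 feature map gets 4 anchors, giving 8 x 8 x 4 = 256 boxes, consistent with the 8x8 example above.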

Pretrained SSD-TensorFlow model

I downloaded the code and the pretrained model from https://github.com/HiKapok/SSD.TensorFlow and used them for fine-grained recognition.

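Appendix: the modules of VGG16 and VGG19

First, a note on the structure shown in the figure: A, B, C, D, and E are the different network configurations that the VGG group tested and compared. They found that LRN (local response normalization) did not appear to help, so it was dropped from the later configurations. D and E in the figure are VGG16 and VGG19, respectively.
[Figure 7]
Where does the "19" in VGG19 come from?
[Figure 8]

In the model file, VGG19 also counts each activation as a layer, so it has 43 layers:

# 43 layers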
VGG19_LAYERS = ('conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5',
        'fc6', 'relu6',
        'fc7', 'relu7',
        'fc8', 'softmax',
        )
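A quick check of both counts: the "19" is conventionally the number of layers with learnable weights, that is, the convolutional and fully connected layers, while 43 is the length of the tuple once activations, pooling, and softmax are included. The sketch below repeats the tuple so that it runs on its own; the variable name `weight_layers` is hypothetical.

```python
VGG19_LAYERS = (
    'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
    'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
    'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3',
    'conv3_4', 'relu3_4', 'pool3',
    'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3',
    'conv4_4', 'relu4_4', 'pool4',
    'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3',
    'conv5_4', 'relu5_4', 'pool5',
    'fc6', 'relu6',
    'fc7', 'relu7',
    'fc8', 'softmax',
)

# Layers with learnable weights: convolutions and fully connected layers.
weight_layers = [name for name in VGG19_LAYERS if name.startswith(('conv', 'fc'))]

print(len(VGG19_LAYERS))   # 43
print(len(weight_layers))  # 19
```

Of the 19 weight layers, 16 are convolutions and 3 are fully connected.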
