DSSD

title	DSSD: Deconvolutional Single Shot Detector
url	https://arxiv.org/pdf/1701.06659.pdf
Motivation	Add context information to improve object detection accuracy; improve on SSD.
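Content	DSSD:
1. Residual-101 + SSD (instead of VGG), for stronger feature extraction.
2. Deconvolution layers add context (hourglass structure) and raise accuracy, especially on small objects.
3. With 513 × 513 input, DSSD achieves 81.5% mAP on VOC2007 test, 80.0% mAP on VOC2012 test, and 33.2% mAP on COCO, outperforming R-FCN.
4. The idea is hard to make work with a naive implementation; the key pieces are the feed-forward connection module in the deconvolution path and the new output (prediction) module.

Ways to improve object detection accuracy:
1. A better feature network.
2. More context.
3. Higher spatial resolution in the bounding box prediction process.

Using Residual-101 in place of VGG:
1. Goal: improve accuracy.
2. Result: accuracy drops on its own. SSD with Residual-101 on 321 × 321 inputs reaches a mAP of 76.4 on PASCAL VOC2007 test, which is lower than the 77.5 for SSD with VGG on 300 × 300 inputs.

Prediction module:
1. SSD: predicts directly from the selected feature maps; an L2 normalization layer is added to conv4_3 because the magnitude of its gradient is large.
2. MS-CNN: improving the sub-network of each prediction branch can increase accuracy. (MS-CNN uses deconvolution to increase the resolution of multiple convolutional layers; small objects are predicted from shallow layers, which carry little semantic information.)
3. DSSD: adds one residual block for each prediction layer, as in Figure 2(c) (borrowing the MS-CNN idea). The other variants, (a) the original SSD approach, (b) the residual block with a skip connection, and (d) two sequential residual blocks, all perform worse than (c).
Result: for high-resolution images, Residual-101 with the prediction module is better than VGG without the prediction module.

Deconvolutional SSD:
1. An asymmetric hourglass network (deconvolution) captures more context.
2. Extra deconvolution layers increase resolution.
3. Hourglass-style "skip connections" strengthen the features.
4. Reasons the hourglass is asymmetric (the decoder has few layers): (1) speed; (2) there is no pre-trained model for the decoder stage, and training it from scratch is computationally expensive. (Adding information from the previous layers and the deconvolutional process is also computationally expensive.)

Deconvolution Module:
1. Purpose: integrate information from the shallow feature maps and the deconvolution layers. (Figure 3 shows the part drawn as solid circles in Figure 1.)
2. Source of the idea: a deconvolution module for a refinement network reaches the same accuracy as a more complicated network and is more efficient.
3. Approach: (1) a batch normalization layer after each convolutional layer; (2) a learned deconvolution layer instead of bilinear upsampling; (3) element-wise product, which works better than element-wise sum.

Training (similar to SSD):
1. Box matching: for each ground-truth box, the default box with the highest overlap and every default box whose overlap is above 0.5 are treated as positives. Negatives are selected so that the ratio of negatives to positives is 3:1.
2. Loss: Smooth L1 + Softmax.
3. Data augmentation: random cropping, photometric distortion, random flipping, and random expansion (which helps small-object detection).
4. K-means clustering is used to choose the default box aspect ratios, with the square root of the box area as the feature. It converges at 7 clusters (starting from 2, and adding a cluster whenever that improves the error by more than 20%). The resulting aspect ratios are (1.6, 2.0, 3.0); 1.6 is the newly added one.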
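The box-matching rule from the training notes can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names (`iou`, `match_default_boxes`, `num_negatives`), the `(xmin, ymin, xmax, ymax)` box layout, and the choice to let the best-overlap match override a threshold match are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(gt_boxes, default_boxes, threshold=0.5):
    """Return {default_box_index: gt_index} for the positive default boxes."""
    matches = {}
    # Rule 2: a default box is positive if its overlap with some
    # ground-truth box is above the threshold (keep the best such box).
    for d, dbox in enumerate(default_boxes):
        best_g, best_overlap = None, threshold
        for g, gbox in enumerate(gt_boxes):
            overlap = iou(dbox, gbox)
            if overlap > best_overlap:
                best_g, best_overlap = g, overlap
        if best_g is not None:
            matches[d] = best_g
    # Rule 1: every ground-truth box also claims its highest-overlap default
    # box, even below the threshold (assumption: this overrides rule 2).
    for g, gbox in enumerate(gt_boxes):
        overlaps = [iou(dbox, gbox) for dbox in default_boxes]
        if overlaps and max(overlaps) > 0.0:
            matches[overlaps.index(max(overlaps))] = g
    return matches


def num_negatives(num_positives, num_candidates, ratio=3):
    """Number of negatives to keep so that negatives:positives is at most ratio:1."""
    return min(ratio * num_positives, num_candidates)
```

For example, with ground-truth boxes `[(0, 0, 10, 10), (20, 20, 24, 24)]` and default boxes `[(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30), (0, 0, 4, 4)]`, the first two default boxes match the first ground-truth box through the 0.5 threshold. The third matches the second ground-truth box only because it is that box's best overlap. The fourth stays a negative candidate.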
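Experiments	Base network:
1. The prediction layers are chosen so that their receptive fields roughly correspond to those of the VGG network.
2. The conv5 stage's effective stride is changed from 32 pixels to 16 pixels to increase resolution: the first convolutional layer with stride 2 in the conv5 stage is changed to stride 1, and every convolutional layer in the conv5 stage with kernel size larger than 1 uses the à trous algorithm, with dilation increased from 1 to 2 to compensate for the reduced stride, so that the pre-trained model can still be used.
3. Residual blocks are added as extra layers to reduce the feature map size.

PASCAL VOC 2007:
1. Original SSD model (used as the pre-trained model for DSSD): batch size 32 (321 × 321) or 20 (513 × 513); learning rate schedule with one rate for the first 40k iterations, lowered at 60k iterations and again at 70k iterations.
2. DSSD is trained in two stages: (1) freeze the SSD part and train only the extra deconvolution side, for 20k iterations and then a further 10k iterations; (2) fine-tune the whole network, for 20k iterations and then a further 20k iterations.

Ablation Study on VOC2007:

PASCAL VOC 2012:

COCO:
1. A batch size smaller than 16 when training on 4 GPUs can cause unstable results in batch normalization and hurt accuracy. (Residual-101 contains batch normalization.)
2. Replacing VGG with Residual-101 improves detection of large objects; the extra layers that DSSD adds improve detection of small objects.
3. Larger input images give higher accuracy but longer training and test times.

Inference Time:
1. To speed up inference, the BN layers are merged into the conv layers at test time. This makes inference 1.2 to 1.5 times faster and reduces memory use by about 3 times. Equation 1 is BN; Equations 2, 3, and 4 are the merged form. (1) Equation 1, BN: subtract the mean from the conv layer output, divide by the square root of the variance (the standard deviation), then apply the learned scaling and shifting parameters. (2) Equation 2 is the merged weight, Equation 3 the merged bias, and Equation 4 the resulting conv.
2. Why DSSD is slower: (1) the Residual-101 network has more layers than VGGNet; (2) some extra layers are added (especially the prediction module and the deconvolutional module), and replacing the deconvolution layer with bilinear up-sampling would speed things up; (3) there are more default boxes (which slows both prediction and non-maximum suppression), 2.6 times as many as in SSD.

Visualization: compared with SSD, (1) small objects and dense objects improve; (2) classes with distinctive context improve, such as baseball bat and baseball player.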
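The BN-merging step in the inference-time notes follows from standard batch normalization algebra, and a small check makes it concrete. The sketch below uses a single output channel of a 1 × 1 convolution (a dot product plus bias) to keep it dependency-free. The function names, the `eps` default, and the mapping of comments to Equations 1 to 4 are assumptions for illustration, not the paper's code.

```python
import math


def linear(x, weights, bias):
    """One output channel of a 1x1 convolution: dot product plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batch_norm(z, mean, var, scale, shift, eps=1e-5):
    """BN at test time (Equation 1): normalize, then scale and shift."""
    return scale * (z - mean) / math.sqrt(var + eps) + shift


def fold_bn(weights, bias, mean, var, scale, shift, eps=1e-5):
    """Fold the BN statistics into the layer's weights and bias."""
    k = scale / math.sqrt(var + eps)
    folded_weights = [w * k for w in weights]   # merged weight (Equation 2)
    folded_bias = (bias - mean) * k + shift     # merged bias (Equation 3)
    return folded_weights, folded_bias


# Hypothetical values, chosen only to exercise the functions.
x = [1.0, -2.0, 0.5]
w, b = [0.3, -0.1, 0.8], 0.2
mean, var, scale, shift = 0.4, 2.0, 1.5, -0.3

reference = batch_norm(linear(x, w, b), mean, var, scale, shift)
folded_w, folded_b = fold_bn(w, b, mean, var, scale, shift)
merged = linear(x, folded_w, folded_b)          # merged conv (Equation 4)
```

Here `reference` and `merged` agree up to floating-point rounding, so the separate BN layer can be dropped at test time without changing the output.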
Thoughts	Detection of small objects and dense objects improves, but inference is slower.
