Object Detection -- PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

https://www.arxiv.org/abs/1608.08021

Demo code: https://github.com/sanghoon/pva-faster-rcnn

This paper addresses multi-class object detection. It combines a number of recent techniques and achieves strong results.

We obtained solid results on well-known object detection benchmarks: 81.8% mAP (mean average precision) on VOC2007 and 82.5% mAP on VOC2012 (2nd place), while taking only 750ms/image on Intel i7-6700K CPU with a single core and 46ms/image on NVIDIA Titan X GPU. Theoretically, our network requires only 12.3% of the computational cost compared to ResNet-101, the winner on VOC2012.

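The overall detection pipeline is: CNN feature extraction + region proposal + RoI classification.
We mainly optimize feature extraction, because the region proposal part is fast and takes up little of the total time, while the RoI classification part can be compressed effectively with truncated SVD. Our design principle is fewer channels with more layers ("less channels with more layers"). The network is built from concatenated ReLU, Inception, and HyperNet, and training uses batch normalization, residual connections, and learning rate scheduling based on plateau detection.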
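The SVD compression of the classification part works by factoring a fully connected layer's weight matrix W (m x n) as W ≈ U_t Σ_t V_tᵀ, keeping only the top t singular values, and replacing the one layer by two smaller ones. This cuts the parameter count from m·n to t·(m + n). Below is a minimal NumPy sketch of that factorization; the function name `truncated_svd_fc` and the choice of `t` are hypothetical and not taken from the PVANET code.

```python
import numpy as np

def truncated_svd_fc(weight, t):
    """Split an FC weight matrix (m, n) into two layers using its top-t singular values.

    Returns (first, second) so that second @ (first @ x) approximates weight @ x.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    first = s[:t, None] * vt[:t]   # (t, n): Sigma_t V_t^T, the new first layer
    second = u[:, :t]              # (m, t): U_t, the new second layer
    return first, second

# Usage: a weight matrix of exact rank 8, so t = 8 loses nothing.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 100))
first, second = truncated_svd_fc(W, 8)
print(W.size, first.size + second.size)  # 6400 1312
```

For a real layer, W is not low rank, so t trades accuracy for speed. The bias of the original layer would stay on the second layer.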

2 Details on Network Design
2.1 C.ReLU: Earlier building blocks in feature generation
[Figure 1]
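C.ReLU is used mainly in the first few convolutional layers. It halves the number of output channels and then produces the matching channels by negating those outputs, which doubles the speed of these layers.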
C.ReLU reduces the number of output channels by half, and doubles it by simply concatenating the same outputs with negation, which leads to 2x speed-up of the early stage without losing accuracy.
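As a rough illustration, C.ReLU can be written as ReLU(concat(x, -x)) applied to a convolution output x that has only half the desired channels. As we understand the paper, PVANET also appends a per-channel scale and shift after the concatenation, so they are included here as optional arrays. The helper `conv_macs` is a hypothetical cost counter showing where the 2x saving comes from; it ignores the (cheap) negation and concatenation.

```python
import numpy as np

def crelu(x, scale=None, shift=None):
    """Concatenated ReLU on a (C, H, W) feature map; returns (2C, H, W)."""
    y = np.concatenate([x, -x], axis=0)      # add the negated copy of every channel
    if scale is not None:
        y = y * scale[:, None, None]          # optional per-channel slope
    if shift is not None:
        y = y + shift[:, None, None]          # optional per-channel threshold
    return np.maximum(y, 0.0)

def conv_macs(kernel, c_in, c_out, h, w):
    """Multiply-accumulates of a stride-1 conv producing an (c_out, h, w) output."""
    return kernel * kernel * c_in * c_out * h * w

# Usage: one channel with two pixels.
x = np.array([[[1.0, -2.0]]])
print(crelu(x))
# Computing 16 channels and doubling them costs half as much as computing 32 directly.
print(conv_macs(7, 3, 16, 112, 112) / conv_macs(7, 3, 32, 112, 112))  # 0.5
```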

2.2 Inception: Remaining building blocks in feature generation

[Figure 2]

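Inception handles both small and large objects well. It does this by combining branches with different kernel sizes: 1x1 convolutions keep the receptive field small, while stacked 3x3 convolutions enlarge it.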
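To make the receptive-field argument concrete, the sketch below computes the receptive field of a chain of convolutions from their kernel sizes and strides. The branch layouts are hypothetical Inception-style examples, not PVANET's exact module configuration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    layers: list of (kernel_size, stride) tuples, applied in order.
    """
    rf = 1    # before any layer, one unit sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent units at the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical Inception-style branches, all stride 1
branches = {
    "1x1": [(1, 1)],
    "1x1 -> 3x3": [(1, 1), (3, 1)],
    "1x1 -> 3x3 -> 3x3": [(1, 1), (3, 1), (3, 1)],
}
for name, layers in branches.items():
    print(name, receptive_field(layers))  # 1, 3 and 5 respectively
```

Because each module mixes these branches, its output contains features with both small and larger receptive fields. Stacking modules compounds the growth.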

2.3 HyperNet: Concatenation of multi-scale intermediate outputs

It combines convolutional feature maps from different scales, which enables multi-scale object detection.

[Figure 3]
[Figure 4]

2.4 Deep network training

Residual structures are added across the Inception layers. Batch normalization layers are inserted before every ReLU activation layer. The learning rate is adjusted dynamically based on plateau detection.
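The summary does not spell out the plateau-detection rule. The sketch below assumes one plausible form. It tracks an exponential moving average of the training loss. If that average fails to improve by at least `min_delta` for `patience` consecutive checks, the learning rate is multiplied by `factor`. The class name and all parameter values are hypothetical and not PVANET's settings.

```python
class PlateauLRScheduler:
    """Decrease the learning rate when the smoothed loss stops improving (hypothetical rule)."""

    def __init__(self, lr, factor=0.1, patience=3, min_delta=1e-3, smoothing=0.9):
        self.lr = lr
        self.factor = factor          # multiplicative decay applied on a plateau
        self.patience = patience      # checks without improvement before decaying
        self.min_delta = min_delta    # minimum drop that counts as improvement
        self.smoothing = smoothing    # EMA weight on the previous average
        self.avg_loss = None
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        # Exponential moving average of the loss
        if self.avg_loss is None:
            self.avg_loss = loss
        else:
            self.avg_loss = self.smoothing * self.avg_loss + (1 - self.smoothing) * loss
        if self.avg_loss < self.best - self.min_delta:
            self.best = self.avg_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor   # plateau detected: decay the learning rate
                self.wait = 0
                self.best = self.avg_loss
        return self.lr

# Usage: smoothing=0 makes the average equal to the raw loss, for readability.
sched = PlateauLRScheduler(1.0, factor=0.5, patience=2, min_delta=0.01, smoothing=0.0)
for loss in [1.0, 0.9, 0.8, 0.8, 0.8]:
    lr = sched.step(loss)
print(lr)  # 0.5
```

Compared with a fixed step schedule, this reacts to the observed loss curve. The trade-off is extra hyperparameters such as `patience` and `min_delta`.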

3 Faster R-CNN with our feature extraction network
The outputs of conv3_4 (downscaled), conv4_4, and conv5_4 (upscaled) are combined into 512-channel multi-scale output features, which serve as the input to the Faster R-CNN model.
Three intermediate outputs from conv3_4 (with down-scaling), conv4_4, and conv5_4 (with up-scaling) are combined into the 512-channel multi-scale output features.
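The HyperNet-style combination can be sketched in NumPy. Bring conv3_4 and conv5_4 to the resolution of conv4_4, concatenate along channels, then map to 512 channels. Several choices here are assumptions for illustration: 2x2 max pooling for down-scaling, nearest-neighbour repetition for up-scaling, a 1x1 convolution with random weights for the reduction to 512 channels, and the input channel counts 128/256/384. PVANET may use different operations and sizes.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) map (H and W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x_nearest(x):
    """Nearest-neighbour 2x up-scaling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def hyper_feature(conv3, conv4, conv5, weight):
    """Combine three scales at conv4's resolution; weight is (out_channels, total_in_channels)."""
    down = max_pool_2x2(conv3)           # conv3_4: 2x larger -> down-scale
    up = upsample_2x_nearest(conv5)      # conv5_4: 2x smaller -> up-scale
    stacked = np.concatenate([down, conv4, up], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weight, stacked)

# Usage with hypothetical channel counts and spatial sizes
rng = np.random.default_rng(0)
conv3 = rng.standard_normal((128, 32, 32))
conv4 = rng.standard_normal((256, 16, 16))
conv5 = rng.standard_normal((384, 8, 8))
weight = rng.standard_normal((512, 128 + 256 + 384))
out = hyper_feature(conv3, conv4, conv5, weight)
print(out.shape)  # (512, 16, 16)
```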

4 Experimental results

[Figure 5]

[Figure 6]
