Feature Pyramid Networks (FPN)

Paper: Feature Pyramid Networks for Object Detection
Paper link: https://arxiv.org/abs/1612.03144


1 Problem Statement

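First, the problems that FPN sets out to solve:
1. Multi-scale object detection: a single network must detect both large and small objects.
2. Existing solutions rely on multi-scale training and inference, which is too slow and memory-hungry to be practical.
3. SSD predicts directly from feature maps of different strides, but then the shallow layers lack rich semantic information while the deep layers have lost location information.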

2 FPN Network Architecture

[Figure 1: comparison of four designs, panels (a) to (d)]

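a) Multi-scale training and inference: too slow to be practical.
b) Detection only at the end of the network: the location information of the shallow layers is lost, so small objects are detected poorly.
c) SSD.
d) FPN: shallow and deep feature maps are fused, so the shallow feature maps also carry rich semantic information, which helps small-object detection.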

[Figure 2]
A building block illustrating the lateral connection and the top-down pathway, merged by addition.

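The fusion method is shown in the figure above: the shallow feature map passes through a 1×1 convolution, the deep feature map is upsampled by a factor of 2, and the two are merged by element-wise addition, i.e., values at corresponding positions are added directly.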
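The merge step can be sketched in a few lines. The following is a minimal pure-Python illustration on nested lists, not the paper's implementation. The function names (`conv1x1`, `upsample2x`, `merge`), the `[channel][row][col]` layout, and the choice of nearest-neighbour upsampling are assumptions made for this example; a real model would use a deep learning framework.

```python
def conv1x1(fmap, weight):
    """Apply a 1x1 convolution (no bias).

    fmap:   nested list indexed [c_in][y][x]
    weight: nested list indexed [c_out][c_in]
    """
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weight[o][i] * fmap[i][y][x] for i in range(c_in))
          for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling (assumed mode) of a [c][y][x] map."""
    return [
        [[channel[y // 2][x // 2] for x in range(2 * len(channel[0]))]
         for y in range(2 * len(channel))]
        for channel in fmap
    ]


def merge(shallow, deep, weight):
    """Lateral 1x1 conv on `shallow` plus upsampled `deep`, added element-wise."""
    lateral = conv1x1(shallow, weight)   # match the channel count of `deep`
    top_down = upsample2x(deep)          # match the spatial size of `shallow`
    return [
        [[a + b for a, b in zip(row_l, row_t)]
         for row_l, row_t in zip(ch_l, ch_t)]
        for ch_l, ch_t in zip(lateral, top_down)
    ]


# Toy input: 2-channel 2x2 shallow map, 1-channel 1x1 deep map.
shallow = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
deep = [[[100]]]
weight = [[1, 1]]  # one output channel that sums both input channels
merged = merge(shallow, deep, weight)
```

The 1×1 convolution only changes the channel count and the upsampling only changes the spatial size, so the two maps line up for the addition.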

3 ROI Pooling

3.1 Feature Pyramid Networks for RPN

Formally, we define the anchors to have areas of {32², 64², 128², 256², 512²} pixels on {P2, P3, P4, P5, P6} respectively.
We also use anchors of multiple aspect ratios {1:2, 1:1, 2:1} at each level. So in total there are 15 anchors over the pyramid.
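As a rough illustration of this anchor layout, the sketch below enumerates one anchor shape per (level, aspect ratio) pair. It assumes the anchor areas are the squares 32² to 512² pixels, as in the FPN paper, and that an aspect ratio is read as height:width; the helper name `pyramid_anchors` is hypothetical.

```python
import math

LEVELS = ["P2", "P3", "P4", "P5", "P6"]
SIDES = [32, 64, 128, 256, 512]          # anchor area at each level is side**2
RATIOS = [(1, 2), (1, 1), (2, 1)]        # assumed to mean height:width


def pyramid_anchors():
    """Return a list of (level, width, height) anchor shapes."""
    anchors = []
    for level, side in zip(LEVELS, SIDES):
        area = side * side
        for rh, rw in RATIOS:
            # Solve w * h = area with h / w = rh / rw.
            width = math.sqrt(area * rw / rh)
            height = area / width
            anchors.append((level, width, height))
    return anchors


anchors = pyramid_anchors()
```

Five levels times three aspect ratios gives the 15 anchors mentioned in the text; each level uses a single scale, because the pyramid itself covers the scale range.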

3.2 Feature Pyramid Networks for Fast R-CNN

Fast R-CNN is most commonly performed on a
single-scale feature map. To use it with our FPN, we need
to assign RoIs of different scales to the pyramid levels.
Formally, we assign an RoI of width w and height h (on the input image to the network) to the level Pk of our feature pyramid by:

$$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh}/224\right) \right\rfloor$$

Here 224 is the canonical ImageNet pre-training size, and
k0 is the target level on which an RoI with w × h = 224²
should be mapped into. Analogous to the ResNet-based
Faster R-CNN system that uses C4 as the single-scale
feature map, we set k0 to 4.
If the RoI’s scale becomes smaller (say, 1/2 of 224), it
should be mapped into a finer-resolution level (say, k = 3).
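
The assignment rule can be checked numerically with a small sketch. It assumes the level is computed as k = floor(k0 + log2(sqrt(w·h) / 224)), the formula given in the FPN paper, and additionally clamps the result to P2 through P5, which is a common implementation choice rather than something stated in this text. The name `roi_level` and its parameters are hypothetical.

```python
import math


def roi_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Map an RoI of size w x h (in input-image pixels) to a pyramid level k."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Clamping to the available levels is an assumption of this sketch.
    return max(k_min, min(k_max, k))


levels = [roi_level(s, s) for s in (112, 224, 448)]
```

With these defaults, a 224×224 RoI maps to P4 and a 112×112 RoI maps to P3, matching the example in the text.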

4 Experimental Validation

[Figure 3: experimental results]

The results show that adding FPN yields a considerable AP gain, whether on RPN, Fast R-CNN, or Faster R-CNN, and the gain is especially large on APs, the small-object metric.
