[Deep Learning Paper Notes][Object Detection] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in Neural Information Processing Systems. 2015. (Citations: 444).


1 Motivation

Region proposals are the test-time computational bottleneck in state-of-the-art detection systems.


We address this by inserting a Region Proposal Network (RPN) after the conv5 layer to produce region proposals directly, so no external region proposals are needed. After the RPN, we use RoI pooling followed by a downstream classifier and bounding-box regressor, just as in Fast R-CNN. See Fig.

[Image 1]


2 Region Proposal Network

Slide a small window (3 × 3 in our case) over the conv5 feature map. At each window position, we simultaneously predict multiple region proposals covering a wide range of scales and aspect ratios; the maximum number of proposals per location is denoted N.


We generate region proposals with a two-layer network. See Fig. The classification head outputs 2N scores that estimate the probability of object versus not object for each proposal. The regression head has 4N outputs encoding the offsets relative to N reference boxes, which we call anchors. By default we use 3 scales and 3 aspect ratios, yielding N = 9 anchors at each sliding position. For a conv5 feature map of size H × W, there are HWN anchors in total.
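The anchor layout can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the scale values (128, 256, 512 pixels) and ratios (0.5, 1, 2) follow the defaults reported in the paper, and the feature stride of 16 is an assumption that matches a VGG-16 conv5 map.

```python
import numpy as np

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return N = len(scales) * len(ratios) boxes centred at the origin.

    Boxes are (x1, y1, x2, y2). Each scale s gives an area of s * s pixels,
    and ratio is height / width (both conventions are assumptions here).
    """
    boxes = []
    for ratio in ratios:
        for scale in scales:
            w = scale / np.sqrt(ratio)   # w * h == scale ** 2
            h = scale * np.sqrt(ratio)   # h / w == ratio
            boxes.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(boxes)

def all_anchors(feat_h, feat_w, stride=16):
    """Tile the base anchors over an H x W feature map: shape (H * W * N, 4)."""
    base = base_anchors()                              # (N, 4)
    xs = (np.arange(feat_w) + 0.5) * stride            # window centres in image pixels
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)                       # each (H, W)
    shifts = np.stack([cx, cy, cx, cy], axis=-1)       # (H, W, 4)
    return (shifts[:, :, None, :] + base[None, None, :, :]).reshape(-1, 4)

anchors = all_anchors(feat_h=38, feat_w=50)
print(anchors.shape)   # H * W * N rows, 4 columns
```

With N = 9, the classification head would emit 2 × 9 = 18 values and the regression head 4 × 9 = 36 values at every one of the H × W positions.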

[Image 2]


This design is translation invariant, both in terms of the anchors and in terms of the functions that compute proposals relative to the anchors. If an object is translated in the image, the proposal should translate with it, and the same function should be able to predict the proposal at either location.
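A small NumPy check makes the translation property concrete. The offset parameterization used here (centre shifts scaled by anchor size, log-space width and height) is the one given in the Faster R-CNN paper rather than in these notes, and the function name `decode` and all numeric values are illustrative assumptions.

```python
import numpy as np

def decode(anchors, deltas):
    """Apply (tx, ty, tw, th) offsets to (x1, y1, x2, y2) anchors."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    x = xa + deltas[:, 0] * wa           # centre moves in units of anchor size
    y = ya + deltas[:, 1] * ha
    w = wa * np.exp(deltas[:, 2])        # size is rescaled in log space
    h = ha * np.exp(deltas[:, 3])
    return np.stack([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h], axis=1)

anchor = np.array([[100.0, 100.0, 228.0, 228.0]])
deltas = np.array([[0.1, -0.2, 0.3, 0.0]])        # hypothetical regression output
shift = np.array([[48.0, 32.0, 48.0, 32.0]])      # move the anchor by (48, 32) pixels

proposal = decode(anchor, deltas)
shifted_proposal = decode(anchor + shift, deltas)
print(np.allclose(shifted_proposal, proposal + shift))
```

Because the offsets are expressed relative to the anchor, the same predicted offsets applied to a shifted anchor give the same proposal shifted by the same amount.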

3 Training Details
We assign a positive label to two kinds of anchors:
• The anchor with the highest IoU overlap with a ground-truth box.
• The anchor that has an IoU overlap higher than 0.7 with any ground-truth box.
We adopt the first condition because, in some rare cases, the second condition may find no positive sample.


We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.
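The labeling rules above can be sketched as follows. This is a minimal NumPy version under stated assumptions: the function names are hypothetical, boxes are (x1, y1, x2, y2), and the encoding 1 = positive, 0 = negative, -1 = ignored is a convention chosen for this example.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (A, 4) anchors and (G, 4) ground-truth boxes."""
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    iou = iou_matrix(anchors, gt_boxes)                # (A, G)
    max_iou = iou.max(axis=1)                          # best overlap per anchor
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_iou < neg_thresh] = 0                   # below 0.3 for all ground-truth boxes
    labels[max_iou > pos_thresh] = 1                   # second condition: IoU above 0.7
    # First condition: the best anchor(s) for each ground-truth box are positive,
    # skipping boxes that no anchor overlaps at all.
    best_per_gt = iou.max(axis=0)                      # (G,)
    is_best = (iou == best_per_gt[None, :]) & (best_per_gt[None, :] > 0)
    labels[is_best.any(axis=1)] = 1
    return labels

gt = np.array([[0.0, 0.0, 10.0, 10.0]])
anchors = np.array([[0.0, 0.0, 10.0, 6.0],      # IoU 0.6: positive only via the first condition
                    [0.0, 0.0, 10.0, 2.0],      # IoU 0.2: negative
                    [0.0, 0.0, 10.0, 5.0]])     # IoU 0.5: ignored
print(label_anchors(anchors, gt))
```

In this toy case no anchor exceeds 0.7, so the first condition is what guarantees that the ground-truth box still receives a positive sample.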


We jointly train the whole network, which has four losses:
• RPN classification (anchor good/bad).
• RPN regression (anchor → proposal).
• Fast R-CNN classification (over classes).
• Fast R-CNN regression (proposal → box).
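As a rough sketch, the four terms can be combined into one objective as below. Several choices here are assumptions, not details given in these notes: softmax cross-entropy for both classification terms, smooth L1 for both regression terms (as in Fast R-CNN), equal weighting of the four terms, class 0 as background, class-agnostic box offsets, and anchor labels encoded as 1 / 0 / -1 for positive / negative / ignored. All function and argument names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (M, C), integer labels (M,)."""
    z = logits - logits.max(axis=1, keepdims=True)     # stabilise the exponentials
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def smooth_l1(pred, target):
    """Mean smooth L1 over all coordinates; 0.0 when there are no samples."""
    if pred.size == 0:
        return 0.0
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def joint_loss(rpn_logits, rpn_labels, rpn_deltas, rpn_targets,
               det_logits, det_labels, det_deltas, det_targets):
    """Sum of the four losses, assuming equal weights."""
    keep = rpn_labels >= 0       # ignored anchors (-1) do not contribute
    pos = rpn_labels == 1        # box regression only on positive anchors
    fg = det_labels > 0          # box regression only on non-background RoIs
    losses = {
        "rpn_cls": cross_entropy(rpn_logits[keep], rpn_labels[keep]),
        "rpn_reg": smooth_l1(rpn_deltas[pos], rpn_targets[pos]),
        "det_cls": cross_entropy(det_logits, det_labels),
        "det_reg": smooth_l1(det_deltas[fg], det_targets[fg]),
    }
    losses["total"] = sum(losses.values())
    return losses
```

In practice the relative weights of the terms and the sampling of anchors per mini-batch matter; this sketch leaves both out.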

4 Result
See Tab.

[Image 3]
