Fast R-CNN: Paper Notes and Source Code Walkthrough

Fast R-CNN

Comparison with R-CNN and SPPnet

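  1. R-CNN training is a multi-stage pipeline: it first fine-tunes a ConvNet with a log loss, then trains SVMs, and finally fits bounding-box regressors.
  2. Training is expensive in both space and time.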

Fast R-CNN architecture and training

[Image 1 from the original post]

The whole image is first processed by several convolutional and max pooling layers to produce a conv feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from that feature map.

Each feature vector is then fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers:
1. one that produces softmax probability estimates over K object classes plus a catch-all “background” class
2. another that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes.
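To make the output shapes concrete, here is a minimal NumPy sketch of the two sibling layers, following the description above. The function name `sibling_outputs`, the weight names, and the sizes in the example (D = 16 features, K = 20 classes, 3 RoIs) are assumptions for illustration only, and the random weights stand in for trained ones.

```python
import numpy as np

def sibling_outputs(fc_features, w_cls, b_cls, w_bbox, b_bbox):
    """Apply the two sibling output layers to fc features of shape (R, D).

    Returns class probabilities of shape (R, K + 1) and per-class box
    outputs of shape (R, K, 4).
    """
    # Classification branch: linear layer followed by a stable softmax.
    scores = fc_features @ w_cls + b_cls
    scores = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Regression branch: one set of 4 numbers per object class.
    num_classes = w_bbox.shape[1] // 4
    boxes = (fc_features @ w_bbox + b_bbox).reshape(-1, num_classes, 4)
    return probs, boxes

# Hypothetical sizes, chosen only for the example.
R, D, K = 3, 16, 20
rng = np.random.default_rng(0)
features = rng.standard_normal((R, D))
probs, boxes = sibling_outputs(
    features,
    rng.standard_normal((D, K + 1)), np.zeros(K + 1),
    rng.standard_normal((D, 4 * K)), np.zeros(4 * K),
)
print(probs.shape, boxes.shape)
```

Index 0 of the probability vector is taken to be the background class here, which is a convention of this sketch.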

The RoI pooling layer

The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H × W (e.g., 7 × 7), where H and W are layer hyper-parameters that do not depend on any particular RoI.
Each RoI is defined by a four-tuple (r, c, h, w) that specifies its top-left corner (r, c) and its height and width (h, w).

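RoI max pooling divides the h × w RoI window into an H × W grid of sub-windows, each of approximate size h/H × w/W, and then max-pools the values in each sub-window into the corresponding output grid cell. Pooling is applied independently to each feature map channel.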
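The following NumPy sketch illustrates this pooling for a single RoI. It assumes the RoI is given in feature-map cells and lies fully inside the map; the floor/ceil bin boundaries are one reasonable way to realise the "approximate size h/H × w/W" windows and are a choice of this sketch, not taken from the post. The name `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a (C, rows, cols) feature map into (C, out_h, out_w).

    roi is (r, c, h, w): top-left corner and height/width in feature-map cells.
    """
    r, c, h, w = roi
    channels = feature_map.shape[0]
    pooled = np.empty((channels, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Rows covered by output row i: floor for the start, ceil for the end,
        # so every bin contains at least one cell.
        r0 = r + (i * h) // out_h
        r1 = r + ((i + 1) * h + out_h - 1) // out_h
        for j in range(out_w):
            c0 = c + (j * w) // out_w
            c1 = c + ((j + 1) * w + out_w - 1) // out_w
            # Max over the sub-window, independently for each channel.
            pooled[:, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled

# Small example: 2 channels, 6 x 8 map, RoI with corner (1, 2) and size 4 x 6.
fmap = np.arange(2 * 6 * 8, dtype=float).reshape(2, 6, 8)
out = roi_max_pool(fmap, (1, 2, 4, 6), out_h=2, out_w=2)
print(out[0])
```

Whatever the RoI size, the output is always H × W per channel, which is what lets RoIs of different shapes feed the same fully connected layers.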

Initializing from pre-trained networks

When a pre-trained network initializes a Fast R-CNN network, it undergoes three transformations:
1. The last max pooling layer is replaced by a RoI pooling layer.
2. The last fc layer and softmax are replaced with the two sibling layers (a fully connected layer and softmax over K + 1 categories, and category-specific bounding-box regressors).
3. The network is modified to take two data inputs: a list of images and a list of RoIs in those images.

Fine-tuning for detection

In Fast R-CNN training, SGD mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.
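A minimal sketch of this two-level sampling is shown below. The function name `sample_minibatch` and the proposal counts are made up for the example; N = 2 and R = 128 are the values the paper reports using. The sketch deliberately leaves out the foreground/background balancing that the paper applies when choosing RoIs.

```python
import numpy as np

def sample_minibatch(num_rois_per_image, num_images, batch_size, rng):
    """Pick N images, then R / N RoI indices from each of them.

    num_rois_per_image: number of proposals available for each training image.
    Returns a list of (image_index, roi_indices) pairs.
    """
    per_image = batch_size // num_images
    image_ids = rng.choice(len(num_rois_per_image), size=num_images, replace=False)
    batch = []
    for img in image_ids:
        n = num_rois_per_image[img]
        # Sample with replacement only if the image has too few proposals.
        roi_ids = rng.choice(n, size=per_image, replace=n < per_image)
        batch.append((int(img), roi_ids))
    return batch

rng = np.random.default_rng(0)
proposal_counts = [2000, 1500, 1800, 2100]  # hypothetical dataset of 4 images
batch = sample_minibatch(proposal_counts, num_images=2, batch_size=128, rng=rng)
for img, roi_ids in batch:
    print(img, len(roi_ids))
```

Because all RoIs drawn from one image share the same convolutional feature map, a small N means most of the forward and backward computation is shared across the mini-batch.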

Multi-task loss
[Image 2 from the original post]

[Image 3 from the original post]
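As a rough sketch, the multi-task loss defined in the Fast R-CNN paper for one labeled RoI is L(p, u, t^u, v) = L_cls(p, u) + λ [u ≥ 1] L_loc(t^u, v), where L_cls(p, u) = −log p_u, L_loc sums a smooth L1 penalty over the four box coordinates, and the indicator [u ≥ 1] switches the localization term off for background RoIs (u = 0). The function names below are hypothetical and the numbers in the example are made up.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 where |x| < 1, and |x| - 0.5 elsewhere."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multi_task_loss(p, u, t_u, v, lam=1.0):
    """Loss for a single RoI.

    p:   (K + 1,) class probabilities, index 0 is background
    u:   ground-truth class index
    t_u: (4,) predicted box offsets for class u
    v:   (4,) regression targets (ignored when u is background)
    lam: weight balancing the two task losses
    """
    loss_cls = -np.log(p[u])
    loss_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()
    # (u >= 1) acts as the indicator: no box loss for background RoIs.
    return loss_cls + lam * (u >= 1) * loss_loc

p = np.array([0.1, 0.7, 0.2])
t_u = np.array([0.5, 0.0, 2.0, 0.0])
v = np.zeros(4)
fg_loss = multi_task_loss(p, 1, t_u, v)   # foreground RoI: both terms count
bg_loss = multi_task_loss(p, 0, t_u, v)   # background RoI: class term only
print(fg_loss, bg_loss)
```

The smooth L1 penalty grows linearly for large errors, so it is less sensitive to outlier targets than a squared (L2) loss.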

(To be continued)
