I've been using Faster RCNN for object detection recently. The objects in my dataset are small, so training with the original anchor scales didn't give good enough results. I improved accuracy by changing the sizes of the proposal anchors, and this post records the changes. Links to the Faster RCNN paper and source code:
Faster RCNN paper: click here
Faster RCNN source on GitHub: click here
Five files need to be modified in total, as follows:
1. {repo root}/lib/rpn/proposal_layer.py
def setup(self, bottom, top):
    # parse the layer parameter string, which must be valid YAML
    layer_params = yaml.load(self.param_str_)

    self._feat_stride = layer_params['feat_stride']
    anchor_scales = layer_params.get('scales', (8, 16, 32))
    self._anchors = generate_anchors(scales=np.array(anchor_scales))
    self._num_anchors = self._anchors.shape[0]

    if DEBUG:
        print 'feat_stride: {}'.format(self._feat_stride)
        print 'anchors:'
        print self._anchors

    # rois blob: holds R regions of interest, each is a 5-tuple
    # (n, x1, y1, x2, y2) specifying an image batch index n and a
    # rectangle (x1, y1, x2, y2)
    top[0].reshape(1, 5)

    # scores blob: holds scores for R regions of interest
    if len(top) > 1:
        top[1].reshape(1, 1, 1, 1)
At around line 29,
anchor_scales = layer_params.get('scales', (8, 16, 32))
the numbers in the parentheses are the anchor proposal scales. Change them as needed; I modified mine to:
anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))
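To see what boxes these scales actually produce, here is a minimal self-contained sketch of the anchor enumeration. The function name make_anchors is mine; it mirrors the logic of generate_anchors in lib/rpn/generate_anchors.py (base_size 16, ratios 0.5/1/2), but it is an illustration, not the repo's exact code:

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(2, 4, 8, 16, 32)):
    """Enumerate anchors (x1, y1, x2, y2) centred on one base_size cell."""
    ctr = (base_size - 1) / 2.0
    area = float(base_size * base_size)
    anchors = []
    for ratio in ratios:
        # width/height of the base cell reshaped to this aspect ratio
        w = round(np.sqrt(area / ratio))
        h = round(w * ratio)
        for scale in scales:
            ws, hs = w * scale, h * scale
            anchors.append([ctr - (ws - 1) / 2.0, ctr - (hs - 1) / 2.0,
                            ctr + (ws - 1) / 2.0, ctr + (hs - 1) / 2.0])
    return np.array(anchors)

anchors = make_anchors()
print(anchors.shape)  # 5 scales x 3 ratios -> (15, 4)
```

With the extra scales 2 and 4, the smallest anchors shrink to roughly 22–46 pixels on a side, which is the whole point of the change for small objects.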
2. {repo root}/lib/rpn/anchor_target_layer.py
def setup(self, bottom, top):
    layer_params = yaml.load(self.param_str_)
    anchor_scales = layer_params.get('scales', (8, 16, 32))
    self._anchors = generate_anchors(scales=np.array(anchor_scales))
    self._num_anchors = self._anchors.shape[0]
    self._feat_stride = layer_params['feat_stride']

    if DEBUG:
        print 'anchors:'
        print self._anchors
        print 'anchor shapes:'
        print np.hstack((
            self._anchors[:, 2::4] - self._anchors[:, 0::4],
            self._anchors[:, 3::4] - self._anchors[:, 1::4],
        ))
        self._counts = cfg.EPS
        self._sums = np.zeros((1, 4))
        self._squared_sums = np.zeros((1, 4))
        self._fg_sum = 0
        self._bg_sum = 0
        self._count = 0

    # allow boxes to sit over the edge by a small amount
    self._allowed_border = layer_params.get('allowed_border', 0)

    height, width = bottom[0].data.shape[-2:]
    if DEBUG:
        print 'AnchorTargetLayer: height', height, 'width', width

    A = self._num_anchors
    # labels
    top[0].reshape(1, 1, A * height, width)
    # bbox_targets
    top[1].reshape(1, A * 4, height, width)
    # bbox_inside_weights
    top[2].reshape(1, A * 4, height, width)
    # bbox_outside_weights
    top[3].reshape(1, A * 4, height, width)
At around line 27,
anchor_scales = layer_params.get('scales', (8, 16, 32))
this is the same line as before, and the modification is the same:
anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))
3. {repo root}/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt
Note: VGG16 here refers to the network model you are training with; change it as needed (ZF, ResNet, and so on).
The rpn_cls_score layer:
layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
Here num_output is 2 × the number of anchors. The default is 9 anchors (3 anchor scales × 3 anchor ratios); change the value to match your own count. I now have 3 ratios × 5 scales = 15 anchors, so num_output becomes 30:
num_output: 18   # 2(bg/fg) * 9(anchors)
becomes
num_output: 30   # 2(bg/fg) * 15(anchors)
The rpn_bbox_pred layer:
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
Here num_output is the 4 box-regression values predicted per anchor (the dx, dy, dw, dh deltas). Modify it the same way as before:
num_output: 36   # 4 * 9(anchors)
becomes:
num_output: 60   # 4 * 15(anchors)
The rpn_cls_prob_reshape layer:
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
In the line
reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
change the 18 to 2 × the number of anchors, which here is 30:
reshape_param { shape { dim: 0 dim: 30 dim: -1 dim: 0 } }
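All three prototxt numbers derive from the same anchor count, so a quick arithmetic check (nothing repo-specific, just the counts used above) helps keep them consistent:

```python
ratios = (0.5, 1, 2)
scales = (2, 4, 8, 16, 32)

num_anchors = len(ratios) * len(scales)   # 15
cls_num_output = 2 * num_anchors          # rpn_cls_score num_output: 30
bbox_num_output = 4 * num_anchors         # rpn_bbox_pred num_output: 60
reshape_dim = 2 * num_anchors             # rpn_cls_prob_reshape dim: 30

print(num_anchors, cls_num_output, bbox_num_output, reshape_dim)  # 15 30 60 30
```

If any one of these three values disagrees with the others, Caffe will fail with a blob-shape mismatch at the RPN layers.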
4. {repo root}/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
Make the same changes here as in train.prototxt.
5. {repo root}/models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt
Even though we are using the end2end mode, this file in the alt_opt directory still has to be modified, otherwise testing will error out.
The fix is simple: copy the test.prototxt you just modified in step 4 over it, renaming the file faster_rcnn_test.pt.
That covers all the files to modify. One thing to keep in mind: the anchor scales are expressed in units of the feature-map stride. The input image is first resized and then downsampled 16× by the network (feat_stride = 16), so the default scales (8, 16, 32) correspond to anchors of 128, 256, and 512 pixels on the resized image. Choose your scales according to the actual object sizes in your data.
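As a concrete check of that last point, the side length of a square (ratio-1) anchor in the resized image is simply feat_stride × scale:

```python
feat_stride = 16  # VGG16 downsamples the input 16x before the RPN

# anchor side length in resized-image pixels for each scale
sizes = {scale: feat_stride * scale for scale in (2, 4, 8, 16, 32)}
print(sizes)  # {2: 32, 4: 64, 8: 128, 16: 256, 32: 512}
```

The default scales 8/16/32 give the paper's 128/256/512-pixel anchors; the added scales 2 and 4 give 32- and 64-pixel anchors for small objects.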