Changing the anchor scales in py-Faster RCNN (Caffe), end2end mode

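I have recently been using Faster RCNN for object detection. The objects in my dataset are fairly small, and training with the original scales did not give good enough results, so I changed the anchor sizes used for proposals to improve accuracy. These are my notes. The Faster RCNN paper and source code:
Faster RCNN paper: click here
Faster RCNN source on GitHub: click here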

Five files need to be changed in total, as follows:
1. <source root>/lib/rpn/proposal_layer.py

    def setup(self, bottom, top):
        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.load(self.param_str_)

        self._feat_stride = layer_params['feat_stride']
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        self._num_anchors = self._anchors.shape[0]

        if DEBUG:
            print 'feat_stride: {}'.format(self._feat_stride)
            print 'anchors:'
            print self._anchors

        # rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        top[0].reshape(1, 5)

        # scores blob: holds scores for R regions of interest
        if len(top) > 1:
            top[1].reshape(1, 1, 1, 1)

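Around line 29 of the file, in this line:

anchor_scales = layer_params.get('scales', (8, 16, 32))

the numbers in the inner parentheses are the anchor scales used for proposals. They are the default used when the layer parameters do not specify 'scales'. Change them to suit your data; I changed them as follows: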

anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))
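
To see what the extra scales buy you, here is a minimal sketch that lists the anchor shapes for a given set of scales. It is a simplified stand-in written for this note, not the repository's generate_anchors; the function name anchor_shapes, the ratios (0.5, 1, 2) and the base size of 16 are assumptions (the post only states that there are 3 ratios and a stride of 16).

```python
import math


def anchor_shapes(scales, ratios=(0.5, 1, 2), base_size=16):
    """Return (width, height) for every ratio/scale pair (hypothetical helper)."""
    area = base_size * base_size
    shapes = []
    for ratio in ratios:
        # Keep the area roughly constant while changing the aspect ratio.
        w = round(math.sqrt(area / float(ratio)))
        h = round(w * ratio)
        for scale in scales:
            shapes.append((w * scale, h * scale))
    return shapes


default_shapes = anchor_shapes((8, 16, 32))
small_object_shapes = anchor_shapes((2, 4, 8, 16, 32))
print(len(default_shapes), len(small_object_shapes))
for width, height in small_object_shapes:
    print(width, height)
```

With three ratios, going from 3 to 5 scales raises the anchor count from 9 to 15, and the smallest square anchor shrinks from 128 to 32 pixels on the resized image.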

2. <source root>/lib/rpn/anchor_target_layer.py

    def setup(self, bottom, top):
        layer_params = yaml.load(self.param_str_)
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        self._num_anchors = self._anchors.shape[0]
        self._feat_stride = layer_params['feat_stride']

        if DEBUG:
            print 'anchors:'
            print self._anchors
            print 'anchor shapes:'
            print np.hstack((
                self._anchors[:, 2::4] - self._anchors[:, 0::4],
                self._anchors[:, 3::4] - self._anchors[:, 1::4],
            ))
            self._counts = cfg.EPS
            self._sums = np.zeros((1, 4))
            self._squared_sums = np.zeros((1, 4))
            self._fg_sum = 0
            self._bg_sum = 0
            self._count = 0

        # allow boxes to sit over the edge by a small amount
        self._allowed_border = layer_params.get('allowed_border', 0)

        height, width = bottom[0].data.shape[-2:]
        if DEBUG:
            print 'AnchorTargetLayer: height', height, 'width', width

        A = self._num_anchors
        # labels
        top[0].reshape(1, 1, A * height, width)
        # bbox_targets
        top[1].reshape(1, A * 4, height, width)
        # bbox_inside_weights
        top[2].reshape(1, A * 4, height, width)
        # bbox_outside_weights
        top[3].reshape(1, A * 4, height, width)

Around line 27 of the file:

anchor_scales = layer_params.get('scales', (8, 16, 32))

This is the same line as in the previous file, and the change is the same:

anchor_scales = layer_params.get('scales', (2, 4, 8, 16, 32))

3. <source root>/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt

Note: VGG16 here is the network model you are training with. Use whichever directory matches your model, for example ZF or ResNet.

rpn_cls_score layer:

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output:18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

Here num_output is 2 * the number of anchors. There were originally 9 anchors (3 anchor scales * 3 anchor ratios). I now have 5 scales * 3 ratios = 15 anchors, so num_output becomes 30.

num_output:18   # 2(bg/fg) * 9(anchors)

becomes

num_output:30   # 2(bg/fg) * 15(anchors)

rpn_bbox_pred layer:

layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

Here num_output is 4 * the number of anchors, i.e. four box coordinate values per anchor. Change it in the same way:

num_output: 36   # 4 * 9(anchors)

becomes:

num_output: 60   # 4 * 15(anchors)

rpn_cls_prob_reshape layer:

layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}

In this line

reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }

change the 18 to 2 * the number of anchors, which is 30 here:

reshape_param { shape { dim: 0 dim: 30 dim: -1 dim: 0 } }
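
The three numbers changed in this file all follow from the anchor count, so they can be derived rather than edited by hand. A small sketch of that arithmetic, where the function name and the dictionary keys are labels made up for illustration, and 3 ratios are assumed as stated above:

```python
def rpn_prototxt_values(num_scales, num_ratios=3):
    """Derive the anchor-dependent prototxt values (hypothetical helper)."""
    num_anchors = num_scales * num_ratios
    return {
        # background/foreground score per anchor
        'rpn_cls_score.num_output': 2 * num_anchors,
        # four box values per anchor
        'rpn_bbox_pred.num_output': 4 * num_anchors,
        # second dim of the rpn_cls_prob_reshape shape
        'rpn_cls_prob_reshape.dim': 2 * num_anchors,
    }


for key, value in sorted(rpn_prototxt_values(num_scales=5).items()):
    print(key, value)
```

If you pick a different number of scales, rerun it with that count and use the resulting values in place of 30 and 60.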

4. <source root>/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt
The changes here are the same as in train.prototxt.
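
Since the same three edits are repeated across several prototxt files, they can also be applied to the file text with regular expressions. The sketch below is one possible way to do it, not part of py-Faster RCNN: patch_rpn_prototxt is a hypothetical helper, it assumes the layer names and field layout shown in the snippets above, and it leaves the trailing "# ... 9(anchors)" comments untouched, so check the result by eye.

```python
import re


def patch_rpn_prototxt(text, num_anchors):
    """Return prototxt text with the three anchor-dependent values updated."""
    def set_num_output(layer_name, value, src):
        # From the layer's name field to the first num_output that follows it.
        pattern = re.compile(
            r'(name:\s*["\']%s["\'].*?num_output:\s*)\d+' % re.escape(layer_name),
            re.DOTALL)
        return pattern.sub(lambda m: m.group(1) + str(value), src, count=1)

    text = set_num_output('rpn_cls_score', 2 * num_anchors, text)
    text = set_num_output('rpn_bbox_pred', 4 * num_anchors, text)
    # Second dim of the rpn_cls_prob_reshape shape is 2 * num_anchors.
    reshape = re.compile(
        r'(name:\s*["\']rpn_cls_prob_reshape["\'].*?'
        r'shape\s*\{\s*dim:\s*0\s*dim:\s*)\d+',
        re.DOTALL)
    return reshape.sub(lambda m: m.group(1) + str(2 * num_anchors), text, count=1)


SAMPLE = """
layer {
  name: "rpn_cls_score"
  convolution_param {
    num_output:18   # 2(bg/fg) * 9(anchors)
  }
}
layer {
  name: "rpn_bbox_pred"
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
  }
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
"""

patched = patch_rpn_prototxt(SAMPLE, num_anchors=15)
print(patched)
```

To use it on a real file, read the file into a string, pass it through the function, and write the result back. Keep a backup of the original first.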

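5. <source root>/models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt
Even though we are in end2end mode, the file in this directory still has to be changed, otherwise testing fails with an error.
The change is simple: copy the test.prototxt you just modified in step 4 into this directory and rename it faster_rcnn_test.pt.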

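That is everything that needs to be changed. Note that the scale values are measured after the image has been resized and then downsampled by the stride of 16, so on the resized image the default scales (8, 16, 32) correspond to anchor sizes of (128, 256, 512) pixels, and the added scales 2 and 4 correspond to 32 and 64 pixels. Choose the values that fit your own data.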
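
To choose scales for your own data, it helps to convert an anchor scale back to a size on the original, un-resized image. The sketch below assumes the image is resized so that its short side becomes 600 pixels without the long side exceeding 1000; those two numbers are my assumption about the default training configuration and are not stated in this post, so substitute the values from your own config. Both function names are made up for illustration.

```python
def resize_factor(height, width, target_size=600, max_size=1000):
    """Factor applied to the image before it enters the network (assumed rule)."""
    short_side = min(height, width)
    long_side = max(height, width)
    factor = float(target_size) / short_side
    # Cap the factor so the long side does not exceed max_size.
    if round(factor * long_side) > max_size:
        factor = float(max_size) / long_side
    return factor


def anchor_side_on_original(scale, height, width, base_size=16):
    """Side of the square anchor for `scale`, in original-image pixels."""
    return scale * base_size / resize_factor(height, width)


# Example: a 400x300 image is enlarged by 2, so every anchor covers half as
# many original pixels as it does on the resized image.
for scale in (2, 4, 8, 16, 32):
    print(scale, anchor_side_on_original(scale, height=300, width=400))
```

If the objects in the original images are around 16 pixels across, this example suggests that scale 2 is the first one that matches them, while the default smallest scale of 8 would cover 64 pixels.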
