Speeding up the anchor / ground-truth box overlap computation with Cython

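When training RetinaNet, the work done for each image breaks down into roughly these steps:
1. Get the corner coordinates (x1, y1, x2, y2) of each ground-truth box: first express them as fractions of the original image size, then multiply by the size the image is resized to (for example 512).
2. Generate the anchors and compute the overlap between the N anchors and the M gt_boxes in the image, which gives an (N, M) matrix. Based on the size of the overlap, assign one gt_box to each anchor, and give that anchor the gt_box's class as well. The output of this step is an (N, 5) array holding, for every anchor, its gt_box coordinates and its class.
3. Do the bbox regression, that is, use the bbox_transform function to compute the four transform parameters.
4. Feed these four transform parameters and the class information into RetinaNet, where they are used together with the outputs of RetinaNet's P3 to P5 levels to compute the smooth L1 loss and the cross-entropy loss.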
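
Step 2's assignment can be sketched in NumPy as follows. This is only an illustration, not the training code used here: the function name `assign_targets`, the overlap values, and the class ids are made up for the example, and it simply gives each anchor the ground-truth box with the largest overlap. Real pipelines usually also apply overlap thresholds to mark anchors as background or ignored, which is left out.

```python
import numpy as np

def assign_targets(overlaps, gt_boxes, gt_labels):
    """Give every anchor the ground-truth box it overlaps most.

    overlaps:  (N, M) overlap matrix between N anchors and M gt boxes
    gt_boxes:  (M, 4) boxes as x1, y1, x2, y2
    gt_labels: (M,)   class id per gt box
    returns:   (N, 5) gt box coordinates plus class id for each anchor
    """
    best_gt = np.argmax(overlaps, axis=1)  # (N,) index of the best gt box per anchor
    return np.hstack([gt_boxes[best_gt], gt_labels[best_gt][:, np.newaxis]])

# Made-up overlap values for 3 anchors and 2 gt boxes.
overlaps = np.array([[1.00, 0.00],
                     [0.00, 0.47],
                     [0.22, 0.00]])
gt_boxes = np.array([[0.0, 0.0, 9.0, 9.0], [18.0, 18.0, 27.0, 27.0]])
gt_labels = np.array([3, 7])

targets = assign_targets(overlaps, gt_boxes, gt_labels)
print(targets.shape)
```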

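Training ran on two Titan X GPUs.
For a single image, the four steps above take roughly 0.05 + 1.5 + 0.7 + 0.1 ≈ 2.4 s.
At 2.4 s per image, a dataset of 10,000 images needs about 6.7 hours for a single epoch.
Unacceptable!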
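
The arithmetic behind those figures, as a small check (the variable names are just for this example):

```python
# Per-image timings for the four steps, in seconds, as quoted above.
step_seconds = [0.05, 1.5, 0.7, 0.1]
per_image = sum(step_seconds)  # 2.35 s, rounded to 2.4 s in the text

images_per_epoch = 10_000
epoch_hours = 2.4 * images_per_epoch / 3600  # seconds -> hours

print(f"{per_image:.2f} s per image, {epoch_hours:.1f} h per epoch")
```
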
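So steps 2 and 3, the overlap computation and the bbox transform, have to get faster, the overlap computation above all.
First, here is the Python implementation of overlap: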

import numpy as np

def compute_overlap(a, b):
    # a: [N, 4] anchors as x1, y1, x2, y2
    # b: [M, 4] ground-truth boxes as x1, y1, x2, y2
    area = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)
    iw = np.minimum(np.expand_dims(a[:, 2], axis=1), b[:, 2]) - np.maximum(np.expand_dims(a[:, 0], axis=1), b[:, 0]) + 1
    ih = np.minimum(np.expand_dims(a[:, 3], axis=1), b[:, 3]) - np.maximum(np.expand_dims(a[:, 1], axis=1), b[:, 1]) + 1
    # Say a holds N boxes and b holds M boxes.
    # np.expand_dims on an (N,) array with axis=1 turns it into (N, 1).
    # np.minimum of (N, 1) and (M,) broadcasts to an (N, M) matrix: every box in a compared with every box in b.
    # The smaller x2/y2 and the larger x1/y1 bound the intersection.
    # iw and ih are the intersection width and height; both have shape (N, M),
    # one entry per (anchor, ground-truth box) pair.
    iw = np.maximum(iw, 0)
    ih = np.maximum(ih, 0)  # iw and ih must not be negative

    ua = np.expand_dims((a[:, 2] - a[:, 0] + 1) *(a[:, 3] - a[:, 1] + 1), axis=1) + area - iw * ih
    # Union: S_a + S_b - intersection_ab
    ua = np.maximum(ua, np.finfo(float).eps)

    intersection = iw * ih
    return intersection / ua  # (N, M)
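
The broadcasting described in the comments is easy to check on a tiny input. The boxes below are made up for illustration; the snippet only reproduces the intersection-width part of the function, with N = 3 and M = 2.

```python
import numpy as np

a = np.array([[0, 0, 9, 9], [5, 5, 14, 14], [20, 20, 29, 29]], dtype=np.float64)  # N = 3
b = np.array([[0, 0, 9, 9], [5, 5, 14, 14]], dtype=np.float64)                    # M = 2

a_x2 = np.expand_dims(a[:, 2], axis=1)                       # (3,) -> (3, 1)
right = np.minimum(a_x2, b[:, 2])                            # (3, 1) with (2,) -> (3, 2)
left = np.maximum(np.expand_dims(a[:, 0], axis=1), b[:, 0])  # (3, 2)
iw = np.maximum(right - left + 1, 0)                         # intersection widths, clipped at 0

print(a_x2.shape, right.shape)
print(iw)
```

The third box in `a` does not touch either box in `b`, so its row of `iw` is clipped to zero.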

Now the Cython implementation:

# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------

cimport cython
import numpy as np
cimport numpy as np


def compute_overlap(
    np.ndarray[double, ndim=2] boxes,
    np.ndarray[double, ndim=2] query_boxes
):
    """
    Args
        boxes: (N, 4) ndarray of float
        query_boxes: (K, 4) ndarray of float

    Returns
        overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    cdef unsigned int N = boxes.shape[0]
    cdef unsigned int K = query_boxes.shape[0]
    cdef np.ndarray[double, ndim=2] overlaps = np.zeros((N, K), dtype=np.float64)
    cdef double iw, ih, box_area
    cdef double ua
    cdef unsigned int k, n
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
            (query_boxes[k, 3] - query_boxes[k, 1] + 1)
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0]) + 1
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1]) + 1
                )
                if ih > 0:
                    ua = np.float64(
                        (boxes[n, 2] - boxes[n, 0] + 1) *
                        (boxes[n, 3] - boxes[n, 1] + 1) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps
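
To sanity-check the compiled function it helps to have the same double loop in plain Python. The sketch below is such a reference; the name `overlap_reference` and the sample boxes are made up for this example. It is slow, but it mirrors the Cython loop closely and needs no compilation.

```python
import numpy as np

def overlap_reference(boxes, query_boxes):
    """Plain-Python IoU with the same +1 pixel convention as the Cython code."""
    n_boxes, n_query = boxes.shape[0], query_boxes.shape[0]
    overlaps = np.zeros((n_boxes, n_query), dtype=np.float64)
    for k in range(n_query):
        qx1, qy1, qx2, qy2 = query_boxes[k]
        query_area = (qx2 - qx1 + 1) * (qy2 - qy1 + 1)
        for n in range(n_boxes):
            x1, y1, x2, y2 = boxes[n]
            iw = min(x2, qx2) - max(x1, qx1) + 1
            ih = min(y2, qy2) - max(y1, qy1) + 1
            if iw > 0 and ih > 0:  # boxes that do not intersect keep overlap 0
                union = (x2 - x1 + 1) * (y2 - y1 + 1) + query_area - iw * ih
                overlaps[n, k] = iw * ih / union
    return overlaps

boxes = np.array([[0, 0, 9, 9], [20, 20, 29, 29]], dtype=np.float64)
query = np.array([[0, 0, 9, 9], [5, 5, 14, 14]], dtype=np.float64)
result = overlap_reference(boxes, query)
print(result)
```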

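Using Cython is fairly simple. We write a setup.py that translates the .pyx file into a .c file and also builds a .so file. The .so file is what we actually import: it is the bridge between the C code and our Python code, and lets us call the C code directly.
There are two ways to write the setup file; I'll only show one of them.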

# setuptools is used here because distutils was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(ext_modules=cythonize("compute_overlap.pyx"), include_dirs=[np.get_include()])
# Same pattern for the other file:
# setup(ext_modules=cythonize("bbox_transform.pyx"), include_dirs=[np.get_include()])

Then run this on the command line:

python setup.py build

The directory containing the .pyx file now has a new compute_overlap.c file and a build directory. Inside build there is a lib directory, which holds the Python library we just built.

drwxrwxr-x 3 zhaoyang zhaoyang 4096 Dec 25 12:09 lib.linux-x86_64-3.6

Go all the way down into this lib directory and you will find the .so file we need:

-rwxrwxr-x 1 zhaoyang zhaoyang 180328 Dec 25 12:09 compute_overlap.cpython-36m-x86_64-linux-gnu.so

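Copy this file into the same directory as the .c file.
Now we can call the C code generated from the .pyx file with import compute_overlap.
Note that the name you import is the name of the .pyx file. Here the file is compute_overlap.pyx; it can contain many functions, one of which is compute_overlap(a, b).
So the call is compute_overlap.compute_overlap(a, b): the first name is the module (the file), the second is the function.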
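
A call might look like the sketch below. Two things are assumptions of this example rather than part of the setup above: the `try/except ImportError` fallback (a condensed NumPy version, so the snippet still runs before the extension is built) and the sample boxes. Because the Cython signature declares `np.ndarray[double, ndim=2]`, the inputs are converted to float64 first; float32 arrays are rejected by a buffer argument typed as double.

```python
import numpy as np

try:
    # Module name = .pyx file name; attribute = the function defined inside it.
    from compute_overlap import compute_overlap
except ImportError:
    # Fallback used only when the extension is not built (assumption of this sketch).
    def compute_overlap(a, b):
        area_b = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)
        area_a = (a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1)
        iw = np.minimum(a[:, 2:3], b[:, 2]) - np.maximum(a[:, 0:1], b[:, 0]) + 1
        ih = np.minimum(a[:, 3:4], b[:, 3]) - np.maximum(a[:, 1:2], b[:, 1]) + 1
        inter = np.maximum(iw, 0) * np.maximum(ih, 0)
        return inter / (area_a[:, np.newaxis] + area_b - inter)

# Boxes often arrive as float32; the typed Cython function needs float64.
anchors = np.array([[0, 0, 9, 9], [5, 5, 14, 14]], dtype=np.float32)
gt_boxes = np.array([[0, 0, 9, 9]], dtype=np.float32)

overlaps = compute_overlap(anchors.astype(np.float64), gt_boxes.astype(np.float64))
print(overlaps.shape)
```

`import compute_overlap` followed by `compute_overlap.compute_overlap(...)` is equivalent; the `from ... import` form is used here only so the fallback can share the same name.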

The Cython version of compute_overlap is about five times faster than the Python version: the per-image overlap computation drops from 1.5 s to roughly 0.3 s.
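
Timings like these depend on N, M, and the machine, so it is worth measuring on your own data. Below is a rough harness, not the measurement method used for the numbers above. The helper names `random_boxes` and `best_time`, the 50,000 anchors, and the 20 ground-truth boxes are all made up for illustration, and `intersection_widths` is only a stand-in workload: swap in the NumPy or the compiled `compute_overlap` to compare them.

```python
import time
import numpy as np

def random_boxes(n, image_size, rng):
    """Return (n, 4) float64 boxes as x1, y1, x2, y2 with x2 > x1 and y2 > y1."""
    xy1 = rng.uniform(0, image_size / 2, size=(n, 2))
    wh = rng.uniform(1, image_size / 2, size=(n, 2))
    return np.hstack([xy1, xy1 + wh])

def best_time(fn, *args, repeat=3):
    """Best wall-clock time of fn(*args) over a few runs, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def intersection_widths(a, b):
    """Stand-in workload: only the (N, M) intersection-width matrix."""
    iw = np.minimum(a[:, 2:3], b[:, 2]) - np.maximum(a[:, 0:1], b[:, 0]) + 1
    return np.maximum(iw, 0)

rng = np.random.default_rng(0)
anchors = random_boxes(50_000, 512, rng)
gt_boxes = random_boxes(20, 512, rng)

elapsed = best_time(intersection_widths, anchors, gt_boxes)
print(f"best of 3: {elapsed * 1000:.2f} ms")
```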

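bbox_transform.pyx and bbox_transform_inv.pyx are listed below.

These two functions gain less than compute_overlap does, but their time is still roughly halved: bbox_transform goes from 0.7 s to 0.3 s.

So the per-image training time, previously 2.4 s, is now 2.4 - 1.2 - 0.4 ≈ 0.8 s.

Before I met Cython, I feel like I was wasting my life.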

# bbox_transform.pyx
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    '''
    Receives two sets of bounding boxes, denoted by two opposite corners
    (x1,y1,x2,y2), and returns the target deltas that Faster R-CNN should aim
    for.
    '''
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights

    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)

    targets = np.vstack(
        (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
    mean = np.array([0, 0, 0, 0])
    std = np.array([0.1, 0.1, 0.2, 0.2])

    return (targets-mean)/std

# bbox_transform_inv.pyx
import numpy as np


def bbox_transform_inv(boxes, deltas, mean=None, std=None):
    if mean is None:
        mean = np.array([0, 0, 0, 0], dtype=np.float32)
    if std is None:
        std = np.array([0.1, 0.1, 0.2, 0.2], dtype=np.float32)

    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx = deltas[:, :, 0] * std[0] + mean[0]
    dy = deltas[:, :, 1] * std[1] + mean[1]
    dw = deltas[:, :, 2] * std[2] + mean[2]
    dh = deltas[:, :, 3] * std[3] + mean[3]

    pred_ctr_x = ctr_x + dx * widths
    pred_ctr_y = ctr_y + dy * heights
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    pred_boxes_x1 = pred_ctr_x - 0.5 * pred_w
    pred_boxes_y1 = pred_ctr_y - 0.5 * pred_h
    pred_boxes_x2 = pred_ctr_x + 0.5 * pred_w
    pred_boxes_y2 = pred_ctr_y + 0.5 * pred_h

    pred_boxes = np.stack([pred_boxes_x1, pred_boxes_y1,
                           pred_boxes_x2, pred_boxes_y2], axis=2)

    return pred_boxes
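
A round trip is a quick way to see how the two functions relate. The sketch below uses condensed re-implementations; `encode`, `decode`, and the sample boxes are names and values made up for this example, with mean 0 and the same std as above. One thing it shows, as far as I can tell from the code as listed: bbox_transform measures widths as x2 - x1 + 1, while bbox_transform_inv places x2 at center + 0.5 * width without subtracting 1, so the recovered x2 and y2 come back one pixel larger than the input.

```python
import numpy as np

STD = np.array([0.1, 0.1, 0.2, 0.2])  # same std as in the listings above; mean is 0

def encode(anchors, gt):
    """Condensed bbox_transform: (N, 4) anchors and gt boxes -> (N, 4) deltas."""
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    gw = gt[:, 2] - gt[:, 0] + 1.0
    gh = gt[:, 3] - gt[:, 1] + 1.0
    dx = (gt[:, 0] + 0.5 * gw - (anchors[:, 0] + 0.5 * aw)) / aw
    dy = (gt[:, 1] + 0.5 * gh - (anchors[:, 1] + 0.5 * ah)) / ah
    return np.stack([dx, dy, np.log(gw / aw), np.log(gh / ah)], axis=1) / STD

def decode(anchors, deltas):
    """Condensed bbox_transform_inv: (N, 4) anchors, (B, N, 4) deltas -> (B, N, 4) boxes."""
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    d = deltas * STD
    cx = anchors[:, 0] + 0.5 * aw + d[:, :, 0] * aw
    cy = anchors[:, 1] + 0.5 * ah + d[:, :, 1] * ah
    w = np.exp(d[:, :, 2]) * aw
    h = np.exp(d[:, :, 3]) * ah
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=2)

anchors = np.array([[0.0, 0.0, 9.0, 9.0], [10.0, 10.0, 29.0, 19.0]])
gt = np.array([[2.0, 1.0, 11.0, 8.0], [12.0, 8.0, 25.0, 21.0]])

deltas = encode(anchors, gt)
recovered = decode(anchors, deltas[np.newaxis])[0]  # add, then strip, a batch axis
print(recovered - gt)  # x1, y1 match; x2, y2 are off by the +1 convention
```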
