Faster RCNN Object Detection: Loss

This section covers the loss in faster rcnn. The loss has two parts, the rpn Loss and the fast rcnn Loss, and the final faster rcnn loss is the sum of the two.

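The forward passes of the rpn Loss and the fast rcnn Loss are essentially the same. Each consists of a classification loss and a localization loss: the classification loss is the cross-entropy loss (CrossEntropy) and the localization loss is the Smooth L1 loss.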
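As a rough illustration of the classification term, here is a minimal NumPy sketch of a cross-entropy that skips samples labelled -1 (the "ignore" label used for anchors later in this section). The function name `cross_entropy_ignore` and the toy inputs are assumptions made for illustration only; they are not taken from the code discussed in this article.

```python
import numpy as np


def cross_entropy_ignore(logits, labels, ignore_index=-1):
    """Mean cross-entropy over samples whose label is not `ignore_index`.

    logits: (N, C) raw scores; labels: (N,) integer class ids.
    Hypothetical helper, written only to illustrate the idea.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    keep = labels != ignore_index
    if not keep.any():
        return 0.0
    z = logits[keep]
    # Numerically stable log-softmax: subtract the row maximum first.
    z = z - z.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every kept sample.
    picked = log_prob[np.arange(len(z)), labels[keep]]
    return float(-picked.mean())


logits = np.array([[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]])
labels = np.array([0, 1, -1])  # the third sample is ignored
loss = cross_entropy_ignore(logits, labels)
```

Only the first two rows contribute to `loss`; the third row has no effect on the result, however confident its scores are.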

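The key step in computing the loss is relating the network's predictions to the ground-truth boxes. This is also the hardest part of the loss computation to understand, so I describe it in detail below.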

rpn Loss

rpn Loss forward pass

The forward pass of the rpn Loss is as follows:

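First, compute the iou between the anchors and the bboxes (ground-truth boxes); the result has shape (len(anchor), len(bbox)). For each anchor, take its highest iou and the index of the bbox that gives it (a column index); for each bbox, take the row index of the anchor with the highest iou.
Using these iou values and indices together with the configured thresholds (the foreground iou threshold and the background iou threshold), assign each anchor a label (-1 -> ignore, 0 -> background, 1 -> foreground).
Randomly subsample the foreground and background anchors according to the configured number of anchors to keep and the foreground ratio.
Compute the coordinate offsets between every kept anchor and the bbox with which it has the highest iou.
Finally, combine these targets with the output of the rpn network to compute the classification loss and the localization loss.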
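The first two steps above can be sketched as follows. This is a minimal NumPy version of a pairwise iou function, assuming boxes are stored as (y_min, x_min, y_max, x_max), which is what the boundary check in the listing below suggests. The toy anchors and the thresholds 0.7 and 0.3 mirror the defaults shown later; the function body itself is an illustrative assumption rather than the helper the article's code imports.

```python
import numpy as np


def bbox_iou(bbox_a, bbox_b):
    """Pairwise iou. Boxes are (y_min, x_min, y_max, x_max). Returns (N, K)."""
    bbox_a = np.asarray(bbox_a, dtype=np.float64)
    bbox_b = np.asarray(bbox_b, dtype=np.float64)
    # Top-left and bottom-right corners of every pairwise intersection.
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[None, :, :2])
    br = np.minimum(bbox_a[:, None, 2:], bbox_b[None, :, 2:])
    # The intersection area is forced to zero when the boxes do not overlap.
    area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    return area_i / (area_a[:, None] + area_b[None, :] - area_i)


anchor = np.array([[0, 0, 10, 10], [0, 5, 10, 15], [20, 20, 30, 30]])
bbox = np.array([[0, 0, 10, 10]])

ious = bbox_iou(anchor, bbox)       # shape (3, 1)
max_ious = ious.max(axis=1)         # best iou for every anchor
label = np.full(len(anchor), -1)    # start with "ignore"
label[max_ious < 0.3] = 0           # background
label[max_ious >= 0.7] = 1          # foreground
```

Here the first anchor coincides with the bbox and becomes foreground, the second overlaps it with an iou of 1/3 and stays ignored, and the third does not overlap at all and becomes background.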

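Code implementation

The code below shows the AnchorTargetCreator class, which links the anchors to the ground-truth boxes (bbox). Its main job is matching: it assigns a bbox to every anchor and returns the matched coordinate offsets together with the foreground/background labels.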

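import numpy as np

# bbox_iou and bbox2loc are helper functions that this listing uses but does not define.


class AnchorTargetCreator(object):
    """
    Assign a ground-truth bbox to every anchor.

    Args:
        n_sample (int): Number of regions (sampled anchors) to produce.
        pos_iou_thresh (float): An anchor whose iou reaches this threshold is treated as foreground.
        neg_iou_thresh (float): An anchor whose iou is below this threshold is treated as background.
        pos_ratio (float): Fraction of foreground anchors in the final sampled output.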

    """

    def __init__(self,
                 n_sample=256,
                 pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                 pos_ratio=0.5):
        self.n_sample = n_sample
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh = neg_iou_thresh
        self.pos_ratio = pos_ratio

    def __call__(self, bbox, anchor, img_size):
        """
        * :math:`S` is the number of anchors.
        * :math:`R` is the number of bboxes.
        Args:
            bbox (array): Coordinates of the bboxes. shape -> :math:`(R, 4)`.
            anchor (array): Coordinates of the anchors. shape -> :math:`(S, 4)`.
            img_size (tuple of ints): A tuple :obj:`H, W`, the height and width of the image.
        Returns:
            #NOTE: it's scale not only  offset
            * **loc**: Offsets and scales that map each anchor to its matched bbox. shape -> :math:`(S, 4)`.
            * **label**: Anchor labels :obj:`(1=positive, 0=negative, -1=ignore)`. shape -> :math:`(S,)`.
        """

        img_H, img_W = img_size

        n_anchor = len(anchor)

        # Discard anchors that extend beyond the image boundary
        inside_index = _get_inside_index(anchor, img_H, img_W)
        anchor = anchor[inside_index]

        # Assign a bbox (target) to every anchor so that offsets can be computed
        argmax_ious, label = self._create_label(
            inside_index, anchor, bbox)

        # anchor -> bbox: compute the corresponding coordinate offsets
        loc = bbox2loc(anchor, bbox[argmax_ious])

        # map up to original set of anchors
        label = _unmap(label, n_anchor, inside_index, fill=-1)
        loc = _unmap(loc, n_anchor, inside_index, fill=0)

        return loc, label

    def _create_label(self, inside_index, anchor, bbox):
        # label: 1 is positive (foreground/object), 0 is negative (background),
        # -1 is dont care
        # Initialize the anchor labels, filled with -1.
        label = np.empty((len(inside_index),), dtype=np.int32)
        label.fill(-1)

        argmax_ious, max_ious, gt_argmax_ious = \
            self._calc_ious(anchor, bbox, inside_index)

        # assign negative labels first so that positive labels can clobber them
        # Anchors whose highest iou is below the background threshold get label 0
        label[max_ious < self.neg_iou_thresh] = 0

        # For each bbox, the anchor(s) with the highest iou get label 1
        label[gt_argmax_ious] = 1

        # Anchors whose highest iou reaches the foreground threshold get label 1
        label[max_ious >= self.pos_iou_thresh] = 1

        # Foreground sampling: if there are more foreground anchors than needed,
        # randomly set the surplus to -1 (ignore)
        n_pos = int(self.pos_ratio * self.n_sample)
        pos_index = np.where(label == 1)[0]
        if len(pos_index) > n_pos:
            disable_index = np.random.choice(
                pos_index, size=(len(pos_index) - n_pos), replace=False)
            label[disable_index] = -1

        # Background sampling: if there are more background anchors than needed,
        # randomly set the surplus to -1 (ignore)
        n_neg = self.n_sample - np.sum(label == 1)
        neg_index = np.where(label == 0)[0]
        if len(neg_index) > n_neg:
            disable_index = np.random.choice(
                neg_index, size=(len(neg_index) - n_neg), replace=False)
            label[disable_index] = -1

        return argmax_ious, label

    def _calc_ious(self, anchor, bbox, inside_index):
        # iou between anchors and bboxes, ious -> (len(inside_index), len(bbox))
        ious = bbox_iou(anchor, bbox)
        # For each anchor, index of the bbox (ground-truth box) with the highest iou
        argmax_ious = ious.argmax(axis=1)

        # Gather those highest iou values, shape -> (len(inside_index),)
        max_ious = ious[np.arange(len(inside_index)), argmax_ious]

        # For each bbox, index of the anchor with the highest iou
        gt_argmax_ious = ious.argmax(axis=0)

        # Gather those highest iou values, shape -> (len(bbox), )
        gt_max_ious = ious[gt_argmax_ious, np.arange(ious.shape[1])]

        # Row indices of every anchor that attains a bbox's highest iou (ties included)
        gt_argmax_ious = np.where(ious == gt_max_ious)[0]

        return argmax_ious, max_ious, gt_argmax_ious


def _unmap(data, count, index, fill=0):
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=data.dtype)
        ret.fill(fill)
        ret[index] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=data.dtype)
        ret.fill(fill)
        ret[index, :] = data
    return ret


def _get_inside_index(anchor, H, W):
    # Indices of all anchors that lie entirely inside the image.
    index_inside = np.where(
        (anchor[:, 0] >= 0) &
        (anchor[:, 1] >= 0) &
        (anchor[:, 2] <= H) &
        (anchor[:, 3] <= W)
    )[0]
    return index_inside
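The listing above calls `bbox2loc` without showing it. As the note in the docstring says, its result is a scale as well as an offset. Below is a minimal sketch of the usual R-CNN style encoding, assuming boxes are (y_min, x_min, y_max, x_max) and the output order is (dy, dx, dh, dw). It is an illustrative reimplementation under those assumptions, not necessarily identical to the helper used by the article's code.

```python
import numpy as np


def bbox2loc(src_bbox, dst_bbox):
    """Encode dst_bbox relative to src_bbox as (dy, dx, dh, dw)."""
    src_bbox = np.asarray(src_bbox, dtype=np.float64)
    dst_bbox = np.asarray(dst_bbox, dtype=np.float64)

    # Size and centre of the source boxes (anchors or rois).
    height = src_bbox[:, 2] - src_bbox[:, 0]
    width = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_y = src_bbox[:, 0] + 0.5 * height
    ctr_x = src_bbox[:, 1] + 0.5 * width

    # Size and centre of the target boxes (ground truth).
    base_height = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_width = dst_bbox[:, 3] - dst_bbox[:, 1]
    base_ctr_y = dst_bbox[:, 0] + 0.5 * base_height
    base_ctr_x = dst_bbox[:, 1] + 0.5 * base_width

    # Guard against division by zero for degenerate source boxes.
    eps = np.finfo(np.float64).eps
    height = np.maximum(height, eps)
    width = np.maximum(width, eps)

    dy = (base_ctr_y - ctr_y) / height      # centre shift in units of box height
    dx = (base_ctr_x - ctr_x) / width       # centre shift in units of box width
    dh = np.log(base_height / height)       # log scale change in height
    dw = np.log(base_width / width)         # log scale change in width
    return np.stack([dy, dx, dh, dw], axis=1)


anchor = np.array([[0.0, 0.0, 10.0, 10.0]])
gt = np.array([[5.0, 5.0, 25.0, 25.0]])
loc = bbox2loc(anchor, gt)
```

In this toy case the ground-truth centre is one anchor-size away in both directions and the box is twice as large, so `loc` is (1, 1, log 2, log 2).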

fast rcnn Loss

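Here is the full code of the ProposalTargetCreator class, which links the region proposals to the ground-truth boxes. It assigns a bbox to every roi and returns the sampled rois, their coordinate offsets, and their class labels.

Code implementation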

class ProposalTargetCreator(object):
    """
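    Assign a ground-truth bbox to every roi.

    Args:
        n_sample (int): Number of rois to keep.
        pos_ratio (float): Fraction of foreground rois among the kept rois.
        pos_iou_thresh (float): A roi whose iou reaches this threshold is treated as foreground.
        neg_iou_thresh_hi (float): A roi whose iou lies in [neg_iou_thresh_lo, neg_iou_thresh_hi) is treated as background.
        neg_iou_thresh_lo (float): Lower bound of the background iou interval.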

    """

    def __init__(self,
                 n_sample=128,
                 pos_ratio=0.25, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
                 ):
        self.n_sample = n_sample
        self.pos_ratio = pos_ratio
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh_hi = neg_iou_thresh_hi
        self.neg_iou_thresh_lo = neg_iou_thresh_lo 

    def __call__(self, roi, bbox, label,
                 loc_normalize_mean=(0., 0., 0., 0.),
                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
                 
        """

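        * :math:`S` is the number of sampled rois.
        * :math:`L` is the number of classes, including the background class.

        Args:
            roi (array): rois output by the rpn. shape -> :math:`(R, 4)`
            bbox (array): Ground-truth boxes. shape -> :math:`(R', 4)`.
            label (array): Class labels of the ground-truth boxes. shape -> :math:`(R',)`.
            loc_normalize_mean (tuple of four floats): Mean used to normalize the coordinate offsets.
            loc_normalize_std (tuple of four floats): Standard deviation used to normalize the coordinate offsets.

        Returns:
            * **sample_roi**: The sampled rois. shape -> :math:`(S, 4)`.
            * **gt_roi_loc**: Offsets that map each sampled roi to its bbox. shape -> :math:`(S, 4)`.
            * **gt_roi_label**: Class labels assigned to the sampled rois. shape -> :math:`(S,)`. Value range :math:`[0, L]`.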
        """
        n_bbox, _ = bbox.shape
        
        # Maximum number of foreground rois in the output
        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)

        # iou between rois and bboxes, iou -> (len(roi), len(bbox))
        iou = bbox_iou(roi, bbox)

        # For each roi, index of the bbox with the highest iou
        gt_assignment = iou.argmax(axis=1)

        # For each roi, the highest iou
        max_iou = iou.max(axis=1)

        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
        # Label 0 means background
        gt_roi_label = label[gt_assignment] + 1

        # Foreground sampling: if there are more foreground rois than needed,
        # sample them at random
        # Select rois whose iou is at least pos_iou_thresh.
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
        
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
        if pos_index.size > 0:
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)
        
        
        # Background sampling: if there are more background rois than needed,
        # sample them at random
        # Select rois whose iou lies in [neg_iou_thresh_lo, neg_iou_thresh_hi).
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]
        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
        neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                         neg_index.size))
        if neg_index.size > 0:
            neg_index = np.random.choice(
                neg_index, size=neg_roi_per_this_image, replace=False)

        # Merge the foreground and background samples.
        keep_index = np.append(pos_index, neg_index)
        gt_roi_label = gt_roi_label[keep_index]
        # Everything after the foreground samples is background (label 0)
        gt_roi_label[pos_roi_per_this_image:] = 0
        sample_roi = roi[keep_index]

        # roi -> bbox: compute the corresponding coordinate offsets
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])

        # Normalize the offsets with the given mean and standard deviation
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))

        return sample_roi, gt_roi_loc, gt_roi_label
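To make the sampling quota easier to see in isolation, here is a small stand-alone sketch of the foreground/background selection. It takes only the per-roi maximum iou. The function name `sample_roi_indices` and the explicit `rng` argument are assumptions introduced for illustration; the thresholds mirror the defaults of the class above.

```python
import numpy as np


def sample_roi_indices(max_iou, n_sample=128, pos_ratio=0.25,
                       pos_iou_thresh=0.5, neg_iou_thresh_hi=0.5,
                       neg_iou_thresh_lo=0.0, rng=None):
    """Return (keep_index, n_pos): sampled roi indices, foreground first."""
    rng = np.random.default_rng() if rng is None else rng
    max_iou = np.asarray(max_iou, dtype=np.float64)

    # Foreground: iou >= pos_iou_thresh, capped at n_sample * pos_ratio.
    pos_index = np.where(max_iou >= pos_iou_thresh)[0]
    n_pos = int(min(round(n_sample * pos_ratio), pos_index.size))
    if n_pos > 0:
        pos_index = rng.choice(pos_index, size=n_pos, replace=False)
    else:
        pos_index = pos_index[:0]

    # Background: iou in [neg_iou_thresh_lo, neg_iou_thresh_hi),
    # filling whatever part of the quota the foreground left unused.
    neg_index = np.where((max_iou < neg_iou_thresh_hi) &
                         (max_iou >= neg_iou_thresh_lo))[0]
    n_neg = int(min(n_sample - n_pos, neg_index.size))
    if n_neg > 0:
        neg_index = rng.choice(neg_index, size=n_neg, replace=False)
    else:
        neg_index = neg_index[:0]

    return np.concatenate([pos_index, neg_index]), n_pos


max_iou = np.array([0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0])
keep, n_pos = sample_roi_indices(max_iou, n_sample=4,
                                 rng=np.random.default_rng(0))
```

With four candidates above the foreground threshold but a quota of 4 * 0.25 = 1, only one foreground roi is kept and the remaining three slots are filled with background rois.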

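The Smooth L1 loss is used in the network to compute the loss on the coordinate offsets. One point to note: boxes labeled as background do not contribute to the localization loss.
The mathematical form of the Smooth L1 loss is as follows: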
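One common parameterised form is shown here, under the assumption that α plays the role of the σ parameter found in widely used Faster R-CNN implementations. With α = 1 it reduces to the definition in the Fast R-CNN paper.

```latex
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
  0.5\,(\alpha x)^2, & |x| < \dfrac{1}{\alpha^2} \\[6pt]
  |x| - \dfrac{0.5}{\alpha^2}, & \text{otherwise}
\end{cases}
```

At the switching point |x| = 1/α² both branches equal 0.5/α², so the two pieces join without a jump.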

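In the formula, α is a hyperparameter that is usually set to 1, and the two pieces are scaled so that the function is continuous. The Smooth L1 loss combines the L1 loss (which has a stable gradient) with the L2 loss (whose gradient oscillates less when the input is small): for small inputs it behaves like the L2 loss, otherwise like the L1 loss. In the rpn Loss α is set to 3, while in the fast rcnn Loss α is set to 1.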
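The behaviour described above can be sketched in NumPy as follows. The parameter is called `sigma` here, on the assumption that it corresponds to the α of the text. The helper `loc_loss`, and in particular its normalisation by the number of non-ignored samples, is an illustrative choice and not something this article specifies.

```python
import numpy as np


def smooth_l1(x, sigma=1.0):
    """Element-wise Smooth L1: quadratic near zero, linear elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    quadratic = 0.5 * sigma2 * x ** 2      # L2-like branch for small |x|
    linear = abs_x - 0.5 / sigma2          # L1-like branch for large |x|
    return np.where(abs_x < 1.0 / sigma2, quadratic, linear)


def loc_loss(pred_loc, gt_loc, gt_label, sigma=1.0):
    """Localization loss that only counts foreground samples (label > 0)."""
    pred_loc = np.asarray(pred_loc, dtype=np.float64)
    gt_loc = np.asarray(gt_loc, dtype=np.float64)
    gt_label = np.asarray(gt_label)
    pos = gt_label > 0                     # background (0) and ignore (-1) are skipped
    total = smooth_l1(pred_loc - gt_loc, sigma)[pos].sum()
    # Assumed normalisation: number of samples that are not ignored.
    n_valid = max(int((gt_label >= 0).sum()), 1)
    return float(total / n_valid)


pred = np.array([[0.5, 0.0, 0.0, 0.0],
                 [2.0, 2.0, 2.0, 2.0],
                 [1.0, 1.0, 1.0, 1.0]])
gt = np.zeros_like(pred)
labels = np.array([1, 0, -1])              # foreground, background, ignore
loss = loc_loss(pred, gt, labels, sigma=1.0)
```

Only the first row is foreground, so the large errors in the background and ignored rows do not affect `loss`. A larger `sigma` narrows the quadratic region to |x| < 1/sigma².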

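Summary

This section described in detail how predictions are matched to ground truth when computing the faster rcnn loss: how anchors are matched to bboxes (AnchorTargetCreator) and how rois are matched to bboxes (ProposalTargetCreator), together with the key code. Once this matching is understood, the loss computation of the whole network is largely understood. The loss computation itself is easy to follow, so it is not covered in detail here.

In the next section I will describe how to train faster rcnn.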

Reference

https://towardsdatascience.co...

https://stats.stackexchange.c...

Girshick R. Fast r-cnn[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448.
