detectron2 (object detection framework) from every angle - 09: source code explained (5) - anchor generation


detectron2 (object detection framework) from every angle - 00: table of contents

Preface

The previous posts in this series have covered most of the framework. Two topics still deserve a detailed treatment: anchors and the loss. This post covers anchors. Before we start, I recommend an earlier post of mine for a deeper background on anchors:
Deep dissection (5): anchors explained in plain language - ask me if anything is unclear!
After reading that, you should have a reasonable grasp of anchors, so let's dig into how detectron2 generates them, using its RetinaNet as the example.

Anchor generation

Open the configs\Base-RetinaNet.yaml file and you will find the following:

  ANCHOR_GENERATOR: # anchor generation
    SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]"]

When the code runs, this expression is evaluated into a nested list:

cfg.MODEL.ANCHOR_GENERATOR.SIZES = 
[[32, 40.3, 50.7], 
[64, 80.6, 101.5], 
[128, 161.2, 203.1], 
[256, 322.5, 406.3], 
[512, 645.0, 812.7]]

These are the initial anchor sizes; a series of further transformations is applied later. As you can see, the anchor sizes derive from five base scales 32, 64, 128, 256, 512, each enlarged twice more, by factors $2^{1/3}$ and $2^{2/3}$, i.e. $x \cdot 2^{1/3}$ and $x \cdot 2^{2/3}$ (the values above are shown to one decimal place). So in total we get 3 × 5 = 15 anchor sizes. But what are they used for? Don't worry, keep reading.
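
To see what the `!!python/object/apply:eval` line produces, we can evaluate the same list comprehension by hand (a standalone sketch):

sizes = [[x, x * 2 ** (1.0 / 3), x * 2 ** (2.0 / 3)] for x in [32, 64, 128, 256, 512]]
print(sizes[0])   # [32, 40.3174..., 50.7968...]
print(sizes[-1])  # [512, 645.0795..., 812.7493...]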

Now open the source file detectron2\modeling\anchor_generator.py and locate the following code:

class DefaultAnchorGenerator(nn.Module):
    """
    For a set of image sizes and feature maps, computes a set of anchors.
    """

    def __init__(self, cfg, input_shape: List[ShapeSpec]):
        super().__init__()
        # fmt: off
        sizes         = cfg.MODEL.ANCHOR_GENERATOR.SIZES

        # default: [[0.5, 1.0, 2.0], [0.5, 1.0, 2.0], [0.5, 1.0, 2.0], [0.5, 1.0, 2.0], [0.5, 1.0, 2.0]]
        aspect_ratios = cfg.MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS

        # default: [8, 16, 32, 64, 128]
        self.strides  = [x.stride for x in input_shape]

        # default: 0.0
        self.offset   = cfg.MODEL.ANCHOR_GENERATOR.OFFSET

        # check that self.offset lies in the valid range [0, 1)
        assert 0.0 <= self.offset < 1.0, self.offset

        # number of feature maps
        self.num_features = len(self.strides)

        # the base "cell" anchors, built by combining sizes with aspect_ratios
        self.cell_anchors = self._calculate_anchors(sizes, aspect_ratios)

The sizes = cfg.MODEL.ANCHOR_GENERATOR.SIZES above is exactly the fifteen default size values introduced earlier (five groups of three). aspect_ratios is the height-to-width ratio of each box. What is it used for? Step into the self._calculate_anchors function and you will see:

    def _calculate_anchors(self, sizes, aspect_ratios):
        # If one size (or aspect ratio) is specified and there are multiple feature
        # maps, then we "broadcast" anchors of that single size (or aspect ratio)
        # over all feature maps.
        # by default len(sizes) = 5
        if len(sizes) == 1:
            sizes *= self.num_features
        # by default len(aspect_ratios) = 5
        if len(aspect_ratios) == 1:
            aspect_ratios *= self.num_features
        assert self.num_features == len(sizes)
        assert self.num_features == len(aspect_ratios)

        # the base cell anchors: one set per feature map, built by combining
        # that map's sizes with its aspect_ratios
        cell_anchors = [
            self.generate_cell_anchors(s, a).float() for s, a in zip(sizes, aspect_ratios)
        ]
        return BufferList(cell_anchors)
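
A side note on the "broadcast" above: if the config held only a single entry, it relies on Python's list repetition, which simply replicates that entry once per feature map. A quick illustration:

num_features = 5
sizes = [[32, 40.3, 50.8]]   # imagine the config held a single entry
sizes *= num_features        # list repetition: the entry is reused for all 5 levels
print(len(sizes))            # 5
print(sizes[3] is sizes[0])  # True -- all levels share the same list object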

As you can see, the real core is still self.generate_cell_anchors, which receives s = sizes[i] and a = aspect_ratios[i] for one feature level at a time. Let's look at its internal implementation:

    def generate_cell_anchors(self, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2)):
        """
        Generate a tensor storing anchor boxes, which are continuous geometric rectangles
        centered on one feature map point sample. We can later build the set of anchors
        for the entire feature map by tiling these tensors; see `meth:grid_anchors`.

        Args:
            sizes (tuple[float]): Absolute size of the anchors in the units of the input
                image (the input received by the network, after undergoing necessary scaling).
                The absolute size is given as the side length of a box.
            aspect_ratios (tuple[float]): Aspect ratios of the boxes computed as box
                height / width.

        Returns:
            Tensor of shape (len(sizes) * len(aspect_ratios), 4) storing anchor boxes
                in XYXY format.
        """

        # This is different from the anchor generator defined in the original Faster R-CNN
        # code or Detectron. They yield the same AP, however the old version defines cell
        # anchors in a less natural way with a shift relative to the feature grid and
        # quantization that results in slightly different sizes for different aspect ratios.
        # See also https://github.com/facebookresearch/Detectron/issues/227

        anchors = []
        for size in sizes:
            area = size ** 2.0
            for aspect_ratio in aspect_ratios:
                # s * s = w * h
                # a = h / w
                # ... some algebra ...
                # w = sqrt(s * s / a)
                # h = a * w
                w = math.sqrt(area / aspect_ratio)
                h = aspect_ratio * w
                x0, y0, x1, y1 = -w / 2.0, -h / 2.0, w / 2.0, h / 2.0
                anchors.append([x0, y0, x1, y1])
        return torch.tensor(anchors)

The English comments here are fairly detailed, but let me also walk through it briefly. Suppose we pass in the parameters s = [32, 40.3, 50.7] and a = [0.5, 1.0, 2.0].

From the recommended post you should know that every anchor is a box, and these boxes should not all have the same shape: besides squares, rectangles must appear as well. From the formulas $s \cdot s = w \cdot h$ and $a = h / w$, with $s = [32, 40.3, 50.7]$ and $a = [0.5, 1.0, 2.0]$, we can solve for $w$ and $h$. But note one detail: when the box is finally stored, i.e. anchors.append([x0, y0, x1, y1]), the coordinates are expressed symmetrically about the origin; for example, a box that would be [x0, y0, x1, y1] = [0, 0, 32, 32] is stored as [-16, -16, 16, 16]. From the following source code:

        # the base cell anchors, built by combining sizes with aspect_ratios
        cell_anchors = [
            self.generate_cell_anchors(s, a).float() for s, a in zip(sizes, aspect_ratios)
        ]

we can tell that there are 5 × 9 = 45 cell anchor boxes in total (9 per feature level, from 3 sizes × 3 aspect ratios). On my run the 45 anchor cell boxes look like this:

0 = {Tensor: 9} tensor([[-22.6274, -11.3137,  22.6274,  11.3137],
        [-16.0000, -16.0000,  16.0000,  16.0000],
        [-11.3137, -22.6274,  11.3137,  22.6274],
        [-28.5088, -14.2544,  28.5088,  14.2544],
        [-20.1587, -20.1587,  20.1587,  20.1587],
        [-14.2544, -28.5088,  14.2544,  28.5088],
        [-35.9188, -17.9594,  35.9188,  17.9594],
        [-25.3984, -25.3984,  25.3984,  25.3984],
        [-17.9594, -35.9188,  17.9594,  35.9188]])
1 = {Tensor: 9} tensor([[-45.2548, -22.6274,  45.2548,  22.6274],
        [-32.0000, -32.0000,  32.0000,  32.0000],
        [-22.6274, -45.2548,  22.6274,  45.2548],
        [-57.0175, -28.5088,  57.0175,  28.5088],
        [-40.3175, -40.3175,  40.3175,  40.3175],
        [-28.5088, -57.0175,  28.5088,  57.0175],
        [-71.8376, -35.9188,  71.8376,  35.9188],
        [-50.7968, -50.7968,  50.7968,  50.7968],
        [-35.9188, -71.8376,  35.9188,  71.8376]])
2 = {Tensor: 9} tensor([[ -90.5097,  -45.2548,   90.5097,   45.2548],
        [ -64.0000,  -64.0000,   64.0000,   64.0000],
        [ -45.2548,  -90.5097,   45.2548,   90.5097],
        [-114.0350,  -57.0175,  114.0350,   57.0175],
        [ -80.6349,  -80.6349,   80.6349,   80.6349],
        [ -57.0175, -114.0350,   57.0175,  114.0350],
        [-143.6751,  -71.8376,  143.6751,   71.8376],
        [-101.5937, -101.5937,  101.5937,  101.5937],
        [ -71.8376, -143.6751,   71.8376,  143.6751]])
3 = {Tensor: 9} tensor([[-181.0193,  -90.5097,  181.0193,   90.5097],
        [-128.0000, -128.0000,  128.0000,  128.0000],
        [ -90.5097, -181.0193,   90.5097,  181.0193],
        [-228.0701, -114.0350,  228.0701,  114.0350],
        [-161.2699, -161.2699,  161.2699,  161.2699],
        [-114.0350, -228.0701,  114.0350,  228.0701],
        [-287.3503, -143.6751,  287.3503,  143.6751],
        [-203.1873, -203.1873,  203.1873,  203.1873],
        [-143.6751, -287.3503,  143.6751,  287.3503]])
4 = {Tensor: 9} tensor([[-362.0387, -181.0193,  362.0387,  181.0193],
        [-256.0000, -256.0000,  256.0000,  256.0000],
        [-181.0193, -362.0387,  181.0193,  362.0387],
        [-456.1401, -228.0701,  456.1401,  228.0701],
        [-322.5398, -322.5398,  322.5398,  322.5398],
        [-228.0701, -456.1401,  228.0701,  456.1401],
        [-574.7006, -287.3503,  574.7006,  287.3503],
        [-406.3747, -406.3747,  406.3747,  406.3747],
        [-287.3503, -574.7006,  287.3503,  574.7006]])
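
We can sanity-check the first row of level 0 by hand: with size = 32 and aspect_ratio = 0.5, the formulas above give exactly the first box printed:

import math

area = 32 ** 2.0            # s * s = 1024.0
w = math.sqrt(area / 0.5)   # w = sqrt(s*s / a) = 45.2548...
h = 0.5 * w                 # h = a * w        = 22.6274...
print([-w / 2, -h / 2, w / 2, h / 2])
# [-22.6274..., -11.3137..., 22.6274..., 11.3137...]  -> matches row 1 of tensor 0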

With that, the initialization of class DefaultAnchorGenerator(nn.Module) is fully analyzed. Next, let's look at its forward pass, shown below:

    def forward(self, features):
        """
        Args:
            features (list[Tensor]): list of backbone feature maps on which to generate anchors.

        Returns:
            list[list[Boxes]]: a list of #image elements. Each is a list of #feature level Boxes.
                The Boxes contains anchors of this image on the specific feature level.
        """
        # number of images in the batch (features[0] has shape (N, C, H, W))
        num_images = len(features[0])
        # spatial size (H, W) of every feature map; on my run:
        # torch.Size([72, 92]), torch.Size([36, 46]), torch.Size([18, 23]), torch.Size([9, 12]), torch.Size([5, 6])
        grid_sizes = [feature_map.shape[-2:] for feature_map in features]
        # generate the anchors for every grid cell of every feature map
        anchors_over_all_feature_maps = self.grid_anchors(grid_sizes)

        anchors_in_image = []
        # wrap the anchors of each feature level in a Boxes structure
        # (nothing is filtered out here)
        for anchors_per_feature_map in anchors_over_all_feature_maps:
            boxes = Boxes(anchors_per_feature_map)
            anchors_in_image.append(boxes)

        # deep-copy the per-level anchors once for every image in the batch
        anchors = [copy.deepcopy(anchors_in_image) for _ in range(num_images)]
        return anchors
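
As a quick sanity check on the output size (a sketch, assuming the grid sizes from my run above and 9 cell anchors per location):

grid_sizes = [(72, 92), (36, 46), (18, 23), (9, 12), (5, 6)]  # (H, W) per level
counts = [h * w * 9 for h, w in grid_sizes]
print(counts)       # [59616, 14904, 3726, 972, 270]
print(sum(counts))  # 79488 anchors in total for one image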

The flow is actually quite simple. self.generate_cell_anchors has already produced the base anchor cells; on top of those, we now generate the anchors for every grid cell of every feature map. As the recommended post explains, you can picture the feature map as sliding the anchors across the image: once we have the anchor cells, the anchors of a given grid cell are obtained by simply adding that cell's offset to them.


The core of this step is clearly self.grid_anchors(grid_sizes), implemented as follows:

    def grid_anchors(self, grid_sizes):
        anchors = []
        # size = grid size (H, W) of the feature map, stride = downsampling factor
        # relative to the input image, base_anchors = the cell anchors of this level
        for size, stride, base_anchors in zip(grid_sizes, self.strides, self.cell_anchors):
            # compute the (x, y) offset of every grid cell on the input image
            shift_x, shift_y = _create_grid_offsets(size, stride, self.offset, base_anchors.device)
            shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)

            # broadcast: every cell anchor is translated to every grid location
            anchors.append((shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4))

        return anchors

This generates the anchors for each feature map. The returned anchors is a list containing all generated anchors, one tensor of shape (H × W × 9, 4) per feature level.
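
For completeness, _create_grid_offsets builds a meshgrid of image-plane coordinates spaced stride pixels apart. The following is a minimal sketch of the idea, not the exact detectron2 source:

import torch

def create_grid_offsets_sketch(size, stride, offset, device):
    grid_height, grid_width = size
    # one coordinate per grid cell, mapped back onto the input image
    shifts_x = torch.arange(offset * stride, grid_width * stride, step=stride,
                            dtype=torch.float32, device=device)
    shifts_y = torch.arange(offset * stride, grid_height * stride, step=stride,
                            dtype=torch.float32, device=device)
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)  # default 'ij' indexing
    return shift_x.reshape(-1), shift_y.reshape(-1)

Stacking these offsets as (shift_x, shift_y, shift_x, shift_y) and adding them to an XYXY cell anchor translates the whole box to that grid location, which is exactly what the reshape(-1, 4) broadcast above does.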

