[Object Detection] mosaic and PAN

Table of Contents

  • mosaic
  • PAN
  • Training results

All the code has been uploaded to my GitHub repository: https://github.com/zgcr/pytorch-ImageNet-CIFAR-COCO-VOC-training
If you find it useful, please give it a star!
All of the code below has been tested under PyTorch 1.4 and confirmed to run correctly.

mosaic and PAN are both tricks integrated into YOLOv4 (https://arxiv.org/pdf/2004.10934.pdf). mosaic is a data augmentation method, while PAN is a plug-and-play module that adds only a small number of FLOPs and parameters, so its cost is very low.

mosaic

In one sentence, mosaic scales four training images and stitches them into one. In the COCO dataset, small objects account for 41.4% of all objects, but they are unevenly distributed across images: only 52.3% of the images contain small objects, so small objects tend to be learned insufficiently in regular training. With mosaic augmentation, while iterating over each image of an epoch, three more images are randomly drawn from the training set. The first image is randomly shrunk while the output image keeps the first image's original size, so the bottom-right corner of the shrunken first image divides the output image into four parts. Label the top-left part 1 and the remaining three parts 2, 3 and 4 going clockwise. The shrunken first image is placed in part 1, and the other three images are resized to the sizes of parts 2, 3 and 4. When four images are drawn at once, the chance that none of them contains a small object is already small; moreover, since every image is shrunk to some degree, even objects that were not small move closer to small-object scale after shrinking, which benefits the model's learning of small objects.
Now I want to apply mosaic in RetinaNet. YOLOv4's resize does not preserve the original aspect ratio, while RetinaNet requires all four images to keep their aspect ratios when resized, so the mosaic method above needs a small modification. For the first image, draw a random number in [0.2, 0.8] as the scale factor (it must be neither too small nor too large, otherwise some of the four resized images become too small for the model to learn from) and shrink the first image with it. For images 2, 3 and 4, compute the scale that fits each image inside its part while preserving its aspect ratio, i.e. the minimum of (part height / image height) and (part width / image width); this guarantees that the resized images 2, 3 and 4 are no larger than their parts in either dimension. Then, image 2's top-left corner is aligned with image 1's top-right corner; image 3's top-left corner with image 1's bottom-right corner; and image 4's top-left corner with image 1's bottom-left corner. This yields one training image stitched from the four images. Finally, we scale each image's bboxes accordingly, shift them by the offset of the part the image was pasted into, and concatenate them to get the bboxes of the new training image. A short worked example of the scale computation follows.
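
To make the fitting rule concrete, here is a minimal numeric sketch (the image sizes are hypothetical, not taken from any real COCO image): first try to match the part's height, then refit on the width if the scaled width overflows, which is equivalent to taking the minimum of the two ratios.

# Hypothetical sizes; image 1 defines the output canvas.
h1, w1 = 480, 640
scale1 = 0.5                               # drawn from U(0.2, 0.8) in the real code

# part 2 sits to the right of the shrunken image 1
part_h2, part_w2 = int(h1 * scale1), w1 - int(w1 * scale1)   # 240, 320

h2, w2 = 300, 800                          # a wide image 2
scale2 = part_h2 / h2                      # 0.8, but int(0.8 * 800) = 640 > 320
if int(scale2 * w2) > part_w2:             # width overflows the part,
    scale2 = part_w2 / w2                  # so refit on the width: 0.4
print(int(h2 * scale2), int(w2 * scale2))  # 120 320 -> fits inside 240 x 320
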
Note that since every image is shrunk to some degree, the supervision signal for large objects becomes weaker, and the model may learn large objects less well. There are two remedies. First, when iterating over each image of an epoch, decide with probability 0.5 whether to apply mosaic to that image (this is what the implementation below does). Second, apply mosaic to every image but train for a few more epochs, until the large-object mAP and mAR recover to the level reached without mosaic.
The mosaic implementation for RetinaNet is as follows (note that the other data augmentation classes need no modification):

import os
import cv2
import torch
import numpy as np
import random
import math
from torch.utils.data import Dataset
from pycocotools.coco import COCO
import torch.nn.functional as F

class CocoDetection(Dataset):
    def __init__(self,
                 image_root_dir,
                 annotation_root_dir,
                 set='train2017',
                 use_mosaic=False,
                 transform=None):
        self.image_root_dir = image_root_dir
        self.annotation_root_dir = annotation_root_dir
        self.set_name = set
        self.use_mosaic = use_mosaic
        self.transform = transform

        self.coco = COCO(
            os.path.join(self.annotation_root_dir,
                         'instances_' + self.set_name + '.json'))

        self.load_classes()

    def load_classes(self):
        self.image_ids = self.coco.getImgIds()
        self.cat_ids = self.coco.getCatIds()
        self.categories = self.coco.loadCats(self.cat_ids)
        self.categories.sort(key=lambda x: x['id'])

        # category_id is the original COCO category id; coco_label is remapped to the range 0-79
        self.category_id_to_coco_label = {
            category['id']: i
            for i, category in enumerate(self.categories)
        }
        self.coco_label_to_category_id = {
            v: k
            for k, v in self.category_id_to_coco_label.items()
        }

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        if self.use_mosaic:
            if np.random.uniform(0, 1.) < 0.5:
                imgs, annots = [], []
                img = self.load_image(idx)
                imgs.append(img)
                annot = self.load_annotations(idx)
                annots.append(annot)

                index_list, index = [idx], idx
                for _ in range(3):
                    while index in index_list:
                        index = np.random.randint(0, len(self.image_ids))
                    index_list.append(index)
                    img = self.load_image(index)
                    imgs.append(img)
                    annot = self.load_annotations(index)
                    annots.append(annot)

                # Images 1-4 are arranged clockwise with image 1 at the top left.
                # First draw a random scale for image 1, then derive the maximum
                # resize size of each remaining part from it. To keep any of the
                # four images from becoming too small for the model to learn from,
                # the scale is a random float restricted to [0.2, 0.8].
                scale1 = np.random.uniform(0.2, 0.8)
                height1, width1, _ = imgs[0].shape

                imgs[0] = cv2.resize(imgs[0],
                                    (int(width1 * scale1), int(height1 * scale1)))

                max_height2, max_width2 = int(
                    height1 * scale1), width1 - int(width1 * scale1)
                height2, width2, _ = imgs[1].shape
                scale2 = max_height2 / height2
                if int(scale2 * width2) > max_width2:
                    scale2 = max_width2 / width2
                imgs[1] = cv2.resize(imgs[1],
                                    (int(width2 * scale2), int(height2 * scale2)))

                max_height3, max_width3 = height1 - int(
                    height1 * scale1), width1 - int(width1 * scale1)
                height3, width3, _ = imgs[2].shape
                scale3 = max_height3 / height3
                if int(scale3 * width3) > max_width3:
                    scale3 = max_width3 / width3
                imgs[2] = cv2.resize(imgs[2],
                                    (int(width3 * scale3), int(height3 * scale3)))

                max_height4, max_width4 = height1 - int(height1 * scale1), int(
                    width1 * scale1)
                height4, width4, _ = imgs[3].shape
                scale4 = max_height4 / height4
                if int(scale4 * width4) > max_width4:
                    scale4 = max_width4 / width4
                imgs[3] = cv2.resize(imgs[3],
                                    (int(width4 * scale4), int(height4 * scale4)))

                # the stitched image keeps the size of the original first image;
                # use float32 to match the dtype of the loaded images
                final_image = np.zeros((height1, width1, 3), dtype=np.float32)
                final_image[0:int(height1 * scale1),
                            0:int(width1 * scale1)] = imgs[0]
                final_image[0:int(height2 * scale2),
                            int(width1 * scale1):(int(width1 * scale1) +
                                                int(width2 * scale2))] = imgs[1]
                final_image[int(height1 * scale1):(int(height1 * scale1) +
                                                int(height3 * scale3)),
                            int(width1 * scale1):(int(width1 * scale1) +
                                                int(width3 * scale3))] = imgs[2]
                final_image[int(height1 * scale1):(int(height1 * scale1) +
                                                int(height4 * scale4)),
                            0:int(width4 * scale4)] = imgs[3]

                # scale each image's bboxes by its resize factor, then shift them
                # by the offset of the part the image was pasted into
                annots[0][:, :4] *= scale1
                annots[1][:, :4] *= scale2
                annots[2][:, :4] *= scale3
                annots[3][:, :4] *= scale4

                annots[1][:, 0] += int(width1 * scale1)
                annots[1][:, 2] += int(width1 * scale1)

                annots[2][:, 0] += int(width1 * scale1)
                annots[2][:, 2] += int(width1 * scale1)
                annots[2][:, 1] += int(height1 * scale1)
                annots[2][:, 3] += int(height1 * scale1)

                annots[3][:, 1] += int(height1 * scale1)
                annots[3][:, 3] += int(height1 * scale1)

                final_annot = np.concatenate(
                    (annots[0], annots[1], annots[2], annots[3]), axis=0)

                sample = {'img': final_image, 'annot': final_annot, 'scale': 1.}
            else:
                img = self.load_image(idx)
                annot = self.load_annotations(idx)

                sample = {'img': img, 'annot': annot, 'scale': 1.}                

        else:
            img = self.load_image(idx)
            annot = self.load_annotations(idx)

            sample = {'img': img, 'annot': annot, 'scale': 1.}

        if self.transform:
            sample = self.transform(sample)

        return sample

    def load_image(self, image_index):
        image_info = self.coco.loadImgs(self.image_ids[image_index])[0]
        path = os.path.join(self.image_root_dir, image_info['file_name'])
        img = cv2.imread(path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        return img.astype(np.float32) / 255.

    def load_annotations(self, image_index):
        # get ground truth annotations
        annotations_ids = self.coco.getAnnIds(
            imgIds=self.image_ids[image_index], iscrowd=False)
        annotations = np.zeros((0, 5))

        # some images appear to miss annotations
        if len(annotations_ids) == 0:
            return annotations

        # parse annotations
        coco_annotations = self.coco.loadAnns(annotations_ids)
        for _, a in enumerate(coco_annotations):
            # some annotations have basically no width / height, skip them
            if a['bbox'][2] < 1 or a['bbox'][3] < 1:
                continue

            annotation = np.zeros((1, 5))
            annotation[0, :4] = a['bbox']
            annotation[0, 4] = self.find_coco_label_from_category_id(
                a['category_id'])

            annotations = np.append(annotations, annotation, axis=0)

        # transform from [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]
        annotations[:, 2] = annotations[:, 0] + annotations[:, 2]
        annotations[:, 3] = annotations[:, 1] + annotations[:, 3]

        return annotations

    def find_coco_label_from_category_id(self, category_id):
        return self.category_id_to_coco_label[category_id]

    def find_category_id_from_coco_label(self, coco_label):
        return self.coco_label_to_category_id[coco_label]

    def num_classes(self):
        return 80

    def image_aspect_ratio(self, image_index):
        image = self.coco.loadImgs(self.image_ids[image_index])[0]
        return float(image['width']) / float(image['height'])
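
A minimal usage sketch follows. The dataset paths below are placeholders and transform is left as None for brevity; in real training the usual resize/flip/normalize transforms and a collate function are still needed.

if __name__ == '__main__':
    # point the placeholder paths at your COCO2017 layout
    dataset = CocoDetection(
        image_root_dir='/path/to/COCO2017/images/train2017',
        annotation_root_dir='/path/to/COCO2017/annotations',
        set='train2017',
        use_mosaic=True,
        transform=None)
    sample = dataset[np.random.randint(0, len(dataset))]
    # img: HxWx3 float32 in [0, 1]; annot: Nx5 (x_min, y_min, x_max, y_max, label)
    print(sample['img'].shape, sample['annot'].shape, sample['scale'])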

PAN

PAN means PANet (https://arxiv.org/pdf/1803.01534.pdf). In short, after FPN's top-down (upsample-and-fuse) feature pyramid, PAN adds another bottom-up (downsample-and-fuse) feature pyramid. For FLOPs reasons I use the original PAN here: the bottom-up fusion uses a shortcut (element-wise add) rather than the concat operation used in YOLOv4. A sketch of the concat variant is shown after the code below.
The PAN implementation is as follows:

import torch.nn as nn

class PAN(nn.Module):
    def __init__(self, planes):
        super(PAN, self).__init__()
        self.P3_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P4_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P5_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P6_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)

    def forward(self, inputs):
        [P3, P4, P5, P6, P7] = inputs

        P3_downsample = self.P3_down(P3)
        P4 = P3_downsample + P4

        P4_downsample = self.P4_down(P4)
        P5 = P4_downsample + P5

        P5_downsample = self.P5_down(P5)
        P6 = P5_downsample + P6

        P6_downsample = self.P6_down(P6)
        P7 = P6_downsample + P7

        del P3_downsample, P4_downsample, P5_downsample, P6_downsample

        return [P3, P4, P5, P6, P7]
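
For comparison, here is a hedged sketch (not from the original repo) of the YOLOv4-style fusion at a single pyramid level: the downsampled finer feature is concatenated with the coarser one, and a 1x1 convolution restores the channel count, which is where the extra FLOPs and parameters come from.

import torch
import torch.nn as nn

class PANConcatLevel(nn.Module):
    def __init__(self, planes):
        super(PANConcatLevel, self).__init__()
        self.down = nn.Conv2d(planes, planes, kernel_size=3, stride=2, padding=1)
        # concat doubles the channels; reduce them back with a 1x1 conv
        self.reduce = nn.Conv2d(planes * 2, planes, kernel_size=1)

    def forward(self, p_low, p_high):
        # p_low: the finer level (e.g. P3), p_high: the next coarser level (e.g. P4)
        return self.reduce(torch.cat([self.down(p_low), p_high], dim=1))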

To use it, simply insert this module right after the FPN module, as in the sketch below.
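
A minimal sketch of that insertion (the channel count and feature map sizes are illustrative; it only checks that PAN keeps all shapes unchanged):

import torch

planes = 256
sizes = [80, 40, 20, 10, 5]                  # e.g. P3..P7 for a 640x640 input
fpn_outs = [torch.randn(2, planes, s, s) for s in sizes]

pan = PAN(planes)
pan_outs = pan(fpn_outs)
for p in pan_outs:
    print(p.shape)                           # channels and spatial sizes unchanged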

Training results

All runs resize images to 667 during training, all use apex (mixed precision), and none uses syncbn. The batch size is 24 in every case, on two RTX 2080 Ti GPUs.

| Network | epoch5 (mAP, mAR, loss) | epoch10 (mAP, mAR, loss) | epoch12 (mAP, mAR, loss) | epoch15 (mAP, mAR, loss) | epoch20 (mAP, mAR, loss) | epoch24 (mAP, mAR, loss) |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet50-RetinaNet-myresize667 | 0.253, 0.361, 0.61 | 0.287, 0.398, 0.51 | 0.293, 0.401, 0.49 | / | / | / |
| ResNet50-RetinaNet-myresize667-mosaic | 0.255, 0.369, 0.61 | 0.288, 0.401, 0.52 | 0.298, 0.412, 0.50 | / | / | / |
| ResNet50-RetinaNet-myresize667-pan | 0.253, 0.362, 0.61 | 0.289, 0.404, 0.51 | 0.297, 0.411, 0.49 | / | / | / |
| ResNet50-RetinaNet-myresize667-pan-mosaic | 0.255, 0.367, 0.60 | 0.288, 0.400, 0.52 | 0.293, 0.402, 0.50 | 0.303, 0.416, 0.50 | 0.310, 0.427, 0.45 | 0.312, 0.431, 0.43 |

For the models trained with mosaic (e.g. ResNet50-RetinaNet-myresize667-pan-mosaic), the mAP and mAR on large objects are still slightly lower than without mosaic, but because the gains on small and medium objects are larger, the overall mAP still goes up.
