All of the code has been uploaded to my GitHub repository: https://github.com/zgcr/pytorch-ImageNet-CIFAR-COCO-VOC-training
If you find it useful, please give it a star!
All of the code below has been tested on PyTorch 1.4 and verified to run correctly.
Both mosaic and PAN are tricks integrated into YOLOv4 (https://arxiv.org/pdf/2004.10934.pdf). Mosaic is a data augmentation method, while PAN is a plug-and-play module that adds only a small number of FLOPs and parameters, so its cost is very low.
In one sentence, mosaic scales four training images and stitches them into one. In the COCO dataset, small objects account for 41.4% of all objects, but they are distributed unevenly: only 52.3% of the images contain small objects, so small objects tend to be under-trained in regular training. With mosaic augmentation, when iterating over each image in an epoch, three additional images are randomly drawn from the training set. The first image is randomly shrunk while the output image keeps the first image's original size, so the bottom-right corner of the shrunk first image divides the output image into four parts. Label the top-left part 1 and number the remaining parts 2, 3, 4 clockwise. The shrunk first image is placed in part 1, and the other three images are resized to the sizes of parts 2, 3, and 4 respectively. When four images are drawn at once, the probability that none of them contains a small object is already small; moreover, since every image is shrunk to some degree, even objects that were not small originally end up closer to small-object size, which helps the model learn small objects.
Now I want to apply mosaic to RetinaNet. The resize in YOLOv4 does not preserve the original aspect ratio, while RetinaNet requires all four images to keep their aspect ratios when resized, so the mosaic method above needs a small modification. For the first image, draw a random number between 0.2 and 0.8 as the scaling factor (it must be neither too small nor too large, otherwise some of the four resized images would become tiny and the model could not learn from them), then shrink the first image. For images 2, 3, and 4, compute the scaling factor that fits each image into its part while keeping the aspect ratio: start from the ratio of the part's height to the image's height, and if the scaled width would exceed the part's width, use the ratio of the part's width to the image's width instead. This guarantees that the scaled images 2, 3, and 4 are no larger than their parts in both height and width. Then align image 2's top-left corner with image 1's top-right corner, image 3's top-left corner with image 1's bottom-right corner, and image 4's top-left corner with image 1's bottom-left corner. This gives a single training image stitched from four images. Finally, the bboxes of each image only need to be transformed accordingly and concatenated to form the bboxes of the new training image. A small geometry sketch follows below.
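Here is a small standalone sketch of that geometry: a hypothetical helper that, given a part's maximum height and width, returns an aspect-ratio-preserving scale so the resized image fits inside the part. The function name and the worked numbers are illustrative, not code from my repository; the same logic appears inline in the dataset class further below.

```python
import numpy as np
import cv2


def fit_scale(image, max_height, max_width):
    """Hypothetical helper: scale factor that keeps the aspect ratio and fits the part."""
    height, width = image.shape[:2]
    # first try to match the part's height; if the scaled width would overflow,
    # fall back to matching the part's width instead
    scale = max_height / height
    if int(scale * width) > max_width:
        scale = max_width / width
    return scale


# example: a 480x640 (h x w) image resized to fit a 300x200 (h x w) part
image = np.zeros((480, 640, 3), dtype=np.float32)
scale = fit_scale(image, max_height=300, max_width=200)
resized = cv2.resize(image, (int(640 * scale), int(480 * scale)))
print(scale, resized.shape)  # 0.3125 -> (150, 200, 3)
```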
Note that since every image is shrunk to some degree, there is less supervision signal for large objects, and the model may learn large objects less well. There are two ways to handle this: first, when iterating over each image in an epoch, use a probability of 0.5 to decide whether that image gets mosaic augmentation; second, apply mosaic to every image and simply train for a few more epochs, until the model's mAP and mAR on large objects reach the numbers obtained without mosaic.
The mosaic implementation for RetinaNet is as follows (note that the other data augmentation classes do not need any modification):
```python
import os
import cv2
import torch
import numpy as np
import random
import math
from torch.utils.data import Dataset
from pycocotools.coco import COCO
import torch.nn.functional as F


class CocoDetection(Dataset):
    def __init__(self,
                 image_root_dir,
                 annotation_root_dir,
                 set='train2017',
                 use_mosaic=False,
                 transform=None):
        self.image_root_dir = image_root_dir
        self.annotation_root_dir = annotation_root_dir
        self.set_name = set
        self.use_mosaic = use_mosaic
        self.transform = transform

        self.coco = COCO(
            os.path.join(self.annotation_root_dir,
                         'instances_' + self.set_name + '.json'))

        self.load_classes()

    def load_classes(self):
        self.image_ids = self.coco.getImgIds()
        self.cat_ids = self.coco.getCatIds()
        self.categories = self.coco.loadCats(self.cat_ids)
        self.categories.sort(key=lambda x: x['id'])

        # category_id is the original id in the annotation file,
        # coco label is remapped to the range 0~79
        self.category_id_to_coco_label = {
            category['id']: i
            for i, category in enumerate(self.categories)
        }
        self.coco_label_to_category_id = {
            v: k
            for k, v in self.category_id_to_coco_label.items()
        }

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        if self.use_mosaic:
            # with probability 0.5, build a mosaic image from this image
            # plus three other randomly drawn images
            if np.random.uniform(0, 1.) < 0.5:
                imgs, annots = [], []
                img = self.load_image(idx)
                imgs.append(img)
                annot = self.load_annotations(idx)
                annots.append(annot)

                # draw three additional, distinct image indices
                index_list, index = [idx], idx
                for _ in range(3):
                    while index in index_list:
                        index = np.random.randint(0, len(self.image_ids))
                    index_list.append(index)
                    img = self.load_image(index)
                    imgs.append(img)
                    annot = self.load_annotations(index)
                    annots.append(annot)

                # Images 1, 2, 3, 4 are arranged clockwise with image 1 in the
                # top-left part. First compute image 1's scale, then derive the
                # maximum resize sizes of the other three images from it. To keep
                # any of the four images from becoming too small for the model to
                # learn from, the scale is a random float between 0.2 and 0.8.
                scale1 = np.random.uniform(0.2, 0.8)
                height1, width1, _ = imgs[0].shape
                imgs[0] = cv2.resize(imgs[0],
                                     (int(width1 * scale1), int(height1 * scale1)))

                # image 2: top-right part
                max_height2, max_width2 = int(
                    height1 * scale1), width1 - int(width1 * scale1)
                height2, width2, _ = imgs[1].shape
                scale2 = max_height2 / height2
                if int(scale2 * width2) > max_width2:
                    scale2 = max_width2 / width2
                imgs[1] = cv2.resize(imgs[1],
                                     (int(width2 * scale2), int(height2 * scale2)))

                # image 3: bottom-right part
                max_height3, max_width3 = height1 - int(
                    height1 * scale1), width1 - int(width1 * scale1)
                height3, width3, _ = imgs[2].shape
                scale3 = max_height3 / height3
                if int(scale3 * width3) > max_width3:
                    scale3 = max_width3 / width3
                imgs[2] = cv2.resize(imgs[2],
                                     (int(width3 * scale3), int(height3 * scale3)))

                # image 4: bottom-left part
                max_height4, max_width4 = height1 - int(height1 * scale1), int(
                    width1 * scale1)
                height4, width4, _ = imgs[3].shape
                scale4 = max_height4 / height4
                if int(scale4 * width4) > max_width4:
                    scale4 = max_width4 / width4
                imgs[3] = cv2.resize(imgs[3],
                                     (int(width4 * scale4), int(height4 * scale4)))

                # the final image has the same size as the original image 1
                final_image = np.zeros((height1, width1, 3))
                final_image[0:int(height1 * scale1),
                            0:int(width1 * scale1)] = imgs[0]
                final_image[0:int(height2 * scale2),
                            int(width1 * scale1):(int(width1 * scale1) +
                                                  int(width2 * scale2))] = imgs[1]
                final_image[int(height1 * scale1):(int(height1 * scale1) +
                                                   int(height3 * scale3)),
                            int(width1 * scale1):(int(width1 * scale1) +
                                                  int(width3 * scale3))] = imgs[2]
                final_image[int(height1 * scale1):(int(height1 * scale1) +
                                                   int(height4 * scale4)),
                            0:int(width4 * scale4)] = imgs[3]

                # scale each image's bboxes, then shift them into the image's part
                annots[0][:, :4] *= scale1
                annots[1][:, :4] *= scale2
                annots[2][:, :4] *= scale3
                annots[3][:, :4] *= scale4

                annots[1][:, 0] += int(width1 * scale1)
                annots[1][:, 2] += int(width1 * scale1)

                annots[2][:, 0] += int(width1 * scale1)
                annots[2][:, 2] += int(width1 * scale1)
                annots[2][:, 1] += int(height1 * scale1)
                annots[2][:, 3] += int(height1 * scale1)

                annots[3][:, 1] += int(height1 * scale1)
                annots[3][:, 3] += int(height1 * scale1)

                final_annot = np.concatenate(
                    (annots[0], annots[1], annots[2], annots[3]), axis=0)

                sample = {'img': final_image, 'annot': final_annot, 'scale': 1.}
            else:
                img = self.load_image(idx)
                annot = self.load_annotations(idx)
                sample = {'img': img, 'annot': annot, 'scale': 1.}
        else:
            img = self.load_image(idx)
            annot = self.load_annotations(idx)
            sample = {'img': img, 'annot': annot, 'scale': 1.}

        if self.transform:
            sample = self.transform(sample)

        return sample

    def load_image(self, image_index):
        image_info = self.coco.loadImgs(self.image_ids[image_index])[0]
        path = os.path.join(self.image_root_dir, image_info['file_name'])
        img = cv2.imread(path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        return img.astype(np.float32) / 255.

    def load_annotations(self, image_index):
        # get ground truth annotations
        annotations_ids = self.coco.getAnnIds(
            imgIds=self.image_ids[image_index], iscrowd=False)
        annotations = np.zeros((0, 5))

        # some images appear to miss annotations
        if len(annotations_ids) == 0:
            return annotations

        # parse annotations
        coco_annotations = self.coco.loadAnns(annotations_ids)
        for _, a in enumerate(coco_annotations):
            # some annotations have basically no width / height, skip them
            if a['bbox'][2] < 1 or a['bbox'][3] < 1:
                continue

            annotation = np.zeros((1, 5))
            annotation[0, :4] = a['bbox']
            annotation[0, 4] = self.find_coco_label_from_category_id(
                a['category_id'])

            annotations = np.append(annotations, annotation, axis=0)

        # transform [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]
        annotations[:, 2] = annotations[:, 0] + annotations[:, 2]
        annotations[:, 3] = annotations[:, 1] + annotations[:, 3]

        return annotations

    def find_coco_label_from_category_id(self, category_id):
        return self.category_id_to_coco_label[category_id]

    def find_category_id_from_coco_label(self, coco_label):
        return self.coco_label_to_category_id[coco_label]

    def num_classes(self):
        return 80

    def image_aspect_ratio(self, image_index):
        image = self.coco.loadImgs(self.image_ids[image_index])[0]

        return float(image['width']) / float(image['height'])
```
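As a minimal usage sketch (the dataset paths below are placeholders, and the collate function name is hypothetical; plug in your own transform pipeline and collater exactly as in normal RetinaNet training):

```python
from torch.utils.data import DataLoader

# hypothetical paths; point these at your own COCO2017 layout
coco_train_dataset = CocoDetection(
    image_root_dir='/data/COCO2017/images/train2017',
    annotation_root_dir='/data/COCO2017/annotations',
    set='train2017',
    use_mosaic=True,   # each sample then has a 0.5 probability of being a mosaic image
    transform=None)    # put the usual resize/flip/normalize transforms here

sample = coco_train_dataset[0]
print(sample['img'].shape, sample['annot'].shape)

# a custom collate_fn is still needed to pad the variable number of bboxes per
# image before batching, the same as without mosaic (my_collater is hypothetical):
# train_loader = DataLoader(coco_train_dataset, batch_size=24, shuffle=True,
#                           num_workers=4, collate_fn=my_collater)
```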
PAN comes from PANet (https://arxiv.org/pdf/2004.13824.pdf). In short, after FPN's top-down (upsampling) fusion pyramid, it adds another bottom-up (downsampling) fusion pyramid. To keep FLOPs low, I use the original form of PAN here: the downsampling fusion uses a shortcut (element-wise addition) rather than the concat operation used in YOLOv4.
The PAN implementation is as follows:
```python
import torch.nn as nn


class PAN(nn.Module):
    def __init__(self, planes):
        super(PAN, self).__init__()
        # one stride-2 3x3 conv per level for the bottom-up (downsampling) path
        self.P3_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P4_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P5_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)
        self.P6_down = nn.Conv2d(planes,
                                 planes,
                                 kernel_size=3,
                                 stride=2,
                                 padding=1)

    def forward(self, inputs):
        [P3, P4, P5, P6, P7] = inputs

        # bottom-up path: downsample each level and add it (shortcut) to the next level
        P3_downsample = self.P3_down(P3)
        P4 = P3_downsample + P4
        P4_downsample = self.P4_down(P4)
        P5 = P4_downsample + P5
        P5_downsample = self.P5_down(P5)
        P6 = P5_downsample + P6
        P6_downsample = self.P6_down(P6)
        P7 = P6_downsample + P7

        del P3_downsample, P4_downsample, P5_downsample, P6_downsample

        return [P3, P4, P5, P6, P7]
```
To use it, simply insert this module right after the FPN module.
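Here is a minimal sketch of that wiring. The `FPN` interface (a module returning five feature maps [P3, P4, P5, P6, P7], each with `planes` channels) is assumed to follow the usual RetinaNet layout; the wrapper class and the shape check are illustrative, not the exact code in my repository.

```python
import torch
import torch.nn as nn


class FPNWithPAN(nn.Module):
    """Hypothetical wrapper: run a regular FPN, then refine its outputs with PAN."""

    def __init__(self, fpn, planes=256):
        super(FPNWithPAN, self).__init__()
        # `fpn` is assumed to return [P3, P4, P5, P6, P7], all with `planes` channels
        self.fpn = fpn
        self.pan = PAN(planes)  # PAN is the class defined above

    def forward(self, inputs):
        features = self.fpn(inputs)
        features = self.pan(features)

        return features


# quick shape check with dummy FPN outputs (strides 8~128 on a 640x640 input):
# PAN keeps every level's resolution and channel count unchanged
pan = PAN(256)
feats = [torch.randn(1, 256, 640 // s, 640 // s) for s in (8, 16, 32, 64, 128)]
outs = pan(feats)
print([tuple(o.shape) for o in outs])
```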
For all runs, images are resized to 667 during training, apex is used, and syncbn is not used. The batch size is 24 in every case, on two RTX 2080 Ti GPUs.
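For reference, a minimal sketch of the apex mixed-precision setup mentioned above, using a tiny stand-in model; the opt_level, optimizer, and dummy model are assumptions for illustration, not my actual training script. The results table follows below.

```python
import torch
import torch.nn as nn
from apex import amp

# stand-in for the real RetinaNet; the apex wiring is identical for the real model
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 1)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# opt_level 'O1' (mixed precision) is an assumption, not necessarily what I used
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

images = torch.randn(2, 3, 64, 64).cuda()
targets = torch.randn(2, 1).cuda()

loss = nn.functional.mse_loss(model(images), targets)
optimizer.zero_grad()
# apex scales the loss so that fp16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```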
Network | epoch5 (mAP, mAR, loss) | epoch10 (mAP, mAR, loss) | epoch12 (mAP, mAR, loss) | epoch15 (mAP, mAR, loss) | epoch20 (mAP, mAR, loss) | epoch24 (mAP, mAR, loss) |
---|---|---|---|---|---|---|
ResNet50-RetinaNet-myresize667 | 0.253,0.361,0.61 | 0.287,0.398,0.51 | 0.293,0.401,0.49 | / | / | / |
ResNet50-RetinaNet-myresize667-mosaic | 0.255,0.369,0.61 | 0.288,0.401,0.52 | 0.298,0.412,0.50 | / | / | / |
ResNet50-RetinaNet-myresize667-pan | 0.253,0.362,0.61 | 0.289,0.404,0.51 | 0.297,0.411,0.49 | / | / | / |
ResNet50-RetinaNet-myresize667-pan-mosaic | 0.255,0.367,0.60 | 0.288,0.400,0.52 | 0.293,0.402,0.50 | 0.303,0.416,0.50 | 0.310,0.427,0.45 | 0.312,0.431,0.43 |
For ResNet50-RetinaNet-myresize667-fastdecode-mosaic, the mAP and mAR on large objects are still slightly lower than without mosaic, but since the gains on small and medium objects are larger, the overall mAP still improves.