YOLOv3 Usage Notes: [CVPR 2019] Generalized Intersection over Union

Authors' project page: https://giou.stanford.edu/

For a walkthrough of the paper, see https://zhuanlan.zhihu.com/p/57863810

The authors propose a new metric and use a GIoU loss in place of the L1/L2 loss functions to improve bounding box regression. Modifying the backbone is a relatively efficient way to boost detection performance from the feature-extraction side; switching to a GIoU or IoU loss instead improves it from the bounding-box-regression side.
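
To make the metric concrete: GIoU augments IoU with a penalty based on the smallest enclosing box, so it stays informative even when the two boxes do not overlap (where plain IoU is always 0). Below is a minimal Python sketch of the computation for axis-aligned boxes in (x1, y1, x2, y2) corner format; it illustrates the definition from the paper, not the darknet implementation.

def giou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    # Intersection rectangle (clamped to zero if the boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union of the two box areas.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box C enclosing both inputs.
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    # GIoU = IoU - |C \ (A U B)| / |C|; the training loss is 1 - GIoU.
    return iou - (area_c - union) / area_c

For example, giou((0, 0, 2, 2), (1, 1, 3, 3)) = 1/7 - 2/9 ≈ -0.079, while plain IoU is 1/7 ≈ 0.143. For fully disjoint boxes IoU is 0 everywhere, but GIoU keeps decreasing as the boxes move apart, which is what gives the loss a gradient to follow.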

作者工程,https://github.com/generalized-iou/g-darknet

This post mainly verifies the paper on the VOC2007 dataset (training set VOCtrainval_06-Nov-2007, test set VOCtest_06-Nov-2007) and also tries it on my own dataset.

1. Compiling and training: same as the original darknet project
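
For completeness, the workflow is the standard darknet one; a minimal sketch, assuming the cfg file name used in the mAP test below and a Makefile already configured for your GPU:

make -j8
./darknet detector train cfg/voc.data results/giou/yolov3-voc-giou.cfg

To train from scratch, as in the experiments below, simply omit the pretrained-weights argument that would otherwise follow the cfg path.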

Modify voc_label.py:

import xml.etree.ElementTree as ET
import os
from os import getcwd

sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]


# Convert a VOC box (xmin, xmax, ymin, ymax) in pixels to darknet's
# normalized (x_center, y_center, width, height) format.
def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

# Parse one VOC XML annotation and write the matching darknet label file.
def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w')
    tree=ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # Skip classes outside our list and objects marked as difficult.
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()

wd = getcwd()

# For each split: create the labels directory, write darknet label files,
# and build a list of absolute image paths for training/testing.
for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):
        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

os.system("cat 2007_train.txt 2007_val.txt > train.txt")
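
As a quick sanity check on convert above, take a 500x375 image and a box with xmin=100, xmax=300, ymin=100, ymax=250 (numbers chosen purely for illustration):

convert((500, 375), (100, 300, 100, 250))
# x = ((100 + 300)/2 - 1) / 500 = 0.398
# y = ((100 + 250)/2 - 1) / 375 = 0.464
# w = (300 - 100) / 500 = 0.4
# h = (250 - 100) / 375 = 0.4

Each line of a label file is therefore "<class_id> <x_center> <y_center> <width> <height>" with all coordinates normalized to [0, 1], which is the format darknet expects.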


Modify the cfg: add the following to each [yolo] layer

cls_normalizer=1
iou_normalizer=0.5
iou_loss=giou

where iou_loss can be one of iou, giou, or mse.

cls_normalizer and iou_normalizer are weights used to scale the classification and localization losses, respectively.
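
For reference, a modified [yolo] section might look like the sketch below. The surrounding values are the stock yolov3-voc.cfg settings (verify against your own cfg); only the last three lines are the addition:

[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=20
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1
# new lines for GIoU training
iou_loss=giou
cls_normalizer=1
iou_normalizer=0.5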


2. Testing mAP

python scripts/voc_map.py --data_file cfg/voc.data --cfg_file results/giou/yolov3-voc-giou.cfg --weights_path results/giou/yolov3-voc-giou_final.weights  --voc_dir scripts/VOCdevkit/ --metric giou
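
As the table in the next section shows, this evaluation reports mAP at IoU thresholds from 0.5 to 0.95 in steps of 0.05, so each loss can be compared under both loose and strict localization requirements.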


3. Experimental results and conclusions

On the VOC dataset I trained three models from scratch (no pretrained weights), one each with GIoU loss, IoU loss, and MSE, for 50,000 iterations apiece, and compared their mAP:

IoU threshold   IoU-loss mAP   GIoU-loss mAP   MSE-loss mAP
0.50            0.652597       0.659570        0.671661
0.55            0.626949       0.635453        0.644138
0.60            0.601179       0.603734        0.611442
0.65            0.552523       0.563104        0.559611
0.70            0.501755       0.506589        0.481827
0.75            0.417760       0.427666        0.398080
0.80            0.315096       0.323294        0.284324
0.85            0.192962       0.203799        0.160690
0.90            0.088776       0.095449        0.062133
0.95            0.028769       0.043754        0.019659

At IoU thresholds of 0.7 and above, GIoU loss and IoU loss improve on MSE by roughly 2 to 4 mAP points. This simple from-scratch training did not reproduce the paper's exact numbers, but the trend is consistent with the authors' results. I will write up some training-from-scratch tricks in a later post.

On the 20-class VOC detection task: IoU loss > GIoU loss > MSE.

On the 80-class COCO detection task: GIoU loss > IoU loss > MSE.

On my own single-class detection task, with extra training tricks applied: GIoU loss ≈ IoU loss ≈ MSE; the differences are negligible.

Conclusion: GIoU loss brings a measurable gain on multi-class detection, but with only a few classes the improvement is small.
