Pascal VOC: AP50 measured with the COCO evaluation protocol disagrees with AP50 measured with the Pascal protocol (long)

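Problem description

The COCO-format annotation files for Pascal used here are the ones provided by the Detectron code. Download link: https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/data/README.md#coco-minival-annotations

The AP50 measured with the COCO evaluation protocol and the AP50 measured with the VOC evaluation protocol disagree by several points (4~5).

Solution

Modify the _prepare function in the evaluation code maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py: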

# original code
# gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
# changed code: default a missing 'ignore' key to 0, then OR it with 'iscrowd'
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore']  # changed by hui
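
As a standalone illustration of the two changed lines, here is a minimal sketch that merges the two flags for a single annotation dict. The helper name merged_ignore_flag is hypothetical and is not part of cocoeval.py; it only mirrors the logic above, with the result normalized to a bool.

```python
def merged_ignore_flag(gt):
    """Return True when a ground-truth annotation should be ignored.

    Hypothetical helper: treats a missing 'iscrowd' or 'ignore' key as 0,
    then combines both flags with a logical OR.
    """
    is_crowd = bool(gt.get("iscrowd", 0))
    is_ignored = bool(gt.get("ignore", 0))
    return is_crowd or is_ignored


# An annotation that is not a crowd but carries ignore=1 is now ignored.
print(merged_ignore_flag({"iscrowd": 0, "ignore": 1}))  # True
print(merged_ignore_flag({"iscrowd": 0}))               # False
```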

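Brief analysis

The COCO-format annotation files for Pascal used here are the ones provided by the Detectron code. Download link: https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/data/README.md#coco-minival-annotations
Loading the file and inspecting the annotations field shows that each annotation has an ignore field, which presumably corresponds to something like the difficult field in Pascal.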

In [2]: import json
In [3]: jd_gt = json.load(open('pascal_test2007.json'))                            
In [4]: jd_gt['annotations'][0]                                                    
Out[4]:
{'segmentation': [[47, 239, 47, 371, 195, 371, 195, 239]],
 'area': 19536,
 'iscrowd': 0,
 'image_id': 1,
 'bbox': [47, 239, 148, 132],
 'category_id': 12,
 'id': 1,
 'ignore': 0}

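However, the standard COCO evaluation code does not know about an ignore field, so annotations with a non-zero ignore get no special handling. Pascal VOC happens to contain annotations whose ignore is non-zero.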

print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))

Out[]:
14976
14976
12032

This is where the problem comes from: even when the ground truth itself is used as the detection result, AP is only 80%.
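
To see roughly why the number lands near 80%, here is a rough sketch. It assumes the detections are exactly the 12032 non-ignored boxes, that every detection is a true positive, and that the evaluator still counts all 14976 boxes as ground truth. It pools all categories together, whereas COCO averages AP per category, so it gives only a ballpark figure and not the exact reported value. The function name capped_ap is made up for this illustration.

```python
def capped_ap(num_matched, num_gt, num_thresholds=101):
    """Interpolated AP when precision is 1.0 but recall stops at num_matched / num_gt.

    With perfect precision, each recall threshold contributes 1.0 if it can be
    reached and 0.0 otherwise, so AP is the fraction of reachable thresholds.
    """
    max_recall = num_matched / num_gt
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    reachable = sum(1 for t in thresholds if t <= max_recall)
    return reachable / num_thresholds


print(12032 / 14976)            # recall ceiling, about 0.803
print(capped_ap(12032, 14976))  # 81 of 101 thresholds reachable, about 0.802
```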

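Detailed debugging process

1. Using the ground truth of voc_2007_test_cocostyle as the detection result gives an AP of 80% instead of 100%

The code used is maskrcnn_benchmark.
Modify the code of any model (for example RetinaNet: maskrcnn_benchmark/modeling/rpn/retinanet/retinanet.py) so that it returns the targets directly at test time: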

def forward(self, images, features, targets=None):
    # Debug switch: when True, inference returns the ground truth as detections.
    self.dug_eval_gt = True
    cls_logits = self.head(features)
    locations = self.compute_locations(cls_logits, strides=self.loc_strides)

    if self.training:
        return self._forward_train(locations, cls_logits, targets)
    else:
        if self.dug_eval_gt:  # test on ground truth
            return eval_gt(self, locations, targets, images, cls_logits)
        return self._forward_test(self.loc_strides, cls_logits, images.image_sizes)

def eval_gt(self, locations, targets, images, cls_logits):
    # Module-level helper: return the targets as detections, each with score 1.0.
    targets = [t.to(locations[0].device) for t in targets]
    for t in targets:
        # Create the scores on the same device as the boxes.
        t.add_field("scores", torch.ones(len(t.bbox), device=t.bbox.device))
    return targets, {}

Set TEST in the config file to voc_2007_test_cocostyle:

DATASETS:
  TRAIN: ("voc_2007_train_cocostyle", "voc_2007_val_cocostyle", "voc_2012_train_cocostyle", "voc_2012_val_cocostyle")
  TEST: ("voc_2007_test_cocostyle",)
SOLVER:
  CHECKPOINT_PERIOD: 7500
  TEST_ITER: 1  # set to 1 to enter test mode as soon as possible
OUTPUT_DIR: ./outputs/pascal/gau/base_LD2.4 # base_LD1

Run the code to start testing:

export NGPUS=4
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port=9990 --nproc_per_node=$NGPUS tools/train_test_net.py --config configs/pascal_voc/retina_R_50_FPN_1x_voc.yaml

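The results are as follows (I changed several of the area range definitions, so those rows may differ from the standard ones, but the first rows should be the same):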

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.8000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 0.2998
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 0.6153
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 0.8134
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 0.9144
Average Precision  (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 0.9594
Average Precision  (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000

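2. Ruling out a data-loading problem in the Dataset and suspecting the ignore field

After the change above, the network's detection results (the return value of eval_gt) are written to outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json. Inspect that file further: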

import json

det_file = "outputs/pascal/gau/base_LD2.4/inference/voc_2007_test_cocostyle/bbox.json"
gt_file =  "VOC2007/Annotations/pascal_test2007.json"
image_root = 'VOC2007/JPEGImages'
jd = json.load(open(det_file))
jd_gt = json.load(open(gt_file))

print(len(jd))
print(len(jd_gt['annotations']))

Out[]:
12032 (if you have not modified the dataset code, this number may be different)
14976
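
To find out which images lose boxes, one option is to compare per-image counts between the detection file and the ground truth. The sketch below assumes both lists contain dicts with an image_id key, as in the COCO result and annotation formats; the helper name per_image_deficit and the toy data are made up for illustration.

```python
from collections import Counter


def per_image_deficit(detections, annotations):
    """Map image_id -> (number of GT boxes - number of detections), where they differ."""
    det_counts = Counter(d["image_id"] for d in detections)
    gt_counts = Counter(a["image_id"] for a in annotations)
    return {
        image_id: n_gt - det_counts.get(image_id, 0)
        for image_id, n_gt in gt_counts.items()
        if n_gt != det_counts.get(image_id, 0)
    }


# Toy data: image 1 has two GT boxes but only one detection.
toy_dets = [{"image_id": 1}, {"image_id": 2}]
toy_gts = [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}]
print(per_image_deficit(toy_dets, toy_gts))  # {1: 1}
```

With the real files, per_image_deficit(jd, jd_gt['annotations']) would list the images whose boxes were dropped before reaching the evaluator.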

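This shows that the targets arriving here already contain fewer boxes than the ground truth. The targets come directly from COCODataset.__getitem__, and no later code removes boxes, so the removal must happen inside COCODataset.__getitem__. That function can drop boxes in two places. The first is the filter on iscrowd and ignore (the ignore filter is one I added myself). The second is clip_to_image, which clips the part of a box that extends beyond the image back inside it and, with remove_empty=True, removes boxes with w<2 or h<2.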

len_boxes1, len_boxes2 = 0, 0
def __getitem__(self, idx):
    global len_boxes1, len_boxes2
    ......
    anno = [obj for obj in anno if obj["iscrowd"] == 0]
    # ######################### add by hui ####################################
    if anno and "ignore" in anno[0]:  # filter ignore out
        anno = [obj for obj in anno if not obj["ignore"]]
    ###########################################################################
    ......
    target = target.clip_to_image(remove_empty=True)
    ......
    return img, target, idx

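from tqdm import tqdm

# __getitem__ here is the instrumented module-level function defined above,
# called with the dataset passed explicitly as self.
gt_dataset = COCODataset(gt_file, image_root, False)
print(len(gt_dataset.coco.anns))
num_box = 0
for i in tqdm(range(len(gt_dataset))):
    img, target, idx = __getitem__(gt_dataset, i)
    num_box += len(target.bbox)
print(num_box, len_boxes1, len_boxes2)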

Out[]:
14976
12032 14976 12032
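
The second candidate, clip_to_image with remove_empty=True, can be sketched in plain Python as follows. This is an approximation written for illustration and not the library code: it assumes boxes in (x1, y1, x2, y2) form with inclusive pixel coordinates, clamps them to the image, and drops boxes that end up with x2 <= x1 or y2 <= y1. The function name clip_and_filter is hypothetical.

```python
def clip_and_filter(boxes, width, height):
    """Clamp xyxy boxes to the image and drop the ones that become degenerate."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        # Clamp every coordinate to the valid pixel range of the image.
        x1 = min(max(x1, 0), width - 1)
        x2 = min(max(x2, 0), width - 1)
        y1 = min(max(y1, 0), height - 1)
        y2 = min(max(y2, 0), height - 1)
        # Keep only boxes that still span more than one pixel in both directions.
        if x2 > x1 and y2 > y1:
            kept.append((x1, y1, x2, y2))
    return kept


toy_boxes = [(10, 10, 50, 50), (-5, 20, 0, 40), (90, 90, 200, 200)]
print(clip_and_filter(toy_boxes, 100, 100))
# [(10, 10, 50, 50), (90, 90, 99, 99)]
```

In the counts above, len_boxes2 and num_box are both 12032, so this step removed nothing here and the whole drop comes from the ignore filter.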

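After careful checking, the cause is exactly the "ignore" field mentioned earlier: it is non-zero for some objects, while "iscrowd" is 0 for every object.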

print(len(jd_gt['annotations']))
ann_gt1 = [a for a in jd_gt['annotations'] if a['iscrowd']==0]
print(len(ann_gt1))
ann_gt2 = [a for a in jd_gt['annotations'] if a['ignore']==0]
print(len(ann_gt2))

Out[]:
14976
14976
12032

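3. Locating the problem: keep in mind that the default COCO evaluation has no ignore key; only annotations with non-zero "iscrowd" are treated as gt_ignore

The puzzling part is that the COCO evaluation does support marking some objects as gt_ignore. Detections matched to them by IoU become det_ignore, and no det_ignore takes part in the TP and FP counts, so such objects should not affect the result. In fact, the core step COCO uses to compute AP for different object sizes is to mark every object outside the size range as gt_ignore.
Finally, looking at the COCO evaluation code in maskrcnn_benchmark/data/datasets/evaluation/coco/cocoeval.py, it turns out that it only considers the "iscrowd" field and never looks at the "ignore" field: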

gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']

Change it to:

gt['ignore'] = ('iscrowd' in gt and gt['iscrowd']) or gt['ignore']  # changed by hui
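
The gt_ignore and det_ignore mechanism described in this step can be sketched for a single image, category and IoU threshold as below. This is a simplified illustration and not the pycocotools implementation: it lets each ground-truth box be matched at most once (the real code allows repeated matches to crowd regions) and leaves out area ranges and maxDets. The function names are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp(dets, gts, gt_ignore, iou_thr=0.5):
    """dets: boxes sorted by descending score; gt_ignore: one bool per GT box.

    Returns (tp, fp, num_counted_gt).
    """
    # Visit non-ignored ground truth first, so a detection prefers a real match.
    order = sorted(range(len(gts)), key=lambda g: gt_ignore[g])
    gt_taken = [False] * len(gts)
    tp = fp = 0
    for det in dets:
        best_iou, best_g = iou_thr, -1
        for g in order:
            if gt_taken[g]:
                continue
            # Never trade a non-ignored match for an ignored one.
            if best_g != -1 and not gt_ignore[best_g] and gt_ignore[g]:
                break
            overlap = iou_xyxy(det, gts[g])
            if overlap < best_iou:
                continue
            best_iou, best_g = overlap, g
        if best_g == -1:
            fp += 1  # unmatched detection
        else:
            gt_taken[best_g] = True
            if not gt_ignore[best_g]:
                tp += 1  # matched a counted ground-truth box
            # A match to an ignored box is neither TP nor FP.
    num_counted_gt = sum(1 for flag in gt_ignore if not flag)
    return tp, fp, num_counted_gt


toy_gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
toy_dets = [(0, 0, 10, 10)]  # the second box was filtered out upstream
print(count_tp_fp(toy_dets, toy_gts, [False, True]))   # (1, 0, 1): recall 1/1
print(count_tp_fp(toy_dets, toy_gts, [False, False]))  # (1, 0, 2): recall 1/2
```

The second call reproduces the situation before the fix: the filtered box is still counted as ground truth, so recall cannot reach 100%.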

Run the evaluation again: OK!

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.6844
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.9907
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=smallest | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn1 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn2 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=  fpn3 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=fpn4+5 | maxDets=100 ] = 1.0000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=largest | maxDets=100 ] = -1.0000

Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] is not 100% because it only keeps the top-1 detection per image (and per category), while many images contain more than one object.
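
A small sketch of this effect, assuming a perfect detector whose detections are exactly the ground truth. It pools recall over all (image, category) pairs, whereas COCO averages per category, so the exact value differs from the reported one. The function name and the toy counts are made up for illustration.

```python
def recall_at_max_dets(gt_counts, max_dets):
    """gt_counts: number of GT boxes for each (image, category) pair.

    With perfect detections, at most max_dets boxes per pair can be recovered.
    """
    found = sum(min(n, max_dets) for n in gt_counts)
    return found / sum(gt_counts)


toy_counts = [1, 2, 3]  # three (image, category) pairs with 1, 2 and 3 objects
print(recall_at_max_dets(toy_counts, 1))    # 0.5
print(recall_at_max_dets(toy_counts, 100))  # 1.0
```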

Additional material

Pascal VOC annotation format: one XML file per image, with the box information of each object stored in the object element.


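<annotation>
	<folder>VOC2007</folder>
	<filename>000001.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>dog</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>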

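In the VOC XML format each object carries a difficult flag. The sketch below shows how such an object could be turned into a COCO-style annotation whose ignore field mirrors difficult. Two things here are assumptions: that ignore is derived from difficult (the analysis above only guesses this), and the coordinate conversion (x = xmin - 1, w = xmax - xmin + 1), which is inferred from the first annotation shown earlier, where the dog box (48, 240, 195, 371) appears as bbox [47, 239, 148, 132] with area 19536. The function name and the shortened XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened VOC-style annotation containing only the fields used below.
VOC_XML = """<annotation>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def voc_objects_to_coco(xml_text):
    """Convert every object element into a COCO-style annotation dict."""
    annotations = []
    for obj in ET.fromstring(xml_text).iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        width, height = xmax - xmin + 1, ymax - ymin + 1
        annotations.append({
            "name": obj.find("name").text,
            "bbox": [xmin - 1, ymin - 1, width, height],
            "area": width * height,
            "iscrowd": 0,
            # Assumption: 'ignore' mirrors the VOC 'difficult' flag.
            "ignore": int(obj.find("difficult").text),
        })
    return annotations


print(voc_objects_to_coco(VOC_XML)[0]["bbox"])  # [47, 239, 148, 132]
```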