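A First Look at Object Recognition Datasets

A brief summary of the characteristics of the classic object recognition datasets.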

1 PASCAL VOC (VOC 2012, VOC 2007)

Download: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

Official documentation: "The PASCAL Visual Object Classes (VOC) Challenge" (the paper describing the image sources, the choice of classes, and the annotation procedure) and "The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Development Kit".

Official site: http://host.robots.ox.ac.uk/pascal/VOC/

A MATLAB development kit is provided at: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/index.html

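The PASCAL VOC challenge is a benchmark for visual object classification and detection. It provides a standard annotated image dataset and a standard evaluation system for comparing detection algorithms and their learning performance. From 2005 to 2012 the organizers released a new set of labelled images covering a range of object classes each year. Participants designed algorithms that classify the images based only on their content, and entries were compared on precision- and recall-based measures. The challenge and its datasets have since become a widely accepted benchmark in object detection.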

Five main tasks + two "taster" tasks:

Classification: For each of the classes, predict the presence/absence of at least one object of that class in a test image.

Detection: For each of the classes, predict the bounding boxes of each object of that class in a test image (if any).

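Segmentation: For each pixel in a test image, predict the class of the object containing that pixel, or ‘background’ if the pixel does not belong to one of the twenty specified classes. Note: as defined here, this is class-level (semantic) segmentation; the dataset additionally ships object-level (instance) masks.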

Action Classification: For each of the action classes, predict whether a specified person (indicated by their bounding box) in a test image is performing the corresponding action. There are ten action classes:

  • jumping; phoning; playing a musical instrument; reading; riding a bicycle or motorcycle; riding a horse; running; taking a photograph; using a computer; walking

Large Scale Recognition: This task is run by the ImageNet organizers. Further details can be found at their website: http://www.image-net.org/challenges/LSVRC/2012/index.

Boxless Action Classification (taster): For each of the action classes, predict whether a specified person in a test image is performing the corresponding action. The person is indicated only by a single point lying somewhere on their body, rather than by a tight bounding box.

Person Layout (taster): For each ‘person’ object in a test image (indicated by a bounding box of the person), predict the presence/absence of parts (head/hands/feet), and the bounding boxes of those parts.

1.1 Evolution of the PASCAL VOC challenge

Per-year statistics and changes (each entry gives the dataset statistics, followed by the new developments and notes for that year):

2005: Only 4 classes: bicycles, cars, motorbikes, people. Train/validation/test: 1,578 images containing 2,209 annotated objects.
  • Two competitions: classification and detection.
  • Images were largely taken from existing public datasets, and were not as challenging as the Flickr images used subsequently. This dataset is obsolete.
2006: 10 classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep. Train/validation/test: 2,618 images containing 4,754 annotated objects.
  • Images from Flickr (www.flickr.com) and from the Microsoft Research Cambridge (MSRC) dataset.
  • The MSRC images were easier than the Flickr images, as the photos often concentrated on the object of interest. This dataset is obsolete.
2007: 20 classes (Person: person; Animal: bird, cat, cow, dog, horse, sheep; Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train; Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor). Train/validation/test: 9,963 images containing 24,640 annotated objects.
  • Number of classes increased from 10 to 20.
  • Segmentation taster introduced.
  • Person layout taster introduced.
  • Truncation flag added to annotations.
  • Evaluation measure for the classification challenge changed to Average Precision; previously it had been ROC-AUC.
  • This year established the 20 classes, which have been fixed since then. This was the final year that annotations were released for the test data.
2008: 20 classes. As usual, the data is split roughly 50% train/val and 50% test. The train/val data has 4,340 images containing 10,363 annotated objects.
  • Occlusion flag added to annotations.
  • Test data annotation no longer made public.
  • The segmentation and person layout datasets include images from the corresponding VOC2007 sets.
2009: 20 classes. The train/val data has 7,054 images containing 17,218 ROI-annotated objects and 3,211 segmentations.
  • From now on, the data for all tasks consists of the previous years' images augmented with new images. In earlier years, an entirely new dataset was released each year for the classification/detection tasks.
  • Augmenting allows the number of images to grow each year, and means that test results can be compared on the previous years' images.
  • Segmentation becomes a standard challenge (promoted from a taster).
  • No difficult flags were provided for the additional images (an omission).
  • Test data annotation not made public.
2010: 20 classes. The train/val data has 10,103 images containing 23,374 ROI-annotated objects and 4,203 segmentations.
  • Action Classification taster introduced.
  • Associated challenge on large-scale classification introduced, based on ImageNet.
  • Amazon Mechanical Turk used for the early stages of annotation.
  • Method of computing AP changed: it now uses all data points rather than TREC-style sampling.
  • Test data annotation not made public.
2011: 20 classes. The train/val data has 11,530 images containing 27,450 ROI-annotated objects and 5,034 segmentations.
  • Action Classification taster extended to 10 classes + "other".
  • Layout annotation is no longer "complete": only people are annotated, and some people may be unannotated.
2012: 20 classes. The train/val data has 11,530 images containing 27,450 ROI-annotated objects and 6,929 segmentations.
  • Size of the segmentation dataset substantially increased.
  • People in the action classification dataset are additionally annotated with a reference point on the body.
  • Datasets for classification, detection and person layout are the same as VOC2011.
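The 2007 and 2010 entries mention two different ways of computing AP. As a minimal sketch, assuming you already have the recall/precision values of a ranked detection list (recall non-decreasing), the block below implements the 11-point interpolated AP used before VOC2010 ("TREC-style sampling" at recall 0, 0.1, ..., 1) and the all-points AP used from VOC2010 on, which integrates the monotone precision envelope. The function names are hypothetical; the reference implementation is the MATLAB evaluation code in the development kit.

```python
from typing import Sequence


def voc07_ap(recall: Sequence[float], precision: Sequence[float]) -> float:
    """11-point interpolated AP (pre-2010 VOC, 'TREC-style sampling').

    For each threshold t in {0.0, 0.1, ..., 1.0}, take the maximum precision
    among points with recall >= t (0 if there are none), then average.
    """
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        ap += max(candidates, default=0.0) / 11.0
    return ap


def voc10_ap(recall: Sequence[float], precision: Sequence[float]) -> float:
    """All-points AP (VOC2010+): area under the monotone precision envelope."""
    # Sentinel points at recall 0 and recall 1.
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Sweep from the right so precision becomes non-increasing in recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Add a rectangle wherever recall actually changes.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i]
               for i in range(1, len(mrec)) if mrec[i] != mrec[i - 1])
```

For recall [0.5, 1.0] and precision [1.0, 0.5], the two methods give 8.5/11 ≈ 0.773 and 0.75 respectively, so AP values computed under the two conventions are not directly comparable.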

1.2 Classes

The VOC2012 dataset has 20 object classes (21 when background is counted), grouped as follows:


  • Person: person
  • Animal: bird, cat, cow, dog, horse, sheep
  • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
  • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
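In training code the class list is usually needed as data. A minimal sketch, assuming the common convention used by the VOC segmentation PNGs: index 0 is background, the 20 classes follow as 1 to 20 in alphabetical order of their annotation-file names, and pixel value 255 marks "void" boundary regions that are ignored in evaluation. The variable names are hypothetical.

```python
# The 20 VOC2012 object classes, spelled as in the annotation files.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)

# Index 0 is background, which gives 21 labels in total.
LABELS = ("background",) + VOC_CLASSES
NAME_TO_INDEX = {name: i for i, name in enumerate(LABELS)}

# Segmentation PNGs use 255 for object boundaries / "void" pixels.
VOID_INDEX = 255

# Superclass grouping, following the list above.
SUPERCLASS = {
    "person": ("person",),
    "animal": ("bird", "cat", "cow", "dog", "horse", "sheep"),
    "vehicle": ("aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"),
    "indoor": ("bottle", "chair", "diningtable", "pottedplant", "sofa", "tvmonitor"),
}
```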

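1.3 Evaluation

Only the evaluation of the detection task is covered here. Results are submitted as plain-text files, one file per class, with one detection per line in the format:

     <image identifier> <confidence> <left> <top> <right> <bottom>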

The detection task is judged by the precision/recall curve. The principal quantitative measure is the average precision (AP) (see section 3.4.1 of the development kit documentation). Example code for computing precision/recall and AP is provided in the development kit. Detections are counted as true or false positives based on their area of overlap with the ground-truth bounding boxes. To be considered a correct detection, the overlap ratio $a_o$ between the predicted bounding box $B_p$ and the ground-truth bounding box $B_{gt}$ must exceed 50%, where

$$a_o = \frac{\mathrm{area}(B_p \cap B_{gt})}{\mathrm{area}(B_p \cup B_{gt})}$$

This ratio is the intersection over union (IoU) of the two boxes, so a detection is a true positive when its IoU with a ground-truth box exceeds 0.5.

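The 50% overlap test is easy to sketch. This is a minimal illustration, assuming boxes given as (xmin, ymin, xmax, ymax) in continuous coordinates and result lines of the form "<image id> <confidence> <left> <top> <right> <bottom>"; `parse_result_line`, `iou` and `match_detections` are hypothetical helpers, not devkit functions. The devkit's MATLAB code differs in two ways not reproduced here: it treats coordinates as inclusive pixel indices (adding 1 to widths and heights), and it ignores ground-truth objects flagged as difficult. As in the devkit, each ground-truth box can be matched only once, so duplicate detections of the same object count as false positives.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def parse_result_line(line: str) -> Tuple[str, float, Box]:
    """Split '<image id> <confidence> <left> <top> <right> <bottom>'."""
    image_id, conf, *coords = line.split()
    xmin, ymin, xmax, ymax = map(float, coords)
    return image_id, float(conf), (xmin, ymin, xmax, ymax)


def iou(bp: Box, bgt: Box) -> float:
    """area(Bp ∩ Bgt) / area(Bp ∪ Bgt), continuous coordinates."""
    iw = min(bp[2], bgt[2]) - max(bp[0], bgt[0])
    ih = min(bp[3], bgt[3]) - max(bp[1], bgt[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    area_p = (bp[2] - bp[0]) * (bp[3] - bp[1])
    area_gt = (bgt[2] - bgt[0]) * (bgt[3] - bgt[1])
    return inter / (area_p + area_gt - inter)


def match_detections(dets: List[Box], gts: List[Box],
                     thresh: float = 0.5) -> List[bool]:
    """True/false-positive flags for one class in one image.

    `dets` must be sorted by descending confidence. Each detection is
    compared with its best-overlapping ground-truth box; it is a true
    positive only if that overlap exceeds `thresh` and the box is unused.
    """
    used = [False] * len(gts)
    flags = []
    for d in dets:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            o = iou(d, g)
            if o > best:
                best, best_j = o, j
        if best > thresh and not used[best_j]:
            used[best_j] = True
            flags.append(True)
        else:
            flags.append(False)  # low overlap or duplicate detection
    return flags
```

Accumulating these flags over all images, ranked by confidence, yields the precision/recall curve from which AP is computed.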

1.4 Choice of classes

The choice of object classes, which can be considered a sub-tree of a taxonomy defined in terms of both semantic and visual similarity, also supports research in two areas which show promise in scaling object recognition to many thousands of classes: (i) exploiting visual properties common to classes, e.g. vehicle wheels, for example in the form of "feature sharing" (Torralba et al. 2007); (ii) exploiting external semantic information about the relations between object classes, e.g. WordNet (Fellbaum 1998), for example by learning a hierarchy of classifiers (Marszałek and Schmid 2007). The availability of a class hierarchy may also prove essential in future evaluation efforts if the number of classes increases to the extent that there is implicit ambiguity in the classes, allowing individual objects to be annotated at different levels of the hierarchy, e.g. hatchback/car/vehicle. The VOC paper returns to this point in its Sect. 7.3.


1.5 Annotation example


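<annotation>
	<filename>2012_004331.jpg</filename>
	<folder>VOC2012</folder>
	<object>
		<name>person</name>
		<actions>
			<jumping>1</jumping>
			<other>0</other>
			<phoning>0</phoning>
			<playinginstrument>0</playinginstrument>
			<reading>0</reading>
			<ridingbike>0</ridingbike>
			<ridinghorse>0</ridinghorse>
			<running>0</running>
			<takingphoto>0</takingphoto>
			<usingcomputer>0</usingcomputer>
			<walking>0</walking>
		</actions>
		<bndbox>
			<xmax>208</xmax>
			<xmin>102</xmin>
			<ymax>230</ymax>
			<ymin>25</ymin>
		</bndbox>
		<difficult>0</difficult>
		<pose>Unspecified</pose>
		<point>
			<x>155</x>
			<y>119</y>
		</point>
	</object>
	<segmented>0</segmented>
	<size>
		<depth>3</depth>
		<height>375</height>
		<width>500</width>
	</size>
	<source>
		<annotation>PASCAL VOC2012</annotation>
		<database>The VOC2012 Database</database>
		<image>flickr</image>
	</source>
</annotation>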
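A minimal sketch of reading an annotation file like the one above with Python's standard `xml.etree.ElementTree`. The function name and the layout of the returned dict are illustrative only. Children are looked up by tag name, so the alphabetical tag order in this particular file does not matter; a missing `difficult` flag is treated as 0 because, as noted in 1.1, some VOC2009 images were released without one.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def parse_voc_xml(xml_text: str) -> Dict[str, Any]:
    """Extract file name, image size and objects from one VOC annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    result: Dict[str, Any] = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    # Look children up by tag name; their order in the file is irrelevant.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        result["objects"].append({
            "name": obj.findtext("name"),
            # Treat a missing difficult flag as 0.
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": tuple(float(box.findtext(k))
                          for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return result
```

For a file on disk, read it first (e.g. `parse_voc_xml(open(path, encoding="utf-8").read())`).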

1.6 Building your own dataset in VOC format

A blog post worth consulting: "keras-yolo3: a guide to building a VOC-format dataset for training".
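The core step when converting your own labels is writing one XML file per image. A minimal standard-library sketch, assuming each object is a (class name, (xmin, ymin, xmax, ymax)) pair; `make_voc_xml` and the default folder name `my_dataset` are hypothetical, and the optional `pose`/`truncated`/`difficult` fields are filled with neutral defaults.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# One labelled object: (class name, (xmin, ymin, xmax, ymax)).
Obj = Tuple[str, Tuple[int, int, int, int]]


def make_voc_xml(filename: str, width: int, height: int,
                 objects: Iterable[Obj], depth: int = 3,
                 folder: str = "my_dataset") -> str:
    """Return a minimal VOC-style annotation document as a string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    ET.SubElement(root, "segmented").text = "0"  # no segmentation mask
    for name, box in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        # Neutral defaults for the optional per-object flags.
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

In the usual VOC directory layout, the resulting files go in Annotations/, the images in JPEGImages/, and the train/val split lists in ImageSets/Main/.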

2 MS COCO

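Official site: http://cocodataset.org/#home

Download: https://pjreddie.com/projects/coco-mirror/

Data format description: http://cocodataset.org/#format-data

References: "The COCO dataset annotation format" (Zhihu column)
           "Understanding the COCO dataset format" (CSDN blog)

          "[Study notes] MS COCO dataset"

           "MS COCO data annotation explained"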

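Microsoft released the COCO (Common Objects in Context) dataset in 2014, and it has since become a standard benchmark for image captioning. COCO is a large, rich dataset for object detection, segmentation, and captioning. It targets scene understanding: the images come mostly from complex everyday scenes, and objects are localized with precise segmentation masks. The original paper reports 91 object categories with 2.5 million labelled instances in 328,000 images. The released dataset covers 80 object categories in more than 330,000 images, over 200,000 of them annotated, with more than 1.5 million object instances in total; at the time of release it was among the largest datasets with segmentation annotations.

2.1 Image sources

COCO images were collected from Flickr by searching for the object categories and for combinations of categories with various scene types; the annotation was carried out by workers on Amazon Mechanical Turk.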

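2.2 Annotation types

COCO provides three annotation types: object instances, object keypoints (keypoints on person instances), and image captions, all stored in JSON files.

The label files record the exact coordinates of every segmentation and bounding box, to two decimal places. The annotation of one object instance looks like this: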

{
    "segmentation": [[392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43, 372.23, 357.37, 372.23, 397.7, 383.48, 419.27, 407.87, 439.91, 427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77, 472.59, 266.38],
                     [429.44, 465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07, 571.07, 364.88, 546.69, 363.0]],
    "area": 28458.996150000003,
    "iscrowd": 0,
    "image_id": 503837,
    "bbox": [372.23, 266.38, 264.5, 208.23],
    "category_id": 4,
    "id": 151109
}

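The "segmentation" field lists the vertices of the object outline as a flat sequence x1, y1, x2, y2, ... (polygon format, used when "iscrowd" is 0; crowd regions with "iscrowd" 1 are stored as RLE instead). An object whose visible region is split into several parts, as in this example, gets one polygon per part. "bbox" is [x, y, width, height], with (x, y) the top-left corner. See also the post analysing labelme annotation data.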
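Because COCO stores boxes as [x, y, width, height] while VOC uses corner coordinates, converting between the two comes up constantly. A minimal sketch with hypothetical helper names: convert a COCO box to corners, recompute a tight box from flat polygons, and compute one polygon's area with the shoelace formula. Applied to the example above, `polygons_to_bbox` recovers the stored bbox [372.23, 266.38, 264.5, 208.23] up to floating-point rounding; the shoelace area is only an approximation of the stored "area" field, so the two need not match exactly.

```python
from typing import List, Sequence


def coco_to_voc_box(bbox: Sequence[float]) -> List[float]:
    """COCO [x, y, width, height] -> [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def polygons_to_bbox(segmentation: List[List[float]]) -> List[float]:
    """Tight COCO-style [x, y, w, h] box around flat polygons [x1, y1, x2, y2, ...]."""
    xs = [v for poly in segmentation for v in poly[0::2]]
    ys = [v for poly in segmentation for v in poly[1::2]]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(poly: List[float]) -> float:
    """Shoelace area of one flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = poly[0::2], poly[1::2]
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(s) / 2.0
```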


2.3 Classes

12 supercategories and 80 object categories:

person
  • person
animal
  • bird
  • cat
  • dog
  • horse
  • sheep
  • cow
  • elephant
  • bear
  • zebra
  • giraffe
kitchen
  • bottle
  • wine glass
  • cup
  • fork
  • knife
  • spoon
  • bowl
electronic
  • tv
  • laptop
  • mouse
  • remote
  • keyboard
  • cell phone
vehicle
  • bicycle
  • car
  • motorcycle
  • airplane
  • bus
  • train
  • truck
  • boat
accessory
  • backpack
  • umbrella
  • handbag
  • tie
  • suitcase
food
  • banana
  • apple
  • sandwich
  • orange
  • broccoli
  • carrot
  • hot dog
  • pizza
  • donut
  • cake
appliance
  • microwave
  • oven
  • toaster
  • sink
  • refrigerator
outdoor
  • traffic light
  • fire hydrant
  • stop sign
  • parking meter
  • bench
sports
  • frisbee
  • skis
  • snowboard
  • sports ball
  • kite
  • baseball bat
  • baseball glove
  • skateboard
  • surfboard
  • tennis racket
furniture
  • chair
  • couch
  • potted plant
  • bed
  • dining table
  • toilet
indoor
  • book
  • clock
  • vase
  • scissors
  • teddy bear
  • hair drier
  • toothbrush

 

{# for comparison: the 20 PASCAL VOC semantic classes, as spelled in the VOC annotation files
    aeroplane
    bicycle
    bird
    boat
    bottle
    bus
    car
    cat
    chair
    cow
    diningtable
    dog
    horse
    motorbike
    person
    pottedplant
    sheep
    sofa
    train
    tvmonitor
}
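All 20 VOC classes listed above also appear among the 80 COCO categories in 2.3, six of them under a different name. A minimal lookup-table sketch, useful for example when evaluating a COCO-trained detector on VOC or merging the two datasets; the table is built from the names printed in the two lists in this article, and the identifiers are hypothetical.

```python
# VOC class names, as in the annotation files.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
    "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Only the names that differ between VOC and COCO need an entry.
VOC_TO_COCO_RENAMED = {
    "aeroplane": "airplane",
    "diningtable": "dining table",
    "motorbike": "motorcycle",
    "pottedplant": "potted plant",
    "sofa": "couch",
    "tvmonitor": "tv",
}


def voc_to_coco_name(voc_name: str) -> str:
    """Return the COCO category name for a VOC class (same name if unchanged)."""
    return VOC_TO_COCO_RENAMED.get(voc_name, voc_name)
```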

2.4 Annotation format

{
    "info" : info, 
    "images" : [image],
    "annotations" : [annotation],
    "licenses" : [license],
}

info {
    "year" : int,
    "version" : str,
    "description" : str,
    "contributor" : str,
    "url" : str,
    "date_created" : datetime,
}

image{
    "id" : int, # image id
    "width" : int, # image width
    "height" : int, # image height
    "file_name" : str, # image file name
    "license" : int, # id of an entry in "licenses"
    "flickr_url" : str,
    "coco_url" : str, # image URL
    "date_captured" : datetime, # capture date of the image
}

license{
    "id" : int,
    "name" : str,
    "url" : str,
}
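The schema above shows "info", "images" and "licenses"; object-instance files (the instances_*.json files) also carry the "annotations" list, one entry per object as in the example in 2.2, and a "categories" list of entries with "id", "name" and "supercategory". A minimal standard-library sketch of loading such a file and building two lookups that training code usually needs. The function names are hypothetical; pycocotools' COCO class offers similar indexing. COCO category ids run from 1 to 90 with gaps (only 80 ids are used), hence the dense remapping.

```python
import json
from collections import defaultdict
from typing import Dict, List


def load_coco(path: str) -> Dict:
    """Read a COCO-format JSON file, e.g. an instances_*.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def annotations_by_image(data: Dict) -> Dict[int, List[Dict]]:
    """Group the "annotations" list by "image_id"."""
    groups: Dict[int, List[Dict]] = defaultdict(list)
    for ann in data["annotations"]:
        groups[ann["image_id"]].append(ann)
    return dict(groups)


def contiguous_category_index(data: Dict) -> Dict[int, int]:
    """Map sparse COCO category ids to dense training indices 0..N-1."""
    ids = sorted(cat["id"] for cat in data["categories"])
    return {cat_id: i for i, cat_id in enumerate(ids)}
```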

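2.5 Building your own dataset in MS COCO format

Annotation tool: https://github.com/wkentaro/labelme

References: "Mask-RCNN: building your own dataset for pixel-level object detection"

       "Building your own dataset"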
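Whatever tool produces the labels (labelme, for instance, saves one JSON file per image with the polygon points of each shape), the last step is assembling them into a single COCO-style file. A minimal sketch, assuming the polygons have already been extracted into plain Python lists; `build_coco` and its input layout are hypothetical, not part of labelme or COCO tooling, each object gets a single polygon, the supercategory is a placeholder equal to the category name, and "info"/"licenses" are omitted.

```python
import json
from typing import Dict, List, Sequence, Tuple

# One labelled object: (category name, flat polygon [x1, y1, x2, y2, ...]).
Labelled = Tuple[str, Sequence[float]]


def build_coco(images: List[Tuple[str, int, int, List[Labelled]]],
               category_names: List[str]) -> Dict:
    """Assemble a minimal COCO instances dict.

    images: list of (file_name, width, height, objects).
    """
    cat_ids = {name: i + 1 for i, name in enumerate(category_names)}  # ids start at 1
    out: Dict = {
        "images": [],
        "annotations": [],
        # Placeholder supercategory: the category name itself.
        "categories": [{"id": i, "name": n, "supercategory": n}
                       for n, i in cat_ids.items()],
    }
    ann_id = 1
    for img_id, (file_name, width, height, objects) in enumerate(images, start=1):
        out["images"].append({"id": img_id, "file_name": file_name,
                              "width": width, "height": height})
        for name, poly in objects:
            xs, ys = list(poly[0::2]), list(poly[1::2])
            x, y = min(xs), min(ys)
            n = len(xs)
            # Shoelace polygon area, an approximation of the mask area.
            area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                           for i in range(n))) / 2.0
            out["annotations"].append({
                "id": ann_id, "image_id": img_id, "category_id": cat_ids[name],
                "segmentation": [list(poly)], "area": area,
                "bbox": [x, y, max(xs) - x, max(ys) - y], "iscrowd": 0,
            })
            ann_id += 1
    return out
```

`json.dump(build_coco(...), f)` then writes the file.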

 
