A brief summary of the characteristics of classic object recognition datasets.
Download: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
Official description: The PASCAL Visual Object Classes (VOC) Challenge (covers image sources, class selection, and the annotation procedure)
The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Development Kit
Official site: http://host.robots.ox.ac.uk/pascal/VOC/
A MATLAB version of the Development Kit is provided: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/index.html
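The PASCAL VOC Challenge is a benchmark for visual object classification, recognition and detection. It provides standard annotated image datasets and a standard evaluation procedure for measuring detection algorithms and learning performance. Each year from 2005 to 2012 the organizers released labeled images covering a set of classes; participants designed algorithms that classify images purely from their content and competed on precision, recall and efficiency. The challenge and its datasets have become a widely accepted standard in object detection.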
Task | Description |
---|---|
Classification | For each of the classes predict the presence/absence of at least one object of that class in a test image. |
Detection | For each of the classes predict the bounding boxes of each object of that class in a test image (if any). |
Segmentation | For each pixel in a test image, predict the class of the object containing that pixel or 'background' if the pixel does not belong to one of the twenty specified classes. Note: this per-pixel class labeling is semantic (class) segmentation; VOC also provides per-object masks for instance-level work. |
Action Classification | For each of the action classes predict if a specified person (indicated by their bounding box) in a test image is performing the corresponding action. There are ten action classes. |
Large Scale Recognition | This task is run by the ImageNet organizers. Further details can be found at their website: http://www.image-net.org/challenges/LSVRC/2012/index. |
Boxless Action Classification | For each of the action classes predict if a specified person in a test image is performing the corresponding action. The person is indicated only by a single point lying somewhere on their body, rather than by a tight bounding box. |
Person Layout | For each 'person' object in a test image (indicated by a bounding box of the person), predict the presence/absence of parts (head/hands/feet), and the bounding boxes of those parts. |
Year | Statistics | New developments | Notes |
---|---|---|---|
2005 | Only 4 classes: bicycles, cars, motorbikes, people. Train/validation/test: 1578 images containing 2209 annotated objects. | Two competitions: classification and detection | Images were largely taken from existing public datasets, and were not as challenging as the flickr images subsequently used. This dataset is obsolete. |
2006 | 10 classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep. Train/validation/test: 2618 images containing 4754 annotated objects. | Images from flickr (www.flickr.com) and from Microsoft Research Cambridge (MSRC) dataset | The MSRC images were easier than flickr as the photos often concentrated on the object of interest. This dataset is obsolete. |
2007 | 20 classes | | This year established the 20 classes, and these have been fixed since then. This was the final year that annotation was released for the testing data. |
2008 | 20 classes. The data is split (as usual) around 50% train/val and 50% test. The train/val data has 4,340 images containing 10,363 annotated objects. | | |
2009 | 20 classes. The train/val data has 7,054 images containing 17,218 ROI annotated objects and 3,211 segmentations. | | |
2010 | 20 classes. The train/val data has 10,103 images containing 23,374 ROI annotated objects and 4,203 segmentations. | | |
2011 | 20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 5,034 segmentations. | | |
2012 | 20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 6,929 segmentations. | | |
The VOC2012 dataset has 20 object classes, or 21 when background is counted.
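As a minimal sketch (not taken from the development kit), the 21 labels can be held in a Python list with background at index 0. The class names are the 20 listed in this post; the variable names `VOC_CLASSES`, `ALL_LABELS` and `CLASS_TO_INDEX`, and the choice of alphabetical index order, are assumptions made for illustration.

```python
# The 20 VOC object classes, in alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumption: background takes index 0, object classes follow as 1..20.
ALL_LABELS = ["background"] + VOC_CLASSES

# Lookup table from class name to integer label.
CLASS_TO_INDEX = {name: index for index, name in enumerate(ALL_LABELS)}

print(len(ALL_LABELS))
print(CLASS_TO_INDEX["person"])
```

A mapping like this is typically what a training script needs when it turns class names from annotation files into integer targets.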
Evaluation criteria for detection (only detection is covered here): submitted results are stored in a text file, one detection per line.
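To the best of my reading of the development kit documentation, each line of a detection results file has the form `<image identifier> <confidence> <left> <top> <right> <bottom>`, with one file per class. The sketch below only formats such lines; the helper name `format_detection`, the number formatting, and the sample detection values are hypothetical and should be checked against the devkit before submitting anything.

```python
def format_detection(image_id, confidence, box):
    """Format one detection as a whitespace-separated text line.

    box is (left, top, right, bottom) in pixel coordinates.
    """
    left, top, right, bottom = box
    return (
        f"{image_id} {confidence:.6f} "
        f"{left:.1f} {top:.1f} {right:.1f} {bottom:.1f}"
    )


# Hypothetical detections for one class: (image id, score, box).
detections = [
    ("2012_004331", 0.9, (48.0, 25.0, 273.0, 370.0)),
    ("2012_004331", 0.25, (10.0, 40.0, 120.0, 200.0)),
]

lines = [format_detection(*detection) for detection in detections]
print("\n".join(lines))
```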
The detection task will be judged by the precision/recall curve. The principal quantitative measure used will be the average precision (AP) (see section 3.4.1). Example code for computing the precision/recall and AP measure is provided in the development kit. Detections are considered true or false positives based on the area of overlap with ground truth bounding boxes. To be considered a correct detection, the area of overlap $a_o$ between the predicted bounding box $B_p$ and ground truth bounding box $B_{gt}$ must exceed 50% by the formula

$$a_o = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})}$$
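In short: the detection task is judged by the precision/recall curve, and the main quantitative metric is average precision (AP). The development kit includes example code for computing precision/recall and AP. Whether a detection is a true or false positive is decided by IoU: an overlap above 50% counts as a true positive.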
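The overlap criterion and the AP measure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the development kit code: boxes are `(xmin, ymin, xmax, ymax)` in continuous coordinates (no +1 pixel convention), and `average_precision` uses the area under the monotonically decreasing precision envelope, which I believe matches the later VOC years, while VOC2007 used an 11-point variant. All function names are made up for this example.

```python
def iou(box_p, box_gt):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]))
    inter_h = max(0.0, min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]))
    intersection = inter_w * inter_h
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(box_p, box_gt, threshold=0.5):
    """A detection is correct when the overlap exceeds the threshold."""
    return iou(box_p, box_gt) > threshold


def average_precision(recalls, precisions):
    """Area under the precision/recall curve after making precision
    monotonically non-increasing (assumed interpolation scheme).

    recalls must be sorted in increasing order.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(is_true_positive((0, 0, 10, 10), (1, 0, 11, 10)))
print(average_precision([0.5, 1.0], [1.0, 0.5]))
```

A full evaluation would also sort detections by confidence and match each ground-truth box at most once; that bookkeeping is left out here to keep the sketch short.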
The choice of object classes, which can be considered a sub-tree of a taxonomy defined in terms of both semantic and visual similarity, also supports research in two areas which show promise in solving the scaling of object recognition to many thousands of classes: (i) exploiting visual properties common to classes e.g. vehicle wheels, for example in the form of "feature sharing" (Torralba et al. 2007); (ii) exploiting external semantic information about the relations between object classes e.g. WordNet (Fellbaum 1998), for example by learning a hierarchy of classifiers (Marszalek and Schmid 2007). The availability of a class hierarchy may also prove essential in future evaluation efforts if the number of classes increases to the extent that there is implicit ambiguity in the classes, allowing individual objects to be annotated at different levels of the hierarchy e.g. hatch-back/car/vehicle. We return to this point in Sect. 7.3.
Example annotation for 2012_004331.jpg (folder VOC2012): segmented 0, depth 3, height 375, width 500.
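VOC annotations are stored as one XML file per image. The following is a minimal parsing sketch using Python's standard `xml.etree.ElementTree`. The file name, folder and image size come from the example above; the `<object>` entry, its class and its box coordinates are hypothetical values added only so the code has something to read, and `parse_voc_annotation` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Sample annotation. The object entry is hypothetical, not real ground truth.
SAMPLE_XML = """<annotation>
  <folder>VOC2012</folder>
  <filename>2012_004331.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>25</ymin><xmax>273</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return file name, image size and object boxes from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        record["objects"].append({
            "name": obj.findtext("name"),
            # Box as (xmin, ymin, xmax, ymax) in pixels.
            "bbox": tuple(
                int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            ),
        })
    return record


record = parse_voc_annotation(SAMPLE_XML)
print(record["filename"], record["width"], record["height"])
print(record["objects"])
```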
A blog post worth consulting: "keras-yolo3: a guide to building a VOC dataset for training"
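Official site: http://cocodataset.org/#home
Download: https://pjreddie.com/projects/coco-mirror/
Dataset format description: http://cocodataset.org/#format-data
References: "Annotation format of the COCO dataset" (Zhihu column)
"Understanding the COCO dataset format" (CSDN blog)
"[Study notes] MS COCO dataset"
"MS COCO annotation explained in detail"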
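The Microsoft COCO dataset, released by Microsoft in 2014, has become a standard testbed for image captioning. COCO is a large, rich dataset for object detection, segmentation and captioning. It targets scene understanding: the images are drawn mainly from complex everyday scenes, and objects are localized with precise segmentations. The original description gives 91 object categories, 328,000 images and 2,500,000 labels. The released annotations cover 80 categories, with more than 330,000 images (over 200,000 of them annotated) and more than 1.5 million object instances, which made it the largest dataset with segmentation annotations at the time.
COCO images were collected by searching Flickr for the 80 object categories and various scene types, using Amazon Mechanical Turk.
COCO currently has three annotation types: object instances, object keypoints, and image captions, all stored as JSON files.
The annotation file records the exact coordinates of each segmentation and bounding box (the segmented object and its bounding box), with values given to two decimal places. An example annotation for one object: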
{"segmentation":[[392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43, 372.23, 357.37, 372.23, 397.7, 383.48, 419.27,407.87, 439.91, 427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77,472.59, 266.38], [429.44,465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07, 571.07, 364.88, 546.69,363.0]],
"area": 28458.996150000003,
"iscrowd": 0,
"image_id": 503837,
"bbox": [372.23, 266.38, 264.5,208.23],
"category_id": 4,
"id": 151109
}
The `segmentation` field records the points along the object outline; see "Analysis of labelme annotation data".
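A small sketch of how the fields in the annotation above relate to each other. It assumes the usual COCO conventions, namely that each polygon is a flat list `[x1, y1, x2, y2, ...]` and that `bbox` is `[x, y, width, height]`; the helper names `polygons_to_bbox` and `xywh_to_xyxy` are made up for this example. Under those assumptions the box recomputed from the polygon vertices should agree with the stored `bbox`.

```python
import json

# The example annotation from this post, as a JSON string.
ANNOTATION_JSON = """
{"segmentation": [[392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43,
                   372.23, 357.37, 372.23, 397.7, 383.48, 419.27, 407.87, 439.91,
                   427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77,
                   472.59, 266.38],
                  [429.44, 465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07,
                   571.07, 364.88, 546.69, 363.0]],
 "area": 28458.996150000003,
 "iscrowd": 0,
 "image_id": 503837,
 "bbox": [372.23, 266.38, 264.5, 208.23],
 "category_id": 4,
 "id": 151109}
"""


def polygons_to_bbox(polygons):
    """Tightest [x, y, width, height] box around flat [x1, y1, x2, y2, ...] polygons."""
    xs = [x for polygon in polygons for x in polygon[0::2]]
    ys = [y for polygon in polygons for y in polygon[1::2]]
    x_min, y_min = min(xs), min(ys)
    return [
        round(x_min, 2),
        round(y_min, 2),
        round(max(xs) - x_min, 2),
        round(max(ys) - y_min, 2),
    ]


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


annotation = json.loads(ANNOTATION_JSON)
derived_bbox = polygons_to_bbox(annotation["segmentation"])
print(derived_bbox)
print(annotation["bbox"])
print(xywh_to_xyxy(annotation["bbox"]))
```

The corner form is handy when feeding COCO boxes to code written for VOC-style `(xmin, ymin, xmax, ymax)` boxes.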
12 supercategories, 80 object categories:
person | animal | kitchen | electronic | vehicle | accessory | food | appliance | outdoor | sports | furniture | indoor |
{ # 20 semantic classes
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
}
{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}
info {
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime,
}
image{
"id" : int, # image id
"width" : int, # image width
"height" : int, # image height
"file_name" : str, # image file name
"license" : int,
"flickr_url" : str,
"coco_url" : str, # image URL
"date_captured" : datetime, # date the image was captured
}
license{
"id" : int,
"name" : str,
"url" : str,
}
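To show how these top-level fields fit together, here is a minimal sketch that builds a tiny COCO-style dictionary in memory and groups annotations by `image_id`. The annotation values come from the example in this post; the image size, file name and `info` text are hypothetical, and `index_annotations` is an illustrative helper rather than part of any COCO tooling. Object instance files also carry a `categories` list, omitted here to mirror the structure shown above.

```python
from collections import defaultdict

# A tiny in-memory dataset following the structure above.
coco = {
    "info": {"year": 2014, "version": "1.0", "description": "toy example"},
    "images": [
        # Hypothetical size and file name for the example image id.
        {"id": 503837, "width": 640, "height": 480, "file_name": "example.jpg"},
    ],
    "annotations": [
        {
            "id": 151109,
            "image_id": 503837,
            "category_id": 4,
            "bbox": [372.23, 266.38, 264.5, 208.23],
            "iscrowd": 0,
        },
    ],
    "licenses": [],
}


def index_annotations(dataset):
    """Group annotation dicts by the image they belong to."""
    by_image = defaultdict(list)
    for annotation in dataset["annotations"]:
        by_image[annotation["image_id"]].append(annotation)
    return by_image


annotations_by_image = index_annotations(coco)
for image in coco["images"]:
    boxes = [a["bbox"] for a in annotations_by_image[image["id"]]]
    print(image["file_name"], boxes)
```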
2.5 Creating your own MS COCO dataset
Annotation tool: https://github.com/wkentaro/labelme
See "Mask-RCNN: how to build your own dataset for pixel-level object detection"
Building your own dataset
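A rough sketch of the conversion step, assuming a labelme polygon shape is a dictionary with a `label` string and a `points` list of `[x, y]` pairs (this matches the labelme JSON files I have seen, but verify it against your own files). The function names, the category id mapping and the sample shape are hypothetical. The area is computed with the shoelace formula and the bounding box uses the `[x, y, width, height]` layout from the COCO example earlier in this post.

```python
def polygon_area(points):
    """Area of a simple polygon given as [[x, y], ...] (shoelace formula)."""
    total = 0.0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def labelme_shape_to_coco(shape, image_id, annotation_id, category_ids):
    """Convert one polygon shape into a COCO-style annotation dict."""
    points = shape["points"]
    xs = [point[0] for point in points]
    ys = [point[1] for point in points]
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_ids[shape["label"]],
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
        "segmentation": [[coord for point in points for coord in point]],
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": polygon_area(points),
        "iscrowd": 0,
    }


# Hypothetical shape: a 100 x 50 rectangle labeled "car".
shape = {
    "label": "car",
    "points": [[10.0, 10.0], [110.0, 10.0], [110.0, 60.0], [10.0, 60.0]],
}
coco_annotation = labelme_shape_to_coco(
    shape, image_id=1, annotation_id=1, category_ids={"car": 1}
)
print(coco_annotation)
```

Running this over every shape in every labelme file, and collecting the results under `annotations` alongside matching `images` entries, gives a dataset in the layout described above.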