MS COCO Dataset Explained

1. Useful Links
Official dataset homepage:
http://cocodataset.org/#home

Dataset downloads:

You can download the official links with Thunder (Xunlei), and the speed is quite good. If it is slow, you may need to find the "right version" of Thunder.
You can also download from this mirror site built by a high school student: http://bendfunction.f3322.net:666/share/. Its homepage is http://bendfunction.f3322.net:666/
https://pjreddie.com/projects/coco-mirror/

Dataset format documentation:
http://cocodataset.org/#format-data. This post is written with reference to it.

Important links
While studying, I came across some more detailed and comprehensive introductions, recorded here for reference:
"The annotation format of the COCO dataset" (Zhihu column)
"Understanding the COCO dataset format" (CSDN blog)

2. Dataset Overview
MS COCO is a large, rich dataset for object detection, segmentation, and captioning. It is aimed at scene understanding: images are drawn mainly from complex everyday scenes, and objects are localized with accurate segmentation masks. The dataset defines 91 object categories, 328,000 images, and 2,500,000 labeled instances. It is so far the largest dataset with segmentation annotations: 80 categories are actually annotated, across more than 330,000 images (about 200,000 of them labeled), with over 1.5 million individual object instances in total.

MS COCO has many releases; as of June 26, 2019, they are as follows:

2014 Train/Val: Detection 2015, Captioning 2015, Detection 2016, Keypoints 2016
2014 Testing: Captioning 2015
2015 Testing: Detection 2015, Detection 2016, Keypoints 2016
2017 Train/Val/Test: Detection 2017, Keypoints 2017, Stuff 2017, Detection 2018, Keypoints 2018, Stuff 2018, Panoptic 2018
2017 Unlabeled: [optional data for any competition]

Which release should you download?
If you care about the 2017 or 2018 tasks, you only need to download the 2017 images and can ignore the other releases.

COCO 2017 contains the following files:
[Figure: the image and annotation archives included in the COCO 2017 release]

Dataset Format
COCO has five types of annotations: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning, each stored in its own JSON file. Each JSON file is one big dictionary that always contains the following keys:

{
	"info" : info,
	"images" : [image], 
	"annotations" : [annotation], 
	"licenses" : [license],
}
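
As a quick sanity check, you can load one of these JSON files with plain Python and inspect the top-level keys. This is a minimal sketch; the file path is an assumption (it presumes you unpacked annotations_trainval2017.zip into annotations/):

import json

# Path is an assumption: point it at whichever annotation file you downloaded.
with open("annotations/instances_val2017.json") as f:
    data = json.load(f)

print(data.keys())            # e.g. dict_keys(['info', 'licenses', 'images', 'annotations', 'categories'])
print(len(data["images"]))    # one entry per image (5000 for val2017)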

The info field has the following structure:

info{
	"year" : int, 
	"version" : str, 
	"description" : str, 
	"contributor" : str, 
	"url" : str, 
	"date_created" : datetime,
}

The images field is a list with one entry per image. Each element is a dictionary describing one image, in the following format:

image{
	"id" : int, 
	"width" : int, 
	"height" : int, 
	"file_name" : str, 
	"license" : int, 
	"flickr_url" : str, 
	"coco_url" : str, 
	"date_captured" : datetime,
}

A license entry looks like this:

license{
	"id" : int, 
	"name" : str, 
	"url" : str,
}

Although every JSON file has the "info", "images", "annotations", and "licenses" keys, the form of each annotation differs from task to task, as follows:

Object Detection
Each object instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0 in which case polygons are used) or a collection of objects (iscrowd=1 in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names. See also the detection task.

annotation{
	"id" : int, 
	"image_id" : int, 
	"category_id" : int, 
	"segmentation" : RLE or [polygon], 
	"area" : float, 
	"bbox" : [x,y,width,height], 
	"iscrowd" : 0 or 1,
}

categories[{
	"id" : int, 
	"name" : str, 
	"supercategory" : str,
}]
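
In practice you rarely walk these dictionaries by hand: the official pycocotools package (pip install pycocotools) indexes them for you. A minimal sketch, reusing the assumed annotations/instances_val2017.json path from above:

from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")   # path is an assumption

img_id = coco.getImgIds()[0]                        # pick an arbitrary image
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
for ann in anns:
    cat = coco.loadCats(ann["category_id"])[0]
    print(cat["name"], ann["bbox"], ann["iscrowd"])

# annToMask decodes either polygons (iscrowd=0) or RLE (iscrowd=1) into a binary mask.
mask = coco.annToMask(anns[0])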

Keypoint Detection
A keypoint annotation contains all the data of the object annotation
(including id, bbox, etc.) and two additional fields. First,
“keypoints” is a length 3k array where k is the total number of
keypoints defined for the category. Each keypoint has a 0-indexed
location x,y and a visibility flag v defined as v=0: not labeled (in
which case x=y=0), v=1: labeled but not visible, and v=2: labeled and
visible. A keypoint is considered visible if it falls inside the
object segment. “num_keypoints” indicates the number of labeled
keypoints (v>0) for a given object (many objects, e.g. crowds and
small objects, will have num_keypoints=0). Finally, for each category,
the categories struct has two additional fields: “keypoints,” which is
a length k array of keypoint names, and “skeleton”, which defines
connectivity via a list of keypoint edge pairs and is used for
visualization. Currently keypoints are only labeled for the person
category (for most medium/large non-crowd person instances). See also
the keypoint task.

annotation{
	"keypoints" : [x1,y1,v1,...], 
	"num_keypoints" : int, 
	"[cloned]" : ...,
}

categories[{
	"keypoints" : [str], 
	"skeleton" : [edge], 
	"[cloned]" : ...,
}]

"[cloned]": denotes fields copied from object detection annotations defined above.

Stuff Segmentation
The stuff annotation format is identical and fully compatible to the
object detection format above (except iscrowd is unnecessary and set
to 0 by default). We provide annotations in both JSON and png format
for easier access, as well as conversion scripts between the two
formats. In the JSON format, each category present in an image is
encoded with a single RLE annotation (see the Mask API for more
details). The category_id represents the id of the current stuff
category. For more details on stuff categories and supercategories see
the stuff evaluation page. See also the stuff task.
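
The single-RLE-per-category encoding can be decoded with the Mask API bundled in pycocotools. A minimal sketch, assuming ann is one entry loaded from stuff_val2017.json:

from pycocotools import mask as mask_utils

# ann["segmentation"] is an RLE dict of the form {"size": [h, w], "counts": ...}.
binary_mask = mask_utils.decode(ann["segmentation"])   # H x W array of 0/1
print(binary_mask.shape, binary_mask.sum(), "pixels of category", ann["category_id"])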

Panoptic Segmentation
For the panoptic task, each annotation struct is a per-image
annotation rather than a per-object annotation. Each per-image
annotation has two parts: (1) a PNG that stores the class-agnostic
image segmentation and (2) a JSON struct that stores the semantic
information for each image segment. In more detail:

To match an annotation with an image, use the image_id field (that is
annotation.image_id == image.id). For each annotation, per-pixel
segment ids are stored as a single PNG at annotation.file_name. The
PNGs are in a folder with the same name as the JSON, i.e.,
annotations/name/ for annotations/name.json. Each segment (whether
it’s a stuff or thing segment) is assigned a unique id. Unlabeled
pixels (void) are assigned a value of 0. Note that when you load the
PNG as an RGB image, you will need to compute the ids via
ids = R + G*256 + B*256^2. For each annotation, per-segment info is stored
in annotation.segments_info. segment_info.id stores the unique id of
the segment and is used to retrieve the corresponding mask from the
PNG (ids==segment_info.id). category_id gives the semantic category
and iscrowd indicates the segment encompasses a group of objects
(relevant for thing categories only). The bbox and area fields provide
additional info about the segment. The COCO panoptic task has the same
thing categories as the detection task, whereas the stuff categories
differ from those in the stuff task (for details see the panoptic
evaluation page). Finally, each category struct has two additional
fields: isthing that distinguishes stuff and thing categories and
color that is useful for consistent visualization.

annotation{
	"image_id" : int, 
	"file_name" : str, 
	"segments_info" : [segment_info],
}

segment_info{
	"id" : int,
	"category_id" : int,
	"area" : int,
	"bbox" : [x,y,width,height],
	"iscrowd" : 0 or 1,
}

categories[{
	"id" : int,
	"name" : str,
	"supercategory" : str,
	"isthing" : 0 or 1,
	"color" : [R,G,B],
}]
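
The ids = R + G*256 + B*256^2 computation above is easy to get wrong, so here is a minimal sketch with numpy and Pillow. The PNG path and the segment id are assumptions: the folder follows the annotations/name/ convention described above, and the id would come from annotation["segments_info"]:

import numpy as np
from PIL import Image

# uint32 so the 256**2 term does not overflow uint8 arithmetic.
rgb = np.array(Image.open("annotations/panoptic_val2017/000000000139.png"), dtype=np.uint32)
ids = rgb[:, :, 0] + rgb[:, :, 1] * 256 + rgb[:, :, 2] * 256 ** 2

segment_id = 123456                 # hypothetical id from a segment_info entry
mask = ids == segment_id            # boolean mask of that one segment
print(mask.sum(), "segment pixels;", (ids == 0).sum(), "void pixels")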

Image Captioning
These annotations are used to store image captions. Each caption
describes the specified image and each image has at least 5 captions
(some images have more). See also the captioning task.

annotation{
	"id" : int, 
	"image_id" : int, 
	"caption" : str,
}
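
Captions can be read through the same COCO API. A minimal sketch, assuming captions_val2017.json:

from pycocotools.coco import COCO

caps = COCO("annotations/captions_val2017.json")   # path is an assumption
img_id = caps.getImgIds()[0]
for ann in caps.loadAnns(caps.getAnnIds(imgIds=img_id)):
    print(ann["caption"])   # each image has at least five captions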
