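If you want to reuse detectron2's data loaders with your own dataset, you need to:
1. register your dataset (i.e., tell detectron2 how to obtain it);
2. optionally, register metadata for your dataset.
Details are explained below.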
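The Colab Notebook has an example of registering a dataset in the standard format and training on it.
If you want detectron2 to know how to obtain a dataset named "my_dataset", you need to implement a function that returns the items in your dataset, like this: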
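def get_dicts():
    dataset_dicts = []  # list[dict]: one dict per image, in the format described below
    ...  # fill dataset_dicts from your own annotation files
    return dataset_dicts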
Then tell detectron2 about this function:
from detectron2.data import DatasetCatalog
DatasetCatalog.register("my_dataset", get_dicts)
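These two lines associate the dataset named "my_dataset" with the function get_dicts that returns its data. If you do not modify the standard data loader and dataset mapper, get_dicts must return a list of dicts in detectron2's standard dataset format, which is described below.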
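As a minimal sketch of the registration pattern above, the code below registers two hypothetical splits, "my_dataset_train" and "my_dataset_val". The in-memory `_ANNOTATIONS` table is a placeholder for your real annotation files. The `s=s` default argument binds each split name when the lambda is created; a plain closure over the loop variable would make every lambda load the last split. The registration step runs only when detectron2 is importable.

```python
# Hypothetical annotation table standing in for real files on disk:
# split -> list of (image path, height, width).
_ANNOTATIONS = {
    "train": [("images/0001.jpg", 480, 640), ("images/0002.jpg", 720, 1280)],
    "val": [("images/0003.jpg", 480, 640)],
}


def get_dicts(split):
    """Return list[dict] for one split, one dict per image (no instances yet)."""
    dataset_dicts = []
    for idx, (path, height, width) in enumerate(_ANNOTATIONS[split]):
        dataset_dicts.append({
            "file_name": path,
            "height": height,
            "width": width,
            "image_id": f"{split}_{idx}",
            "annotations": [],  # per-instance dicts would go here
        })
    return dataset_dicts


# `s=s` freezes the current split inside each lambda; without it every
# lambda would see the loop variable's final value ("val").
loaders = {f"my_dataset_{s}": (lambda s=s: get_dicts(s)) for s in _ANNOTATIONS}

try:
    from detectron2.data import DatasetCatalog
except ImportError:  # detectron2 not installed: skip the registration step
    DatasetCatalog = None

if DatasetCatalog is not None:
    for name, fn in loaders.items():
        if name not in DatasetCatalog.list():  # registering a name twice is an error
            DatasetCatalog.register(name, fn)
```

Once registered, DatasetCatalog.get("my_dataset_train") calls the stored lambda and returns the two dicts built for the "train" split.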
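For standard tasks such as instance detection, instance/semantic/panoptic segmentation, and keypoint detection, we use a format similar to COCO's annotations as the standard dataset format.
Each image is represented by one dict. The dict has many optional fields, and some functions can infer certain fields from others. For example, when the image field is not available, the data loader reads the image from file_name. The fields are: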
file_name: the full path to the image file.
sem_seg_file_name: the full path to the ground-truth semantic segmentation file.
image: the image as a numpy array.
sem_seg: semantic segmentation ground truth as a 2D numpy array. Values in the array represent category labels.
height, width: integers. The shape of the image.
image_id (str): a string that identifies this image. Mainly used during evaluation to identify the image. Each dataset may use it for different purposes.
annotations (list[dict]): the per-instance annotations of every instance in this image. Each annotation dict may contain:
  bbox (list[float]): a list of 4 numbers representing the bounding box of the instance.
  bbox_mode (int): the format of bbox. It must be a member of structures.BoxMode. Currently supported: BoxMode.XYXY_ABS, BoxMode.XYWH_ABS.
  category_id (int): an integer in the range [0, num_categories) representing the category label. The value num_categories is reserved to represent the "background" category, if applicable.
  segmentation (list[list[float]] or dict):
    If list[list[float]], it represents a list of polygons, one for each connected component of the object. Each list[float] is one simple polygon in the format [x1, y1, ..., xn, yn]. The Xs and Ys are either relative coordinates in [0, 1] or absolute coordinates, depending on whether "bbox_mode" is relative.
    If dict, it represents the per-pixel segmentation mask in COCO's RLE format.
  keypoints (list[float]): in the format [x1, y1, v1, ..., xn, yn, vn]. v[i] is the visibility of the i-th keypoint. n must be equal to the number of keypoint categories. The Xs and Ys are either relative coordinates in [0, 1] or absolute coordinates, depending on whether "bbox_mode" is relative.
  Note: keypoint coordinates in COCO annotations are integers in the range [0, W-1] (x) and [0, H-1] (y). By default, detectron2 adds 0.5 to absolute keypoint coordinates to convert them from discrete pixel indices to floating-point coordinates.
  iscrowd: 0 or 1. Whether this instance is labeled as COCO's "crowd region".
proposal_boxes (array): a 2D numpy array with shape (K, 4) representing K precomputed proposal boxes for this image.
proposal_objectness_logits (array): a numpy array with shape (K,), holding the objectness logits of the proposals in "proposal_boxes".
proposal_bbox_mode (int): the format of the precomputed proposal boxes. It must be a member of structures.BoxMode. Default is BoxMode.XYXY_ABS.
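To make the field list concrete, here is a sketch of one record with a single instance, plus a hypothetical `check_record` helper that tests a few of the constraints stated above: a 4-number bbox, category_id in [0, num_categories), flat polygons of (x, y) pairs, and 3 values per keypoint. The file name, numbers, and helper are illustrative only; detectron2 does not ship a function by this name. `XYWH_ABS` falls back to 1, the integer value of BoxMode.XYWH_ABS, only when detectron2 is not installed.

```python
try:
    from detectron2.structures import BoxMode
    XYWH_ABS = BoxMode.XYWH_ABS
except ImportError:  # fallback so the sketch runs without detectron2
    XYWH_ABS = 1  # integer value of BoxMode.XYWH_ABS

NUM_CLASSES = 2    # hypothetical, e.g. ["person", "dog"]
NUM_KEYPOINTS = 2  # hypothetical number of keypoint categories

record = {
    "file_name": "images/0001.jpg",  # hypothetical path
    "image_id": "0001",
    "height": 480,
    "width": 640,
    "annotations": [{
        "bbox": [100.0, 120.0, 50.0, 80.0],  # x, y, w, h in absolute pixels
        "bbox_mode": XYWH_ABS,
        "category_id": 1,  # must lie in [0, NUM_CLASSES)
        # one polygon (a triangle) as a flat [x1, y1, ..., xn, yn] list
        "segmentation": [[100.0, 120.0, 150.0, 120.0, 125.0, 200.0]],
        # one (x, y, visibility) triple per keypoint
        "keypoints": [110.0, 130.0, 2, 140.0, 130.0, 0],
        "iscrowd": 0,
    }],
}


def check_record(rec, num_classes, num_keypoints):
    """Return a list of problems found in one dataset dict (empty if none)."""
    problems = []
    for i, ann in enumerate(rec.get("annotations", [])):
        if len(ann["bbox"]) != 4:
            problems.append(f"ann {i}: bbox needs 4 numbers")
        if not 0 <= ann["category_id"] < num_classes:
            problems.append(f"ann {i}: category_id out of range")
        seg = ann.get("segmentation")
        if isinstance(seg, list):  # polygon format; a dict would be COCO RLE
            for poly in seg:
                # a simple polygon needs an even count and at least 3 points
                if len(poly) % 2 or len(poly) < 6:
                    problems.append(f"ann {i}: malformed polygon")
        kps = ann.get("keypoints")
        if kps is not None and len(kps) != 3 * num_keypoints:
            problems.append(f"ann {i}: expected {3 * num_keypoints} keypoint values")
    return problems


print(check_record(record, NUM_CLASSES, NUM_KEYPOINTS))  # → []
```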
If your dataset is already in COCO's json format, you can simply register it like this:
from detectron2.data.datasets import register_coco_instances
register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")
detectron2 will take care of all the details, including the metadata.
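If your annotations are not yet in COCO format, the sketch below writes a minimal, hypothetical file of the kind passed as "json_annotation.json" above. It uses COCO's three top-level lists (images, annotations, categories). Note that COCO boxes are [x, y, width, height] in absolute pixels and that category ids need not start at 0 or be contiguous.

```python
import json
import os
import tempfile

coco = {
    "images": [
        {"id": 1, "file_name": "0001.jpg", "height": 480, "width": 640},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,            # refers to images[].id
            "category_id": 18,        # the dataset's own id, not necessarily contiguous
            "bbox": [100, 120, 50, 80],  # COCO boxes are [x, y, width, height]
            "area": 2000.0,           # area of the triangle below, used by COCO evaluation
            "iscrowd": 0,
            "segmentation": [[100, 120, 150, 120, 125, 200]],
        },
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 18, "name": "dog"},
    ],
}

out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "json_annotation.json")
with open(json_path, "w") as f:
    json.dump(coco, f)

# With detectron2 installed, the file could then be registered with
# register_coco_instances("my_dataset", {}, json_path, "path/to/image/dir").
```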
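Each dataset is associated with some metadata, accessible through MetadataCatalog.get(dataset_name).some_metadata. Metadata holds information about the dataset as a whole, such as class names, colors, and the root directory of the files. This information is useful for augmentation, evaluation, visualization, logging, etc. What the metadata contains depends on what the downstream code needs.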
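If you register a new dataset through DatasetCatalog.register, you may also want to add its corresponding metadata through MetadataCatalog.get(dataset_name).set(key=value) (or, equivalently, MetadataCatalog.get(dataset_name).key = value), so that features that need the metadata can use it. For example, to set the metadata key "thing_classes":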
from detectron2.data import MetadataCatalog
MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"]
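Here is a list of metadata keys used by built-in features of detectron2. If you add your own dataset without these metadata, some features may be unavailable: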
thing_classes (list[str]): used by all instance detection/segmentation tasks. A list of names for each instance/thing category. If you load a COCO-format dataset, it is set automatically by the function load_coco_json.
stuff_classes (list[str]): used by semantic and panoptic segmentation tasks. A list of names for each stuff category.
stuff_colors (list[tuple(r, g, b)]): pre-defined color (in [0, 255]) for each stuff category. Used for visualization. If not given, random colors are used.
keypoint_names (list[str]): used by keypoint localization. A list of names for each keypoint.
keypoint_flip_map (list[tuple[str]]): used by the keypoint localization task. A list of pairs of names, where each pair holds the two keypoints that should be swapped if the image is flipped during augmentation.
keypoint_connection_rules (list[tuple(str, str, (r, g, b))]): each tuple specifies a pair of keypoints that are connected and the color to use for the line between them when visualized.
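The keypoint keys above fit together as in the sketch below, written for a hypothetical two-keypoint "eye" dataset registered as "my_keypoint_dataset". The `hflip_keypoints` helper is our own illustration of what keypoint_flip_map is for, not detectron2's implementation: after a horizontal flip, x becomes width - x, and each left/right pair swaps its labels because the point that was the left eye now appears on the right side of the mirrored image. The metadata is attached with MetadataCatalog only if detectron2 is importable.

```python
meta = {
    "thing_classes": ["person"],
    "keypoint_names": ["left_eye", "right_eye"],
    "keypoint_flip_map": [("left_eye", "right_eye")],
    # (kp1, kp2, (r, g, b)): draw a line between kp1 and kp2 in this color
    "keypoint_connection_rules": [("left_eye", "right_eye", (255, 0, 0))],
}


def hflip_keypoints(kps, width, names, flip_map):
    """Flip flat [x1, y1, v1, ...] keypoints horizontally and swap L/R pairs."""
    triples = [kps[i:i + 3] for i in range(0, len(kps), 3)]
    flipped = [[width - x, y, v] for x, y, v in triples]
    index = {name: i for i, name in enumerate(names)}
    for a, b in flip_map:
        ia, ib = index[a], index[b]
        flipped[ia], flipped[ib] = flipped[ib], flipped[ia]
    return [value for triple in flipped for value in triple]


try:
    from detectron2.data import MetadataCatalog
    MetadataCatalog.get("my_keypoint_dataset").set(**meta)
except ImportError:  # detectron2 not installed: keep the plain dict only
    pass
```

For example, in an image 100 pixels wide, a left eye at x=10 and a right eye at x=30 come out as "left_eye" at x=70 and "right_eye" at x=90.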
Some additional metadata are specific to the evaluation of certain datasets (e.g., COCO):
thing_dataset_id_to_contiguous_id (dict[int->int]): used by all instance detection/segmentation tasks in the COCO format. A mapping from instance class ids in the dataset to contiguous ids in the range [0, #class). Set automatically by the function load_coco_json.
stuff_dataset_id_to_contiguous_id (dict[int->int]): used when generating prediction json files for semantic/panoptic segmentation. A mapping from semantic segmentation class ids in the dataset to contiguous ids in [0, num_categories). Useful for evaluation only.
json_file: the COCO annotation json file. Used by COCO evaluation for COCO-format datasets.
panoptic_root, panoptic_json: used by panoptic evaluation.
evaluator_type: used by the built-in main training script to select the evaluator. Not needed if you write your own main script.
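The sketch below mirrors the mapping that load_coco_json builds for thing_dataset_id_to_contiguous_id: the dataset's category ids are sorted and numbered from 0, and thing_classes lists the names in the same order. The category list is hypothetical; the inverse map shows how predicted contiguous ids can be turned back into dataset ids when writing results for evaluation.

```python
categories = [  # hypothetical COCO-style "categories" entries, ids not contiguous
    {"id": 18, "name": "dog"},
    {"id": 1, "name": "person"},
    {"id": 44, "name": "bottle"},
]

cat_ids = sorted(c["id"] for c in categories)
# dataset id -> contiguous id in [0, #class)
thing_dataset_id_to_contiguous_id = {cid: i for i, cid in enumerate(cat_ids)}
# class names in the same (sorted-by-id) order as the contiguous ids
id_to_name = {c["id"]: c["name"] for c in categories}
thing_classes = [id_to_name[cid] for cid in cat_ids]

# Inverse map: contiguous id predicted by the model -> original dataset id
contiguous_id_to_dataset_id = {v: k for k, v in thing_dataset_id_to_contiguous_id.items()}

print(thing_dataset_id_to_contiguous_id)  # → {1: 0, 18: 1, 44: 2}
print(thing_classes)  # → ['person', 'dog', 'bottle']
```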
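Note: for background on the concepts of "thing" and "stuff", see the referenced article. In detectron2, the term "thing" is used for instance-level tasks and "stuff" for semantic segmentation tasks; both are used in panoptic segmentation.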