Building the Augmented PASCAL VOC Semantic Segmentation Dataset (for PyTorch)

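1. Dataset overview

The augmented PASCAL VOC semantic segmentation dataset combines two sources: the PASCAL VOC 2012 dataset and the Semantic Boundaries Dataset (SBD). SBD provides annotations for 11355 images taken from PASCAL VOC 2011. Its label files are in .mat format, and its categories are the same as those of PASCAL VOC: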

  • person
  • bird, cat, cow, dog, horse, sheep
  • aeroplane, bicycle, boat, bus, car, motorbike, train
  • bottle, chair, dining table, potted plant, sofa, tv/monitor

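Directory structure of the PASCAL VOC 2012 dataset:

  • Annotations: XML files holding the labels for detection, classification, and other tasks
  • ImageSets: defines the train, validation, and test splits
  • JPEGImages: the original images
  • SegmentationClass: semantic segmentation labels (RGB)
  • SegmentationObject: instance segmentation labels (RGB)

Directory structure of the Semantic Boundaries Dataset:

  • img: the original images
  • cls: semantic segmentation labels (.mat)
  • inst: instance segmentation labels (.mat)
  • train.txt: 8498 image IDs used for training
  • val.txt: 2857 image IDs used for validation

PS: This article covers the semantic segmentation part only.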

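2. Downloading the datasets

  • PASCAL VOC 2012 download page: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
    The downloaded semantic segmentation labels are RGB images and must additionally be converted to grayscale images, i.e. single-channel images whose pixel values are class IDs.

  • Semantic Boundaries Dataset download page:
    http://home.bharathh.info/pubs/codes/SBD/download.html
    The downloaded segmentation labels are in .mat format and must be converted to grayscale images in the same format as the PASCAL VOC labels. Because every original image they refer to is already included in PASCAL VOC 2012, only the label part of SBD is needed.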
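
To make "converting an RGB label to a grayscale label" concrete, here is a minimal sketch that packs each color into one integer (r<<16 | g<<8 | b) and looks the result up in a table. The three palette entries come from the label table in section 3; the names pack and rgb_to_class_ids and the nested-list image layout are assumptions made only for this illustration, not part of any library.

```python
# Hypothetical helper: maps RGB label pixels to class IDs via a packed-integer lookup.
PALETTE = {
    (0, 0, 0): 0,          # background
    (128, 0, 0): 1,        # aeroplane
    (224, 224, 192): 255,  # bordering region
}


def pack(rgb):
    """Pack an (r, g, b) triple into one integer: r<<16 | g<<8 | b."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


# Lookup table keyed by the packed color.
LOOKUP = {pack(color): class_id for color, class_id in PALETTE.items()}


def rgb_to_class_ids(rgb_rows):
    """Convert rows of (r, g, b) pixels into rows of class IDs.

    Raises KeyError for a color that is not in the palette.
    """
    return [[LOOKUP[pack(px)] for px in row] for row in rgb_rows]


if __name__ == "__main__":
    tiny_label = [
        [(0, 0, 0), (224, 224, 192)],
        [(224, 224, 192), (128, 0, 0)],
    ]
    print(rgb_to_class_ids(tiny_label))  # [[0, 255], [255, 1]]
```

Failing loudly on an unknown color is a deliberate choice in this sketch: a stray color in a label image usually signals a corrupted or wrongly decoded file.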

After downloading, extract the datasets into the following layout:

"""
VOCdevkit
    ├─VOC2012
    |   ├─Annotations
    |   ├─ImageSets
    |   ├─JPEGImages
    |   ├─SegmentationClass
    |   ├─SegmentationObject
    |   └─SemanticBoundaries
    |       ├─cls
    |       ├─inst
    |       ├─train.txt
    |       └─val.txt
    └─generate_aug_data.py
"""

3. Dataset labels

labels = [
    #           class name            id    trainId         color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]

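PS: In the PASCAL VOC segmentation dataset, the boundary regions of objects are marked as bordering region, meaning that these pixels may belong to any class; they are ignored when accuracy is computed. The Semantic Boundaries Dataset contains no bordering region.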
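
The following sketch shows one way "ignored when accuracy is computed" can be read: pixels whose ground-truth label is the bordering region are dropped before counting correct predictions. The function name pixel_accuracy and the flat-list inputs are assumptions for this example; with real masks the same filtering would normally be done with NumPy boolean indexing.

```python
def pixel_accuracy(pred, target, ignore_id=21):
    """Fraction of correctly predicted pixels, skipping pixels labeled ignore_id.

    pred and target are flat sequences of class IDs of equal length.
    ignore_id=21 matches the trainId given to 'bordering region' in the
    table above; use 255 instead if the masks keep the original VOC id.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Keep only pixels whose ground truth is a real class.
    kept = [(p, t) for p, t in zip(pred, target) if t != ignore_id]
    if not kept:
        return float("nan")  # nothing left to evaluate
    correct = sum(1 for p, t in kept if p == t)
    return correct / len(kept)


if __name__ == "__main__":
    target = [0, 0, 21, 15, 15, 21]
    pred = [0, 15, 3, 15, 15, 0]
    # Two border pixels are skipped; 3 of the remaining 4 are correct.
    print(pixel_accuracy(pred, target))  # 0.75
```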

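4. Generating the dataset

PS: The train, validation, and test splits follow DeepLab, that is:
train = (sbd_train | sbd_val | voc_train) - voc_val; the validation and test sets are the same as in PASCAL VOC 2012.
The generated data list files have the following format:
2007_000032
2007_000039
2007_000063
2007_000068
2007_000121
2007_000170
...
Images are in the JPEGImages directory, e.g. 2007_000032.jpg
Masks are in the SegmentationClassAug directory, e.g. 2007_000032_trainIds.png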
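
The split rule can be tried on toy data before running it on the real lists. The sketch below uses made-up one-letter image IDs; build_splits and sample_paths are hypothetical helper names, and the path layout simply follows the two naming rules stated above.

```python
import os


def build_splits(sbd_train, sbd_val, voc_train, voc_val):
    """Return (aug_train, aug_trainval) as sorted lists of image IDs."""
    aug_train = (set(sbd_train) | set(sbd_val) | set(voc_train)) - set(voc_val)
    aug_trainval = aug_train | set(voc_val)
    return sorted(aug_train), sorted(aug_trainval)


def sample_paths(image_id, voc_root="VOC2012"):
    """Return (image_path, mask_path) for one image ID."""
    image = os.path.join(voc_root, "JPEGImages", image_id + ".jpg")
    mask = os.path.join(voc_root, "SegmentationClassAug", image_id + "_trainIds.png")
    return image, mask


if __name__ == "__main__":
    # Made-up IDs: "d" is in both sbd_val and voc_val.
    train, trainval = build_splits(
        sbd_train=["a", "b"], sbd_val=["c", "d"],
        voc_train=["b", "e"], voc_val=["d", "f"],
    )
    print(train)     # ['a', 'b', 'c', 'e']
    print(trainval)  # ['a', 'b', 'c', 'd', 'e', 'f']
    print(sample_paths(train[0]))
```

Subtracting voc_val last is what keeps validation images out of training: image "d" arrives through sbd_val but is removed again because it also belongs to the VOC validation set.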

import os
import sys
import re
import shutil
import numpy as np
from PIL import Image
import scipy.io
from collections import namedtuple


Label = namedtuple( 'Label' , [
    'name'        , # The identifier of this label, e.g. 'car', 'person', ... .
                    # We use them to uniquely name a class

    'id'          , # An integer ID that is associated with this label.
                    # The IDs are used to represent the label in ground truth images
                    # An ID of -1 means that this label does not have an ID and thus
                    # is ignored when creating ground truth images (e.g. license plate).
                    # Do not modify these IDs, since exactly these IDs are expected by the
                    # evaluation server.

    'trainId'     , # Feel free to modify these IDs as suitable for your method. Then create
                    # ground truth images with train IDs, using the tools provided in the
                    # 'preparation' folder. However, make sure to validate or submit results
                    # to our evaluation server using the regular IDs above!
                    # For trainIds, multiple labels might have the same ID. Then, these labels
                    # are mapped to the same class in the ground truth images. For the inverse
                    # mapping, we use the label that is defined first in the list below.
                    # For example, mapping all void-type classes to the same ID in training,
                    # might make sense for some approaches.
                    # Max value is 255!

    'color'       , # The color of this label
    ] )
labels = [
    #       name                     id    trainId   color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]


####################################################################################
num_classes = 22
unspecified_id = num_classes - 1
train_id = list()
valid_labels = dict()
color_palette = list()
id_key = list()
id_mapping = list()
for label in labels:
    train_id.append(label.trainId)
    valid_labels[label.name] = label.id
    color_palette += list(label.color)
    # encoder: r<<16 + g<<8 + b
    id_key.append(label.trainId)
    encoder = (label.color[0] << 16) + (label.color[1] << 8) + label.color[2]
    id_mapping.append(encoder)
assert list(train_id) == sorted(train_id) and len(train_id) == num_classes
assert len(color_palette) == (num_classes * 3)
temp = list(zip(id_mapping, id_key))
temp.sort()
temp = list(zip(*temp))
id_key = np.array(temp[1], dtype='int')
id_mapping = np.array(temp[0], dtype='int')
print('valid class: ', valid_labels)
print('train_id: ', train_id)
print('unspecified_id: ', unspecified_id)
print('color_palette: ', color_palette)
print('id_key: ', id_key)
print('id_mapping: ', id_mapping)
"""
valid class:  {'background': 0, 'aeroplane': 1, 'bicycle': 2, 'bird': 3, 'boat': 4, 'bottle': 5, 'bus': 6, 'car': 7, 'cat': 8, 'chair': 9, 'cow': 10, 'dining table': 11, 'dog': 12, 'horse': 13, 'motorbike': 14, 'person': 15, 'potted plant': 16, 'sheep': 17, 'sofa': 18, 'train': 19, 'tv monitor': 20, 'bordering region': 255}
train_id:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
unspecified_id:  21
color_palette:  [0, 0, 0, 128, 0, 0, 0, 128, 0, 128, 128, 0, 0, 0, 128, 128, 0, 128, 0, 128, 128, 128, 128, 128, 64, 0, 0, 192, 0, 0, 64, 128, 0, 192, 128, 0, 64, 0, 128, 192, 0, 128, 64, 128, 128, 192, 128, 128, 0, 64, 0, 128, 64, 0, 0, 192, 0, 128, 192, 0, 0, 64, 128, 224, 224, 192]
id_key:  [ 0  4 16 20  2  6 18  8 12 10 14  1  5 17  3  7 19  9 13 11 15 21]
id_mapping:  [       0      128    16384    16512    32768    32896    49152  4194304
  4194432  4227072  4227200  8388608  8388736  8404992  8421376  8421504
  8437760 12582912 12583040 12615680 12615808 14737600]
"""


####################################################################################
# Path of PASCAL VOC 2012 Dataset + Semantic Boundaries Dataset
"""
VOCdevkit
    ├─VOC2012
    |   ├─ImageSets
    |   ├─JPEGImages
    |   ├─SegmentationClass
    |   └─SemanticBoundaries
    |       ├─cls
    |       ├─img
    |       └─inst
    └─generate_aug_data.py

data_list_file:
img_name0
img_name1
img_name2
...
"""
data_dir = os.path.abspath(os.path.dirname(__file__))
img_dir = os.path.join(data_dir, 'VOC2012/JPEGImages')
mat_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries/cls')
voc_img_sets_dir = os.path.join(data_dir, 'VOC2012/ImageSets/Segmentation')
sbd_img_sets_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries')
voc_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClass')
aug_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClassAug')
# Create the output directory for the converted masks if it does not exist yet.
os.makedirs(aug_mask_dir, exist_ok=True)


####################################################################################
# convert .mat to .png
print()
i = 0
for mat_file in os.listdir(mat_dir):
    match = re.match(r'^(\d+_\d+)\.mat$', mat_file)  # escaped dot: only a literal '.mat' suffix matches
    if match:
        img = match.groups()[0]
        mat = scipy.io.loadmat(os.path.join(mat_dir, mat_file), mat_dtype=True, squeeze_me=True, struct_as_record=False)
        assert np.max(mat['GTcls'].Segmentation) < unspecified_id # no bordering region
        # Cast to uint8 so PIL builds an 8-bit image that can be saved as PNG and given a palette.
        mask = Image.fromarray(mat['GTcls'].Segmentation.astype(np.uint8))
        mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
        mask.putpalette(color_palette)
        mask.save(os.path.join(aug_mask_dir, img + '.png'))
        i += 1
        print('\rConverting .mat to .png: %d' % i, end='')
        sys.stdout.flush()

# copy voc to aug
print()
i = 0
for mask_file in os.listdir(voc_mask_dir):
    match = re.match(r'^(\d+_\d+)\.png$', mask_file)
    if match:
        img = match.groups()[0]
        # copy voc to aug
        shutil.copyfile(os.path.join(voc_mask_dir, mask_file), os.path.join(aug_mask_dir, mask_file))
        mask = np.array(Image.open(os.path.join(aug_mask_dir, mask_file)).convert('RGB'), dtype=np.uint32)
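        # encoder: r<<16 + g<<8 + b
        encoder = np.left_shift(mask[:, :, 0], 16) + np.left_shift(mask[:, :, 1], 8) + mask[:, :, 2]
        # id_mapping is sorted ascending, so with right=True every palette color maps to its own position.
        # A color outside the palette would be assigned a neighboring class or raise an IndexError.
        index = np.digitize(encoder.ravel(), id_mapping, right=True)
        new_mask = id_key[index].reshape(encoder.shape).astype('uint8')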
        new_mask = Image.fromarray(new_mask)
        new_mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
        i += 1
        print('\rCopying voc to aug: %d' % i, end=' ')
        sys.stdout.flush()


####################################################################################
print()
with open(os.path.join(voc_img_sets_dir, 'train.txt')) as f:
    img_sets = f.readlines()
    voc_train = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(voc_train)
with open(os.path.join(voc_img_sets_dir, 'val.txt')) as f:
    img_sets = f.readlines()
    voc_val = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(voc_val)
with open(os.path.join(sbd_img_sets_dir, 'train.txt')) as f:
    img_sets = f.readlines()
    sbd_train = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(sbd_train)
with open(os.path.join(sbd_img_sets_dir, 'val.txt')) as f:
    img_sets = f.readlines()
    sbd_val = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(sbd_val)

aug_train = (sbd_train | sbd_val | voc_train) - voc_val
aug_trainval = aug_train | voc_val
# check
for item in aug_trainval:
    img = os.path.join(img_dir, item + '.jpg')
    mask = os.path.join(aug_mask_dir, item + '_trainIds.png')
    assert os.path.exists(img) and os.path.exists(mask)

# create data list
# Sets have no fixed iteration order, so sort the IDs to get reproducible list files.
with open(os.path.join(data_dir, 'train_aug.txt'), 'w') as train:
    for line in sorted(aug_train):
        train.write(str(line) + '\n')
    print('Created train data list ({}) in {}.'.format(len(aug_train), data_dir))
with open(os.path.join(data_dir, 'trainval_aug.txt'), 'w') as trainval:
    for line in sorted(aug_trainval):
        trainval.write(str(line) + '\n')
    print('Created trainval data list ({}) in {}.'.format(len(aug_trainval), data_dir))


####################################################################################
# compute class weights
print()
class_count = np.zeros(num_classes, dtype='int64')
# Get the total number of pixels in all train masks for each class
for i, img in enumerate(aug_train, 1):
    mask = np.array(Image.open(os.path.join(aug_mask_dir, img + '_trainIds.png')))
    class_count += np.histogram(mask, bins=np.arange(num_classes + 1))[0]
    print('\rComputing class weight: %d' % i, end=' ')
    sys.stdout.flush()

# including unspecified_id
class_p_unspecified = class_count / np.sum(class_count.astype(np.int64))
class_weight_unspecified = 1 / np.log(1.02 + class_p_unspecified)
# excluding unspecified_id
class_p = class_count[:-1] / np.sum(class_count[:-1].astype(np.int64))
class_weight = 1 / np.log(1.02 + class_p)

def array2string(array, format='%.6f'):
    return ', '.join([format % i for i in array])

print()
with open(os.path.join(data_dir, 'args_aug.txt'), 'w') as f:
    # valid_labels
    f.writelines('valid class:\n')
    f.writelines('{}\n\n'.format(valid_labels))
    # unspecified_id
    f.writelines('unspecified_id: {}\n\n'.format(unspecified_id))
    # train_id
    f.writelines('train_id:\n')
    f.writelines(array2string(train_id, '%d') + '\n\n')
    # class_count
    f.writelines('pixel counts for each class:\n')
    f.writelines(array2string(class_count, '%d') + '\n\n')
    # class_p_unspecified
    f.writelines('class probability including unspecified_id:\n')
    f.writelines(array2string(class_p_unspecified) + '\n\n')
    # class_weight_unspecified
    f.writelines('class weight including unspecified_id:\n')
    f.writelines(array2string(class_weight_unspecified) + '\n\n')
    # class_p
    f.writelines('class probability excluding unspecified_id:\n')
    f.writelines(array2string(class_p) + '\n\n')
    # class_weight
    f.writelines('class weight excluding unspecified_id:\n')
    f.writelines(array2string(class_weight) + '\n\n')
print('Generated class weight in {}.'.format(os.path.join(data_dir, 'args_aug.txt')))
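
The weighting at the end of the script is w = 1 / ln(1.02 + p), where p is the fraction of pixels that belong to a class. The small sketch below reproduces that formula on made-up pixel counts, to show how rare classes receive larger weights; class_weights is a hypothetical helper name and the counts are not real dataset statistics.

```python
import math


def class_weights(pixel_counts, c=1.02):
    """Weight each class by 1 / ln(c + p), where p is its pixel fraction."""
    total = sum(pixel_counts)
    if total <= 0:
        raise ValueError("pixel_counts must contain at least one pixel")
    return [1.0 / math.log(c + count / total) for count in pixel_counts]


if __name__ == "__main__":
    counts = [700, 250, 50]  # made-up pixel counts, most frequent class first
    for count, weight in zip(counts, class_weights(counts)):
        print("count=%4d  weight=%.3f" % (count, weight))
```

With c = 1.02 the weights stay bounded: a class covering every pixel gets 1 / ln(2.02), about 1.42, and a class that never appears gets 1 / ln(1.02), about 50.5.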
