Building the Augmented PASCAL VOC Semantic Segmentation Dataset (For PyTorch)

1. Dataset Overview

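The augmented PASCAL VOC semantic segmentation dataset combines two parts: the PASCAL VOC 2012 dataset and the Semantic Boundaries Dataset (SBD). SBD provides annotations for 11355 images from the PASCAL VOC 2011 dataset. Its label files are in .mat format, and its categories are the same as those of PASCAL VOC: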

person
bird, cat, cow, dog, horse, sheep
aeroplane, bicycle, boat, bus, car, motorbike, train
bottle, chair, dining table, potted plant, sofa, tv/monitor

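Directory structure of the PASCAL VOC 2012 dataset:

Annotations: XML files holding the labels for detection, classification, and other tasks
ImageSets: defines the train, validation, and test splits
JPEGImages: original images
SegmentationClass: semantic segmentation labels (RGB)
SegmentationObject: instance segmentation labels (RGB)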

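Directory structure of the Semantic Boundaries Dataset:

img: original images
cls: semantic segmentation labels (.mat)
inst: instance segmentation labels (.mat)
train.txt: indices of the 8498 training images
val.txt: indices of the 2857 validation images

PS: This article focuses on the semantic segmentation part.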

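2. Downloading the Datasets

PASCAL VOC 2012 download page: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
The downloaded semantic segmentation labels are RGB images and need an extra conversion to grayscale images, that is, single-channel images whose pixel values are class IDs.
Semantic Boundaries Dataset download page:
http://home.bharathh.info/pubs/codes/SBD/download.html
The downloaded segmentation labels are in .mat format and must be converted to grayscale images in the same format as the PASCAL VOC labels. In addition, all the original images SBD needs are already included in PASCAL VOC 2012, so only its labels are required.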
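
The RGB-to-grayscale conversion mentioned above amounts to replacing each label color with its class ID. The following is a minimal sketch of that idea on a toy array, not the full script: `PALETTE` is a hypothetical three-color subset of the label table in section 3, and `rgb_to_index` is an illustrative helper name.

```python
import numpy as np

# Hypothetical three-entry subset of the label colors: (R, G, B) -> class ID.
PALETTE = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,    # aeroplane
    (0, 128, 0): 2,    # bicycle
}


def rgb_to_index(rgb_mask):
    """Map an (H, W, 3) uint8 color mask to an (H, W) uint8 class-ID mask."""
    rgb = rgb_mask.astype(np.uint32)
    # Pack every pixel into a single integer: r<<16 + g<<8 + b.
    packed = (rgb[..., 0] << 16) + (rgb[..., 1] << 8) + rgb[..., 2]

    codes = np.array([(r << 16) + (g << 8) + b for r, g, b in PALETTE],
                     dtype=np.uint32)
    ids = np.array(list(PALETTE.values()), dtype=np.uint8)
    order = np.argsort(codes)          # searchsorted needs sorted codes
    codes, ids = codes[order], ids[order]

    pos = np.clip(np.searchsorted(codes, packed), 0, len(codes) - 1)
    if not np.array_equal(codes[pos], packed):
        raise ValueError("mask contains a color that is not in the palette")
    return ids[pos]


toy = np.array([[[0, 0, 0], [128, 0, 0]],
                [[0, 128, 0], [128, 0, 0]]], dtype=np.uint8)
print(rgb_to_index(toy))
```

Packing the three channels into one integer lets a single sorted lookup replace a per-color comparison loop; unknown colors raise an error instead of being silently mapped to a neighboring class.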

After downloading, extract the datasets into the following layout:

"""
VOCdevkit
    ├─VOC2012
    |   ├─Annotations
    |   ├─ImageSets
    |   ├─JPEGImages
    |   ├─SegmentationClass
    |   ├─SegmentationObject
    |   └─SemanticBoundaries
    |       ├─cls
    |       ├─inst
    |       ├─train.txt
    |       └─val.txt
    └─generate_aug_data.py
"""
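
Before running the generation script it can help to confirm that the extracted files match this layout. The sketch below is only a convenience check under the assumption that it is run from the VOCdevkit directory; `EXPECTED` and `missing_entries` are illustrative names, and the list covers only the entries the semantic segmentation workflow reads.

```python
from pathlib import Path

# Sub-paths taken from the directory tree above, relative to VOCdevkit.
EXPECTED = [
    "VOC2012/ImageSets",
    "VOC2012/JPEGImages",
    "VOC2012/SegmentationClass",
    "VOC2012/SemanticBoundaries/cls",
    "VOC2012/SemanticBoundaries/train.txt",
    "VOC2012/SemanticBoundaries/val.txt",
]


def missing_entries(root):
    """Return the expected sub-paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# An empty list means every expected file and directory was found.
print(missing_entries("."))
```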

3. Dataset Labels

labels = [
    #           class name            id    trainId         color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]

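PS: The PASCAL VOC segmentation dataset marks object boundary regions as bordering region, meaning these pixels may belong to any class; they are ignored when accuracy is computed. The Semantic Boundaries Dataset has no bordering region.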
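
To illustrate what ignoring the bordering region means in practice, here is a small sketch of pixel accuracy that skips those pixels. It assumes the masks use the trainId values from the table above, where bordering region is 21; `pixel_accuracy` is an illustrative helper, not part of the original script.

```python
import numpy as np

IGNORE_ID = 21  # trainId of 'bordering region' in the label table


def pixel_accuracy(pred, target, ignore_id=IGNORE_ID):
    """Fraction of correctly predicted pixels, skipping ignored ones."""
    valid = target != ignore_id
    if not valid.any():
        return float("nan")  # nothing to evaluate
    return float((pred[valid] == target[valid]).mean())


target = np.array([[0, 15, 21],
                   [15, 15, 21]])
pred = np.array([[0, 15, 3],
                 [0, 15, 7]])
# Four pixels are valid and three of them match, so this prints 0.75.
print(pixel_accuracy(pred, target))
```

For training in PyTorch, the same effect on the loss is usually obtained by passing `ignore_index=21` to `torch.nn.CrossEntropyLoss`.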

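4. Generating the Dataset

PS: The train, validation, and test splits follow deeplab, that is:
train = (sbd_train | sbd_val | voc_train) - voc_val; the validation and test sets are the same as in PASCAL VOC 2012.
The generated data list files have the following format:
2007_000032
2007_000039
2007_000063
2007_000068
2007_000121
2007_000170
...
Images are in the JPEGImages directory, e.g. 2007_000032.jpg
Masks are in the SegmentationClassAug directory, e.g. 2007_000032_trainIds.png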
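
The split rule and the file naming described above can be sketched as follows. Both helpers are illustrative names that do not appear in the script below; the toy image names are placeholders, and `root` is assumed to be the VOCdevkit directory.

```python
import os


def augmented_train(sbd_train, sbd_val, voc_train, voc_val):
    """train = (sbd_train | sbd_val | voc_train) - voc_val."""
    return (set(sbd_train) | set(sbd_val) | set(voc_train)) - set(voc_val)


def read_pairs(list_file, root):
    """Return (image_path, mask_path) for every name in a data list file."""
    img_dir = os.path.join(root, "VOC2012", "JPEGImages")
    mask_dir = os.path.join(root, "VOC2012", "SegmentationClassAug")
    pairs = []
    with open(list_file) as f:
        for line in f:
            name = line.strip()
            if not name:  # skip blank lines
                continue
            pairs.append((os.path.join(img_dir, name + ".jpg"),
                          os.path.join(mask_dir, name + "_trainIds.png")))
    return pairs


# Toy split: 'c' is dropped from training because it is a VOC validation image.
print(sorted(augmented_train({"a", "b"}, {"c", "d"}, {"a", "e"}, {"c", "f"})))
```

Subtracting voc_val last keeps every VOC validation image out of the training list, even when SBD assigns it to its own train or val split. A PyTorch Dataset could call `read_pairs` on the generated list file to obtain the image and mask paths.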

import os
import sys
import re
import shutil
import numpy as np
from PIL import Image
import scipy.io
from collections import namedtuple


Label = namedtuple( 'Label' , [
    'name'        , # The identifier of this label, e.g. 'car', 'person', ... .
                    # We use them to uniquely name a class

    'id'          , # An integer ID that is associated with this label.
                    # The IDs are used to represent the label in ground truth images
                    # An ID of -1 means that this label does not have an ID and thus
                    # is ignored when creating ground truth images (e.g. license plate).
                    # Do not modify these IDs, since exactly these IDs are expected by the
                    # evaluation server.

    'trainId'     , # Feel free to modify these IDs as suitable for your method. Then create
                    # ground truth images with train IDs, using the tools provided in the
                    # 'preparation' folder. However, make sure to validate or submit results
                    # to our evaluation server using the regular IDs above!
                    # For trainIds, multiple labels might have the same ID. Then, these labels
                    # are mapped to the same class in the ground truth images. For the inverse
                    # mapping, we use the label that is defined first in the list below.
                    # For example, mapping all void-type classes to the same ID in training,
                    # might make sense for some approaches.
                    # Max value is 255!

    'color'       , # The color of this label
    ] )
labels = [
    #       name                     id    trainId   color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]


####################################################################################
num_classes = 22
unspecified_id = num_classes - 1
train_id = list()
valid_labels = dict()
color_palette = list()
id_key = list()
id_mapping = list()
for label in labels:
    train_id.append(label.trainId)
    valid_labels[label.name] = label.id
    color_palette += list(label.color)
    # encoder: r<<16 + g<<8 + b
    id_key.append(label.trainId)
    encoder = (label.color[0] << 16) + (label.color[1] << 8) + label.color[2]
    id_mapping.append(encoder)
assert list(train_id) == sorted(train_id) and len(train_id) == num_classes
assert len(color_palette) == (num_classes * 3)
temp = list(zip(id_mapping, id_key))
temp.sort()
temp = list(zip(*temp))
id_key = np.array(temp[1], dtype='int')
id_mapping = np.array(temp[0], dtype='int')
print('valid class: ', valid_labels)
print('train_id: ', train_id)
print('unspecified_id: ', unspecified_id)
print('color_palette: ', color_palette)
print('id_key: ', id_key)
print('id_mapping: ', id_mapping)
"""
valid class:  {'background': 0, 'aeroplane': 1, 'bicycle': 2, 'bird': 3, 'boat': 4, 'bottle': 5, 'bus': 6, 'car': 7, 'cat': 8, 'chair': 9, 'cow': 10, 'dining table': 11, 'dog': 12, 'horse': 13, 'motorbike': 14, 'person': 15, 'potted plant': 16, 'sheep': 17, 'sofa': 18, 'train': 19, 'tv monitor': 20, 'bordering region': 255}
train_id:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
unspecified_id:  21
color_palette:  [0, 0, 0, 128, 0, 0, 0, 128, 0, 128, 128, 0, 0, 0, 128, 128, 0, 128, 0, 128, 128, 128, 128, 128, 64, 0, 0, 192, 0, 0, 64, 128, 0, 192, 128, 0, 64, 0, 128, 192, 0, 128, 64, 128, 128, 192, 128, 128, 0, 64, 0, 128, 64, 0, 0, 192, 0, 128, 192, 0, 0, 64, 128, 224, 224, 192]
id_key:  [ 0  4 16 20  2  6 18  8 12 10 14  1  5 17  3  7 19  9 13 11 15 21]
id_mapping:  [       0      128    16384    16512    32768    32896    49152  4194304
  4194432  4227072  4227200  8388608  8388736  8404992  8421376  8421504
  8437760 12582912 12583040 12615680 12615808 14737600]
"""


####################################################################################
# Path of PASCAL VOC 2012 Dataset + Semantic Boundaries Dataset
"""
VOCdevkit
    ├─VOC2012
    |   ├─ImageSets
    |   ├─JPEGImages
    |   ├─SegmentationClass
    |   └─SemanticBoundaries
    |       ├─cls
    |       ├─img
    |       └─inst
    └─generate_aug_data.py

data_list_file:
img_name0
img_name1
img_name2
...
"""
data_dir = os.path.abspath(os.path.dirname(__file__))
img_dir = os.path.join(data_dir, 'VOC2012/JPEGImages')
mat_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries/cls')
voc_img_sets_dir = os.path.join(data_dir, 'VOC2012/ImageSets/Segmentation')
sbd_img_sets_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries')
voc_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClass')
aug_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClassAug')
if not os.path.exists(aug_mask_dir):
    os.mkdir(aug_mask_dir)


####################################################################################
# convert .mat to .png
print()
i = 0
for mat_file in os.listdir(mat_dir):
    match = re.match(r'^(\d+_\d+)\.mat$', mat_file)
    if match:
        img = match.groups()[0]
        mat = scipy.io.loadmat(os.path.join(mat_dir, mat_file), mat_dtype=True, squeeze_me=True, struct_as_record=False)
        assert np.max(mat['GTcls'].Segmentation) < unspecified_id # no bordering region
        mask = Image.fromarray(mat['GTcls'].Segmentation)
        mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
        mask.putpalette(color_palette)
        mask.save(os.path.join(aug_mask_dir, img + '.png'))
        i += 1
        print('\rConverting .mat to .png: %d' % i, end='')
        sys.stdout.flush()

# copy voc to aug
print()
i = 0
for mask_file in os.listdir(voc_mask_dir):
    match = re.match(r'^(\d+_\d+)\.png$', mask_file)
    if match:
        img = match.groups()[0]
        # copy voc to aug
        shutil.copyfile(os.path.join(voc_mask_dir, mask_file), os.path.join(aug_mask_dir, mask_file))
        mask = np.array(Image.open(os.path.join(aug_mask_dir, mask_file)).convert('RGB'), dtype=np.uint32)
        # encoder: r<<16 + g<<8 + b
        encoder = np.left_shift(mask[:, :, 0], 16) + np.left_shift(mask[:, :, 1], 8) + mask[:, :, 2]
        index = np.digitize(encoder.ravel(), id_mapping, right=True)
        new_mask = id_key[index].reshape(encoder.shape).astype('uint8')
        new_mask = Image.fromarray(new_mask)
        new_mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
        i += 1
        print('\rCopying voc to aug: %d' % i, end=' ')
        sys.stdout.flush()


####################################################################################
print()
with open(os.path.join(voc_img_sets_dir, 'train.txt')) as f:
    img_sets = f.readlines()
    voc_train = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(voc_train)
with open(os.path.join(voc_img_sets_dir, 'val.txt')) as f:
    img_sets = f.readlines()
    voc_val = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(voc_val)
with open(os.path.join(sbd_img_sets_dir, 'train.txt')) as f:
    img_sets = f.readlines()
    sbd_train = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(sbd_train)
with open(os.path.join(sbd_img_sets_dir, 'val.txt')) as f:
    img_sets = f.readlines()
    sbd_val = set([i.split()[0] for i in img_sets])
    assert len(img_sets) == len(sbd_val)

aug_train = (sbd_train | sbd_val | voc_train) - voc_val
aug_trainval = aug_train | voc_val
# check
for item in aug_trainval:
    img = os.path.join(img_dir, item + '.jpg')
    mask = os.path.join(aug_mask_dir, item + '_trainIds.png')
    assert os.path.exists(img) and os.path.exists(mask)

# create data list
with open(os.path.join(data_dir, 'train_aug.txt'), 'w') as train:
    for line in aug_train:
        train.write(str(line) + '\n')
    print('Created train data list ({}) in {}.'.format(len(aug_train), data_dir))
with open(os.path.join(data_dir, 'trainval_aug.txt'), 'w') as trainval:
    for line in aug_trainval:
        trainval.write(str(line) + '\n')
    print('Created trainval data list ({}) in {}.'.format(len(aug_trainval), data_dir))


####################################################################################
# compute class weights
print()
class_count = np.zeros(num_classes, dtype='int64')
# Get the total number of pixels in all train masks for each class
for i, img in enumerate(aug_train, 1):
    mask = np.array(Image.open(os.path.join(aug_mask_dir, img + '_trainIds.png')))
    class_count += np.histogram(mask, bins=np.arange(num_classes + 1))[0]
    print('\rComputing class weight: %d' % i, end=' ')
    sys.stdout.flush()

# including unspecified_id
class_p_unspecified = class_count / np.sum(class_count.astype(np.int64))
class_weight_unspecified = 1 / np.log(1.02 + class_p_unspecified)
# excluding unspecified_id
class_p = class_count[:-1] / np.sum(class_count[:-1].astype(np.int64))
class_weight = 1 / np.log(1.02 + class_p)

def array2string(array, format='%.6f'):
    return ', '.join([format % i for i in array])

print()
with open(os.path.join(data_dir, 'args_aug.txt'), 'w') as f:
    # valid_labels
    f.writelines('valid class:\n')
    f.writelines('{}\n\n'.format(valid_labels))
    # unspecified_id
    f.writelines('unspecified_id: {}\n\n'.format(unspecified_id))
    # train_id
    f.writelines('train_id:\n')
    f.writelines(array2string(train_id, '%d') + '\n\n')
    # class_count
    f.writelines('pixel counts for each class:\n')
    f.writelines(array2string(class_count, '%d') + '\n\n')
    # class_p_unspecified
    f.writelines('class probability including unspecified_id:\n')
    f.writelines(array2string(class_p_unspecified) + '\n\n')
    # class_weight_unspecified
    f.writelines('class weight including unspecified_id:\n')
    f.writelines(array2string(class_weight_unspecified) + '\n\n')
    # class_p
    f.writelines('class probability excluding unspecified_id:\n')
    f.writelines(array2string(class_p) + '\n\n')
    # class_weight
    f.writelines('class weight excluding unspecified_id:\n')
    f.writelines(array2string(class_weight) + '\n\n')
print('Generated class weight in {}.'.format(os.path.join(data_dir, 'args_aug.txt')))
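
The class weights written by the script follow weight = 1 / ln(1.02 + p), where p is the pixel frequency of a class. The sketch below reproduces that formula on made-up pixel counts to show how it behaves; `class_weights` and the counts are illustrative only.

```python
import numpy as np


def class_weights(pixel_counts, c=1.02):
    """Weight each class by 1 / ln(c + p), p being its pixel frequency."""
    counts = np.asarray(pixel_counts, dtype=np.float64)
    p = counts / counts.sum()
    return 1.0 / np.log(c + p)


# Made-up counts: one dominant class, one medium class, one rare class.
counts = [900, 90, 10]
w = class_weights(counts)
print(w)
```

Rare classes receive larger weights, and because p lies between 0 and 1 the weights stay between 1 / ln(2.02) and 1 / ln(1.02), so a class with almost no pixels cannot receive an unbounded weight.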
