PASCAL VOC 语义分割数据集制作 (For PyTorch)

1. 数据集简介

PASCAL VOC 挑战赛主要有 Classification、Detection、Segmentation、Person Layout、Action Classification 这几类子任务，一共有20个类别，包括：

person
bird, cat, cow, dog, horse, sheep
aeroplane, bicycle, boat, bus, car, motorbike, train
bottle, chair, dining table, potted plant, sofa, tv/monitor

数据集的组成

文件目录结构：

Annotations：包含xml文件，其中有检测、分类等任务的标签
ImageSets：定义了训练集、验证集与测试集的划分
JPEGImages：原始图像
SegmentationClass：语义分割的标签 (RGB)
SegmentationObject：实例分割的标签 (RGB)

PS: 此处主要介绍语义分割部分。

2. 数据集的下载

PASCAL VOC 2012 数据集的下载地址：http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
由于下载得到的语义分割标签为 RGB 图像，所以需要额外将其转换为灰度图像。

3. 数据集的标签

labels = [
    #           class name            id    trainId         color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]

PS: PASCAL VOC 分割数据集中将物体的边界区域标记为 bordering region，表示这些区域可以是任何类别，包括 background，在计算精度时将忽略该部分像素。

分割标签

4. 数据集生成

所生成的data list文件格式为：
2007_000032
2007_000039
2007_000063
2007_000068
2007_000121
2007_000170
...
image在JPEGImages目录下，2007_000032.jpg
mask在SegmentationClass目录下，2007_000032_trainIds.png

import os
import sys
import shutil
import numpy as np
from PIL import Image
from collections import namedtuple


Label = namedtuple( 'Label' , [
    'name'        , # The identifier of this label, e.g. 'car', 'person', ... .
                    # We use them to uniquely name a class

    'id'          , # An integer ID that is associated with this label.
                    # The IDs are used to represent the label in ground truth images
                    # An ID of -1 means that this label does not have an ID and thus
                    # is ignored when creating ground truth images (e.g. license plate).
                    # Do not modify these IDs, since exactly these IDs are expected by the
                    # evaluation server.

    'trainId'     , # Feel free to modify these IDs as suitable for your method. Then create
                    # ground truth images with train IDs, using the tools provided in the
                    # 'preparation' folder. However, make sure to validate or submit results
                    # to our evaluation server using the regular IDs above!
                    # For trainIds, multiple labels might have the same ID. Then, these labels
                    # are mapped to the same class in the ground truth images. For the inverse
                    # mapping, we use the label that is defined first in the list below.
                    # For example, mapping all void-type classes to the same ID in training,
                    # might make sense for some approaches.
                    # Max value is 255!

    'color'       , # The color of this label
    ] )
labels = [
    #       name                     id    trainId   color
    Label(  'background'            ,  0 ,        0 , (   0,   0,   0) ),
    Label(  'aeroplane'             ,  1 ,        1 , ( 128,   0,   0) ),
    Label(  'bicycle'               ,  2 ,        2 , (   0, 128,   0) ),
    Label(  'bird'                  ,  3 ,        3 , ( 128, 128,   0) ),
    Label(  'boat'                  ,  4 ,        4 , (   0,   0, 128) ),
    Label(  'bottle'                ,  5 ,        5 , ( 128,   0, 128) ),
    Label(  'bus'                   ,  6 ,        6 , (   0, 128, 128) ),
    Label(  'car'                   ,  7 ,        7 , ( 128, 128, 128) ),
    Label(  'cat'                   ,  8 ,        8 , (  64,   0,   0) ),
    Label(  'chair'                 ,  9 ,        9 , ( 192,   0,   0) ),
    Label(  'cow'                   , 10 ,       10 , (  64, 128,   0) ),
    Label(  'dining table'          , 11 ,       11 , ( 192, 128,   0) ),
    Label(  'dog'                   , 12 ,       12 , (  64,   0, 128) ),
    Label(  'horse'                 , 13 ,       13 , ( 192,   0, 128) ),
    Label(  'motorbike'             , 14 ,       14 , (  64, 128, 128) ),
    Label(  'person'                , 15 ,       15 , ( 192, 128, 128) ),
    Label(  'potted plant'          , 16 ,       16 , (   0,  64,   0) ),
    Label(  'sheep'                 , 17 ,       17 , ( 128,  64,   0) ),
    Label(  'sofa'                  , 18 ,       18 , (   0, 192,   0) ),
    Label(  'train'                 , 19 ,       19 , ( 128, 192,   0) ),
    Label(  'tv monitor'            , 20 ,       20 , (   0,  64, 128) ),
    Label(  'bordering region'      , 255,       21 , ( 224, 224, 192) ),
]


####################################################################################
num_classes = 22
unspecified_id = num_classes - 1
train_id = list()
valid_labels = dict()
id_key = list()
id_mapping = list()
for label in labels:
    train_id.append(label.trainId)
    valid_labels[label.name] = label.id
    id_key.append(label.trainId)
    # encoder: r<<16 + g<<8 + b
    color = label.color
    encoder = (color[0] << 16) + (color[1] << 8) + color[2]
    id_mapping.append(encoder)
assert list(train_id) == sorted(train_id) and len(train_id) == num_classes
temp = list(zip(id_mapping, id_key))
temp.sort()
temp = list(zip(*temp))
id_key = np.array(temp[1], dtype='int')
id_mapping = np.array(temp[0], dtype='int')
print('valid class: ', valid_labels)
print('train_id: ', train_id)
print('unspecified_id: ', unspecified_id)
print('id_key: ', id_key)
print('id_mapping: ', id_mapping)
"""
valid class:  {'background': 0, 'aeroplane': 1, 'bicycle': 2, 'bird': 3, 'boat': 4, 'bottle': 5, 'bus': 6, 'car': 7, 'cat': 8, 'chair': 9, 'cow': 10, 'dining table': 11, 'dog': 12, 'horse': 13, 'motorbike': 14, 'person': 15, 'potted plant': 16, 'sheep': 17, 'sofa': 18, 'train': 19, 'tv monitor': 20, 'bordering region': 255}
train_id:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
unspecified_id:  21
id_key:  [ 0  4 16 20  2  6 18  8 12 10 14  1  5 17  3  7 19  9 13 11 15 21]
id_mapping:  [       0      128    16384    16512    32768    32896    49152  4194304
  4194432  4227072  4227200  8388608  8388736  8404992  8421376  8421504
  8437760 12582912 12583040 12615680 12615808 14737600]
"""


####################################################################################
# Path of PASCAL VOC 2012 Dataset
"""
VOCdevkit
    ├─VOC2012
    |   ├─ImageSets
    |   ├─JPEGImages
    |   └─SegmentationClass
    └─generate_voc_data.py

data_list_file:
img_name0
img_name1
img_name2
...
"""
data_dir = os.path.abspath(os.path.dirname(__file__))
img_dir = os.path.join(data_dir, 'VOC2012/JPEGImages')
mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClass')
img_sets_dir = os.path.join(data_dir, 'VOC2012/ImageSets/Segmentation')


####################################################################################
print()
for split in ('train', 'val', 'trainval', 'test'):
    with open(os.path.join(img_sets_dir, '%s.txt' % split), 'r') as f:
        img_sets = [i.split()[0] for i in f.readlines()]
    for i, img in enumerate(img_sets, 1):
        img_path = os.path.join(img_dir, img + '.jpg')
        assert os.path.exists(img_path), 'Cannot find the image: {}.'.format(img_path)
        if split != 'test':
            mask_path = os.path.join(mask_dir, img + '.png')
            assert os.path.exists(mask_path), 'Cannot find the mask: {}.'.format(mask_path)
            new_mask_path = os.path.join(mask_dir, img + '_trainIds.png')
            if not os.path.exists(new_mask_path):
                mask = np.array(Image.open(mask_path).convert('RGB'), dtype=np.uint32)
                # encoder: r<<16 + g<<8 + b
                encoder = np.left_shift(mask[:, :, 0], 16) + np.left_shift(mask[:, :, 1], 8) + mask[:, :, 2]
                index = np.digitize(encoder.ravel(), id_mapping, right=True)
                new_mask = id_key[index].reshape(encoder.shape).astype('uint8')
                new_mask = Image.fromarray(new_mask)
                new_mask.save(new_mask_path)
                print('\rGenerating masks_trainIds: %d' % i, end=' ')
                sys.stdout.flush()
    shutil.copy(os.path.join(img_sets_dir, '%s.txt' % split), os.path.join(data_dir, '%s.txt' % split))
    print('\n{}: Found {} images in the folder {}.'.format(split, len(img_sets), img_dir))
    print('{}: Found {} masks in the folder {}.'.format(split, len(img_sets), mask_dir))
    print('{}: Created data list in {}.'.format(split, data_dir))


####################################################################################
# compute class weights
print()
class_count = np.zeros(num_classes, dtype='int64')
# Get the total number of pixels in all train masks for each class
with open(os.path.join(data_dir, 'train.txt'), 'r') as f:
    img_sets = [i.split()[0] for i in f.readlines()]
for i, img in enumerate(img_sets, 1):
    mask = np.array(Image.open(os.path.join(mask_dir, img + '_trainIds.png')))
    class_count += np.histogram(mask, bins=np.arange(num_classes + 1))[0]
    print('\rComputing class weight: %d' % i, end=' ')
    sys.stdout.flush()

# including unspecified_id
class_p_unspecified = class_count / np.sum(class_count.astype(np.int64))
class_weight_unspecified = 1 / np.log(1.02 + class_p_unspecified)
# excluding unspecified_id
class_p = class_count[:-1] / np.sum(class_count[:-1].astype(np.int64))
class_weight = 1 / np.log(1.02 + class_p)

def array2string(array, format='%.6f'):
    return ', '.join([format % i for i in array])

print()
with open(os.path.join(data_dir, 'args.txt'), 'w') as f:
    # valid_labels
    f.writelines('valid class:\n')
    f.writelines('{}\n\n'.format(valid_labels))
    # unspecified_id
    f.writelines('unspecified_id: {}\n\n'.format(unspecified_id))
    # train_id
    f.writelines('train_id:\n')
    f.writelines(array2string(train_id, '%d') + '\n\n')
    # class_count
    f.writelines('pixel counts for each class:\n')
    f.writelines(array2string(class_count, '%d') + '\n\n')
    # class_p_unspecified
    f.writelines('class probability including unspecified_id:\n')
    f.writelines(array2string(class_p_unspecified) + '\n\n')
    # class_weight_unspecified
    f.writelines('class weight including unspecified_id:\n')
    f.writelines(array2string(class_weight_unspecified) + '\n\n')
    # class_p
    f.writelines('class probability excluding unspecified_id:\n')
    f.writelines(array2string(class_p) + '\n\n')
    # class_weight
    f.writelines('class weight excluding unspecified_id:\n')
    f.writelines(array2string(class_weight) + '\n\n')
print('Generated class weight in {}.'.format(os.path.join(data_dir, 'args.txt')))

PASCAL VOC 语义分割数据集制作 (For PyTorch)

1. 数据集简介

2. 数据集的下载

3. 数据集的标签

4. 数据集生成

你可能感兴趣的:(PASCAL VOC 语义分割数据集制作 (For PyTorch))