1. 数据集简介
PASCAL VOC 增强版语义分割数据集包括 PASCAL VOC 2012 数据集和 Semantic Boundaries Dataset 两部分。SBD 数据集包含来自 PASCAL VOC 2011 数据集的11355张图片的注释,标签文件为 .mat 格式,类别与 PASCAL VOC 一致:
- person
- bird, cat, cow, dog, horse, sheep
- aeroplane, bicycle, boat, bus, car, motorbike, train
- bottle, chair, dining table, potted plant, sofa, tv/monitor
PASCAL VOC 2012 数据集文件目录结构:
- Annotations:包含xml文件,其中有检测、分类等任务的标签
- ImageSets:定义了训练集、验证集与测试集的划分
- JPEGImages:原始图像
- SegmentationClass:语义分割的标签 (RGB)
- SegmentationObject:实例分割的标签 (RGB)
Semantic Boundaries Dataset 文件目录结构:
- img:原始图像
- cls:语义分割的标签 (.mat)
- inst:实例分割的标签 (.mat)
- train.txt:包含 8498 个用于训练的图像索引
- val.txt:包含 2857 个用于验证的图像索引
PS: 此处主要介绍语义分割部分。
2. 数据集的下载
PASCAL VOC 2012 数据集的下载地址:http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
下载得到的语义分割标签为 RGB 图像,需要额外将其转换为灰度图像。Semantic Boundaries Dataset的下载地址:
http://home.bharathh.info/pubs/codes/SBD/download.html
下载得到的分割标签为 .mat 格式,需要将其转换为与 PASCAL VOC 格式相同的灰度图像。此外由于其所需的原始图像均包含在 PASCAL VOC 2012 中,所以仅需要其标签部分。
数据集下载完成后解压如下:
"""
VOCdevkit
├─VOC2012
| ├─Annotations
| ├─ImageSets
| ├─JPEGImages
| ├─SegmentationClass
| ├─SegmentationObject
| └─SemanticBoundaries
| ├─cls
| ├─inst
| ├─train.txt
| └─val.txt
└─generate_aug_data.py
"""
3. 数据集的标签
labels = [
# class name id trainId color
Label( 'background' , 0 , 0 , ( 0, 0, 0) ),
Label( 'aeroplane' , 1 , 1 , ( 128, 0, 0) ),
Label( 'bicycle' , 2 , 2 , ( 0, 128, 0) ),
Label( 'bird' , 3 , 3 , ( 128, 128, 0) ),
Label( 'boat' , 4 , 4 , ( 0, 0, 128) ),
Label( 'bottle' , 5 , 5 , ( 128, 0, 128) ),
Label( 'bus' , 6 , 6 , ( 0, 128, 128) ),
Label( 'car' , 7 , 7 , ( 128, 128, 128) ),
Label( 'cat' , 8 , 8 , ( 64, 0, 0) ),
Label( 'chair' , 9 , 9 , ( 192, 0, 0) ),
Label( 'cow' , 10 , 10 , ( 64, 128, 0) ),
Label( 'dining table' , 11 , 11 , ( 192, 128, 0) ),
Label( 'dog' , 12 , 12 , ( 64, 0, 128) ),
Label( 'horse' , 13 , 13 , ( 192, 0, 128) ),
Label( 'motorbike' , 14 , 14 , ( 64, 128, 128) ),
Label( 'person' , 15 , 15 , ( 192, 128, 128) ),
Label( 'potted plant' , 16 , 16 , ( 0, 64, 0) ),
Label( 'sheep' , 17 , 17 , ( 128, 64, 0) ),
Label( 'sofa' , 18 , 18 , ( 0, 192, 0) ),
Label( 'train' , 19 , 19 , ( 128, 192, 0) ),
Label( 'tv monitor' , 20 , 20 , ( 0, 64, 128) ),
Label( 'bordering region' , 255, 21 , ( 224, 224, 192) ),
]
PS: PASCAL VOC 分割数据集中将物体的边界区域标记为 bordering region,表示这些区域可以是任何类别,在计算精度时将忽略该部分像素。Semantic Boundaries Dataset 中不含 bordering region 部分。
4. 数据集生成
PS: 训练集、验证集和测试集的划分参照 deeplab,即:
train = (sbd_train | sbd_val | voc_train) - voc_val,验证集与测试集同 PASCAL VOC 2012
所生成的data list文件格式为:
2007_000032
2007_000039
2007_000063
2007_000068
2007_000121
2007_000170
...
image在JPEGImages目录下,2007_000032.jpg
mask在SegmentationClassAug目录下,2007_000032_trainIds.png
import os
import sys
import re
import shutil
import numpy as np
from PIL import Image
import scipy.io
from collections import namedtuple
Label = namedtuple( 'Label' , [
'name' , # The identifier of this label, e.g. 'car', 'person', ... .
# We use them to uniquely name a class
'id' , # An integer ID that is associated with this label.
# The IDs are used to represent the label in ground truth images
# An ID of -1 means that this label does not have an ID and thus
# is ignored when creating ground truth images (e.g. license plate).
# Do not modify these IDs, since exactly these IDs are expected by the
# evaluation server.
'trainId' , # Feel free to modify these IDs as suitable for your method. Then create
# ground truth images with train IDs, using the tools provided in the
# 'preparation' folder. However, make sure to validate or submit results
# to our evaluation server using the regular IDs above!
# For trainIds, multiple labels might have the same ID. Then, these labels
# are mapped to the same class in the ground truth images. For the inverse
# mapping, we use the label that is defined first in the list below.
# For example, mapping all void-type classes to the same ID in training,
# might make sense for some approaches.
# Max value is 255!
'color' , # The color of this label
] )
labels = [
# name id trainId color
Label( 'background' , 0 , 0 , ( 0, 0, 0) ),
Label( 'aeroplane' , 1 , 1 , ( 128, 0, 0) ),
Label( 'bicycle' , 2 , 2 , ( 0, 128, 0) ),
Label( 'bird' , 3 , 3 , ( 128, 128, 0) ),
Label( 'boat' , 4 , 4 , ( 0, 0, 128) ),
Label( 'bottle' , 5 , 5 , ( 128, 0, 128) ),
Label( 'bus' , 6 , 6 , ( 0, 128, 128) ),
Label( 'car' , 7 , 7 , ( 128, 128, 128) ),
Label( 'cat' , 8 , 8 , ( 64, 0, 0) ),
Label( 'chair' , 9 , 9 , ( 192, 0, 0) ),
Label( 'cow' , 10 , 10 , ( 64, 128, 0) ),
Label( 'dining table' , 11 , 11 , ( 192, 128, 0) ),
Label( 'dog' , 12 , 12 , ( 64, 0, 128) ),
Label( 'horse' , 13 , 13 , ( 192, 0, 128) ),
Label( 'motorbike' , 14 , 14 , ( 64, 128, 128) ),
Label( 'person' , 15 , 15 , ( 192, 128, 128) ),
Label( 'potted plant' , 16 , 16 , ( 0, 64, 0) ),
Label( 'sheep' , 17 , 17 , ( 128, 64, 0) ),
Label( 'sofa' , 18 , 18 , ( 0, 192, 0) ),
Label( 'train' , 19 , 19 , ( 128, 192, 0) ),
Label( 'tv monitor' , 20 , 20 , ( 0, 64, 128) ),
Label( 'bordering region' , 255, 21 , ( 224, 224, 192) ),
]
####################################################################################
num_classes = 22
unspecified_id = num_classes - 1
train_id = list()
valid_labels = dict()
color_palette = list()
id_key = list()
id_mapping = list()
for label in labels:
train_id.append(label.trainId)
valid_labels[label.name] = label.id
color_palette += list(label.color)
# encoder: r<<16 + g<<8 + b
id_key.append(label.trainId)
encoder = (label.color[0] << 16) + (label.color[1] << 8) + label.color[2]
id_mapping.append(encoder)
assert list(train_id) == sorted(train_id) and len(train_id) == num_classes
assert len(color_palette) == (num_classes * 3)
temp = list(zip(id_mapping, id_key))
temp.sort()
temp = list(zip(*temp))
id_key = np.array(temp[1], dtype='int')
id_mapping = np.array(temp[0], dtype='int')
print('valid class: ', valid_labels)
print('train_id: ', train_id)
print('unspecified_id: ', unspecified_id)
print('color_palette: ', color_palette)
print('id_key: ', id_key)
print('id_mapping: ', id_mapping)
"""
valid class: {'background': 0, 'aeroplane': 1, 'bicycle': 2, 'bird': 3, 'boat': 4, 'bottle': 5, 'bus': 6, 'car': 7, 'cat': 8, 'chair': 9, 'cow': 10, 'dining table': 11, 'dog': 12, 'horse': 13, 'motorbike': 14, 'person': 15, 'potted plant': 16, 'sheep': 17, 'sofa': 18, 'train': 19, 'tv monitor': 20, 'bordering region': 255}
train_id: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
unspecified_id: 21
color_palette: [0, 0, 0, 128, 0, 0, 0, 128, 0, 128, 128, 0, 0, 0, 128, 128, 0, 128, 0, 128, 128, 128, 128, 128, 64, 0, 0, 192, 0, 0, 64, 128, 0, 192, 128, 0, 64, 0, 128, 192, 0, 128, 64, 128, 128, 192, 128, 128, 0, 64, 0, 128, 64, 0, 0, 192, 0, 128, 192, 0, 0, 64, 128, 224, 224, 192]
id_key: [ 0 4 16 20 2 6 18 8 12 10 14 1 5 17 3 7 19 9 13 11 15 21]
id_mapping: [ 0 128 16384 16512 32768 32896 49152 4194304
4194432 4227072 4227200 8388608 8388736 8404992 8421376 8421504
8437760 12582912 12583040 12615680 12615808 14737600]
"""
####################################################################################
# Path of PASCAL VOC 2012 Dataset + Semantic Boundaries Dataset
"""
VOCdevkit
├─VOC2012
| ├─ImageSets
| ├─JPEGImages
| ├─SegmentationClass
| └─SemanticBoundaries
| ├─cls
| ├─img
| └─inst
└─generate_aug_data.py
data_list_file:
img_name0
img_name1
img_name2
...
"""
data_dir = os.path.abspath(os.path.dirname(__file__))
img_dir = os.path.join(data_dir, 'VOC2012/JPEGImages')
mat_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries/cls')
voc_img_sets_dir = os.path.join(data_dir, 'VOC2012/ImageSets/Segmentation')
sbd_img_sets_dir = os.path.join(data_dir, 'VOC2012/SemanticBoundaries')
voc_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClass')
aug_mask_dir = os.path.join(data_dir, 'VOC2012/SegmentationClassAug')
if not os.path.exists(aug_mask_dir):
os.mkdir(aug_mask_dir)
####################################################################################
# convert .mat to .png
print()
i = 0
for mat_file in os.listdir(mat_dir):
match = re.match(r'^(\d+_\d+).mat$', mat_file)
if match:
img = match.groups()[0]
mat = scipy.io.loadmat(os.path.join(mat_dir, mat_file), mat_dtype=True, squeeze_me=True, struct_as_record=False)
assert np.max(mat['GTcls'].Segmentation) < unspecified_id # no bordering region
mask = Image.fromarray(mat['GTcls'].Segmentation)
mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
mask.putpalette(color_palette)
mask.save(os.path.join(aug_mask_dir, img + '.png'))
i += 1
print('\rConverting .mat to .png: %d' % i, end='')
sys.stdout.flush()
# copy voc to aug
print()
i = 0
for mask_file in os.listdir(voc_mask_dir):
match = re.match(r'^(\d+_\d+).png$', mask_file)
if match:
img = match.groups()[0]
# copy voc to aug
shutil.copyfile(os.path.join(voc_mask_dir, mask_file), os.path.join(aug_mask_dir, mask_file))
mask = np.array(Image.open(os.path.join(aug_mask_dir, mask_file)).convert('RGB'), dtype=np.uint32)
# encoder: r<<16 + g<<8 + b
encoder = np.left_shift(mask[:, :, 0], 16) + np.left_shift(mask[:, :, 1], 8) + mask[:, :, 2]
index = np.digitize(encoder.ravel(), id_mapping, right=True)
new_mask = id_key[index].reshape(encoder.shape).astype('uint8')
new_mask = Image.fromarray(new_mask)
new_mask.save(os.path.join(aug_mask_dir, img + '_trainIds.png'))
i += 1
print('\rCopying voc to aug: %d' % i, end=' ')
sys.stdout.flush()
####################################################################################
print()
with open(os.path.join(voc_img_sets_dir, 'train.txt')) as f:
img_sets = f.readlines()
voc_train = set([i.split()[0] for i in img_sets])
assert len(img_sets) == len(voc_train)
with open(os.path.join(voc_img_sets_dir, 'val.txt')) as f:
img_sets = f.readlines()
voc_val = set([i.split()[0] for i in img_sets])
assert len(img_sets) == len(voc_val)
with open(os.path.join(sbd_img_sets_dir, 'train.txt')) as f:
img_sets = f.readlines()
sbd_train = set([i.split()[0] for i in img_sets])
assert len(img_sets) == len(sbd_train)
with open(os.path.join(sbd_img_sets_dir, 'val.txt')) as f:
img_sets = f.readlines()
sbd_val = set([i.split()[0] for i in img_sets])
assert len(img_sets) == len(sbd_val)
aug_train = (sbd_train | sbd_val | voc_train) - voc_val
aug_trainval = aug_train | voc_val
# check
for item in aug_trainval:
img = os.path.join(img_dir, item + '.jpg')
mask = os.path.join(aug_mask_dir, item + '_trainIds.png')
assert os.path.exists(img) and os.path.exists(mask)
# create data list
with open(os.path.join(data_dir, 'train_aug.txt'), 'w') as train:
for line in aug_train:
train.write(str(line) + '\n')
print('Created train data list ({}) in {}.'.format(len(aug_train), data_dir))
with open(os.path.join(data_dir, 'trainval_aug.txt'), 'w') as trainval:
for line in aug_trainval:
trainval.write(str(line) + '\n')
print('Created trainval data list ({}) in {}.'.format(len(aug_trainval), data_dir))
####################################################################################
# compute class weights
print()
class_count = np.zeros(num_classes, dtype='int64')
# Get the total number of pixels in all train masks for each class
for i, img in enumerate(aug_train, 1):
mask = np.array(Image.open(os.path.join(aug_mask_dir, img + '_trainIds.png')))
class_count += np.histogram(mask, bins=np.arange(num_classes + 1))[0]
print('\rComputing class weight: %d' % i, end=' ')
sys.stdout.flush()
# including unspecified_id
class_p_unspecified = class_count / np.sum(class_count.astype(np.int64))
class_weight_unspecified = 1 / np.log(1.02 + class_p_unspecified)
# excluding unspecified_id
class_p = class_count[:-1] / np.sum(class_count[:-1].astype(np.int64))
class_weight = 1 / np.log(1.02 + class_p)
def array2string(array, format='%.6f'):
return ', '.join([format % i for i in array])
print()
with open(os.path.join(data_dir, 'args_aug.txt'), 'w') as f:
# valid_labels
f.writelines('valid class:\n')
f.writelines('{}\n\n'.format(valid_labels))
# unspecified_id
f.writelines('unspecified_id: {}\n\n'.format(unspecified_id))
# train_id
f.writelines('train_id:\n')
f.writelines(array2string(train_id, '%d') + '\n\n')
# class_count
f.writelines('pixel counts for each class:\n')
f.writelines(array2string(class_count, '%d') + '\n\n')
# class_p_unspecified
f.writelines('class probability including unspecified_id:\n')
f.writelines(array2string(class_p_unspecified) + '\n\n')
# class_weight_unspecified
f.writelines('class weight including unspecified_id:\n')
f.writelines(array2string(class_weight_unspecified) + '\n\n')
# class_p
f.writelines('class probability excluding unspecified_id:\n')
f.writelines(array2string(class_p) + '\n\n')
# class_weight
f.writelines('class weight excluding unspecified_id:\n')
f.writelines(array2string(class_weight) + '\n\n')
print('Generated class weight in {}.'.format(os.path.join(data_dir, 'args.txt')))