Generating TFRecord training data for PASCAL VOC 2012
PASCAL VOC 2012 is a widely used semantic segmentation dataset with 20 object classes. My previous post describes the dataset in detail; see "PASCAL VOC 2012 Dataset Analysis".
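The TFRecord format is a simple format for storing a sequence of binary records. Protocol buffers are a cross-platform, cross-language library for efficiently serializing structured data. Protocol messages are defined in .proto files, which are usually the easiest way to understand a message type. The tf.Example message (or protobuf) is a flexible message type that represents a {"string": value} mapping. It is designed for use with TensorFlow and is used by higher-level APIs such as TFX.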
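To make "a sequence of binary records" concrete, here is a minimal pure-Python sketch of how each record is framed on disk. It follows the record layout described in the TensorFlow TFRecord guide: a little-endian uint64 length, a uint32 masked CRC-32C of that length, the payload bytes, and a uint32 masked CRC-32C of the payload. The helper names (`crc32c`, `masked_crc`, `write_records`, `read_records`) are hypothetical and exist only to illustrate the format. In real code, use `tf.io.TFRecordWriter` and `tf.data.TFRecordDataset`.

```python
import io
import struct

# Lookup table for the reflected CRC-32C (Castagnoli) polynomial.
_POLY = 0x82F63B78
_TABLE = []
for _n in range(256):
    _c = _n
    for _ in range(8):
        _c = (_c >> 1) ^ _POLY if _c & 1 else _c >> 1
    _TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C of `data`."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord's masked CRC: rotate right by 15 bits, then add a constant (mod 2**32)."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(stream, records):
    """Writes each bytes payload as one framed record."""
    for data in records:
        header = struct.pack('<Q', len(data))             # uint64 length
        stream.write(header)
        stream.write(struct.pack('<I', masked_crc(header)))
        stream.write(data)
        stream.write(struct.pack('<I', masked_crc(data)))


def read_records(stream):
    """Yields payloads back, verifying both CRCs."""
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack('<Q', header)
        (length_crc,) = struct.unpack('<I', stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError('corrupted length field')
        data = stream.read(length)
        (data_crc,) = struct.unpack('<I', stream.read(4))
        if data_crc != masked_crc(data):
            raise ValueError('corrupted record data')
        yield data


buf = io.BytesIO()
write_records(buf, [b'first record', b'second'])
buf.seek(0)
print(list(read_records(buf)))  # [b'first record', b'second']
```

Each record therefore carries 16 bytes of framing overhead. Because the length and payload are checksummed separately, a reader can detect a corrupted length before it tries to read the payload.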
Data types in a tf.Example message
tf.train.BytesList: accepts string and byte values.
tf.train.FloatList: accepts float (float32) and double (float64) values.
tf.train.Int64List: accepts enum, bool, int32, uint32, int64 and uint64 values.
The type-conversion helpers are:

```python
import tensorflow as tf


# The following functions can be used to convert a value to a type compatible
# with tf.Example.
def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
```
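Creating a tf.Example message
In practice, a dataset can come from anywhere, but the procedure for creating a tf.Example message from a single observation is always the same: (a) within each observation, convert each value to a tf.train.Feature containing one of the three compatible types, using one of the functions above; (b) create a map (dictionary) from each feature-name string to the encoded feature value produced in step (a); (c) convert the map produced in step (b) into a Features message.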
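For example, use NumPy to create a dataset with four features:
(a) a boolean feature, False or True with equal probability;
(b) an integer feature drawn uniformly at random from 0 to 4;
(c) a string feature generated from a string table, using the integer feature as the index;
(d) a float feature drawn from the standard normal distribution.
The following code creates 10,000 such observations: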
```python
import numpy as np

# The number of observations in the dataset.
n_observations = 10000

# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False, True], n_observations)

# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0, 5, n_observations)

# String feature: the integer feature indexes into a string table.
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]

# Float feature, from a standard normal distribution.
feature3 = np.random.randn(n_observations)
```
Each of these features can be coerced into a tf.Example-compatible type with _bytes_feature, _float_feature or _int64_feature. A tf.Example message can then be created from the encoded features:
```python
def serialize_example(feature0, feature1, feature2, feature3):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
        'feature0': _int64_feature(feature0),
        'feature1': _int64_feature(feature1),
        'feature2': _bytes_feature(feature2),
        'feature3': _float_feature(feature3),
    }

    # Create a Features message using tf.train.Example.
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()
```
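For example, suppose we take a single observation from the dataset, [False, 4, b'goat', 0.9876]. serialize_example() creates the tf.Example message for this observation, and we can then print it. As described above, each observation is written as a Features message. Note that the tf.Example message is just a wrapper around the Features message: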
```python
# This is an example observation from the dataset.
serialized_example = serialize_example(False, 4, b'goat', 0.9876)
print(serialized_example)
```
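Output (protobuf does not guarantee the order of map entries, so the order of the features in your output may differ):

```
b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04[\xd3|?\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'
```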
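The bytes above follow the protobuf wire format. Each field starts with a varint tag equal to `(field_number << 3) | wire_type`, and wire type 2 means a length-delimited payload follows. So `\n` (0x0A) is field 1, wire type 2, and `R` (0x52) is a length of 82 bytes. The sketch below walks the printed bytes to make this visible. The helpers `read_varint` and `length_delimited_fields` are hypothetical, handle only length-delimited fields, and are not a general protobuf parser. The sketch also decodes the 4-byte little-endian float32 stored for feature3.

```python
import struct


def read_varint(buf, pos):
    """Decodes a base-128 varint starting at buf[pos]; returns (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited_fields(buf):
    """Yields (field_number, payload) for a message made only of wire-type-2 fields."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError('this sketch only handles length-delimited fields')
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


serialized = (b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04[\xd3|?'
              b'\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'
              b'\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00'
              b'\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04')

# Top level: Example has a single field 1 (features).
[(field, features)] = list(length_delimited_fields(serialized))
print(field, len(features))  # 1 82

# Inside Features: each map entry is field 1, holding key (1) and value (2).
entries = list(length_delimited_fields(features))
names = [dict(length_delimited_fields(entry))[1] for _, entry in entries]
print(names)  # [b'feature3', b'feature2', b'feature0', b'feature1']

# The 4 bytes after b'\n\x04' in feature3 are the float32 value.
print(struct.unpack('<f', b'[\xd3|?')[0])
```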
To decode the message, use the tf.train.Example.FromString method:
```python
example_proto = tf.train.Example.FromString(serialized_example)
print(example_proto)
```
Output (FloatList stores 32-bit floats, which is why 0.9876 comes back as 0.9876000285148621):

```
features {
  feature {
    key: "feature0"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "feature1"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "feature2"
    value {
      bytes_list {
        value: "goat"
      }
    }
  }
  feature {
    key: "feature3"
    value {
      float_list {
        value: 0.9876000285148621
      }
    }
  }
}
```
The complete code for the steps above:
```python
import numpy as np
import tensorflow as tf


# The following functions can be used to convert a value to a type compatible
# with tf.Example.
def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


# The number of observations in the dataset.
n_observations = int(1e4)
# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False, True], n_observations)
# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0, 5, n_observations)
# String feature
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]
# Float feature, from a standard normal distribution
feature3 = np.random.randn(n_observations)


def serialize_example(feature0, feature1, feature2, feature3):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    # Create a dictionary mapping the feature name to the tf.Example-compatible
    # data type.
    feature = {
        'feature0': _int64_feature(feature0),
        'feature1': _int64_feature(feature1),
        'feature2': _bytes_feature(feature2),
        'feature3': _float_feature(feature3),
    }
    # Create a Features message using tf.train.Example.
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()


# This is an example observation from the dataset.
serialized_example = serialize_example(False, 4, b'goat', 0.9876)
print(serialized_example)

example_proto = tf.train.Example.FromString(serialized_example)
print(example_proto)
```
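The listing above serializes only one observation. Below is a minimal sketch of the next step. It writes every observation to a file with `tf.io.TFRecordWriter` and parses the records back with `tf.data.TFRecordDataset` and `tf.io.parse_single_example`. The file name `observations.tfrecord`, the `feature_description` dictionary and the helper names are illustrative assumptions. To keep the block self-contained, it redefines a compact `serialize_example` that first converts the NumPy scalars to plain Python `int`, `bytes` and `float` values.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def serialize_example(f0, f1, f2, f3):
    """Same four features as above, built from plain Python values."""
    feature = {
        'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(f0)])),
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(f1)])),
        'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[bytes(f2)])),
        'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[float(f3)])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()


# Describes how each feature is parsed back into a scalar tensor.
feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}


def write_observations(path, f0, f1, f2, f3):
    """Writes one serialized tf.Example per observation."""
    with tf.io.TFRecordWriter(path) as writer:
        for row in zip(f0, f1, f2, f3):
            writer.write(serialize_example(*row))


def read_observations(path):
    """Returns a dataset of feature dictionaries parsed from the file."""
    dataset = tf.data.TFRecordDataset(path)
    return dataset.map(lambda s: tf.io.parse_single_example(s, feature_description))


n = 5  # small n for the demo; the tutorial uses 10,000
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
f0 = np.random.choice([False, True], n)
f1 = np.random.randint(0, 5, n)
f2 = strings[f1]
f3 = np.random.randn(n)

path = os.path.join(tempfile.mkdtemp(), 'observations.tfrecord')
write_observations(path, f0, f1, f2, f3)
for parsed in read_observations(path).take(2):
    print({key: value.numpy() for key, value in parsed.items()})
```

`FixedLenFeature([], ...)` declares each feature as a scalar. feature3 comes back as float32, so compare it with the original float64 values using a tolerance rather than equality.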
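This time we convert the official PASCAL VOC 2012 segmentation dataset into TFRecords for training a semantic segmentation network (DeepLab-V3+). Official download: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar. Below, we first walk through the official dataset and then show, step by step, how to build the TFRecords.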
PASCAL VOC 2012 file structure

```
+ VOCtrainval_11-May-2012
  + VOCdevkit
    + VOC2012
      + Annotations
      + ImageSets
        + Action
        + Layout
        + Main
        + Segmentation
      + JPEGImages
      + SegmentationClass
      + SegmentationObject
```
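Before converting anything, it can help to confirm that the extracted archive matches this layout. Below is a small sketch. It assumes the archive was extracted so that `VOCdevkit/VOC2012` sits under the root directory you pass in (for example, the `VOCtrainval_11-May-2012` folder). `check_voc_layout` and `EXPECTED_DIRS` are hypothetical names and are not part of the VOC tools.

```python
import os
import tempfile

# Sub-directories of VOCdevkit/VOC2012 shown in the tree above.
EXPECTED_DIRS = [
    'Annotations',
    'ImageSets/Action',
    'ImageSets/Layout',
    'ImageSets/Main',
    'ImageSets/Segmentation',
    'JPEGImages',
    'SegmentationClass',
    'SegmentationObject',
]


def check_voc_layout(root):
    """Returns the expected VOC2012 sub-directories that are missing under root."""
    voc = os.path.join(root, 'VOCdevkit', 'VOC2012')
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(voc, *d.split('/')))]


# Usage with a throw-away directory that contains only two of the folders.
root = tempfile.mkdtemp()
for d in ('JPEGImages', 'SegmentationClass'):
    os.makedirs(os.path.join(root, 'VOCdevkit', 'VOC2012', d))
print(check_voc_layout(root))
```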
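Annotations: 17,125 XML files, one for each image in JPEGImages. Each file records basic information about its image, such as the file name, source and object annotations. One of these files is shown below: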
```xml
<annotation>
    <folder>VOC2012</folder>                   <!-- folder containing the image -->
    <filename>2007_000027.jpg</filename>       <!-- image file name -->
    <source>                                   <!-- where the image comes from -->
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
    </source>
    <size>                                     <!-- image size -->
        <width>486</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>                   <!-- has segmentation labels? 0 = no, 1 = yes -->
    <object>                                   <!-- one <object> per annotated object -->
        <name>person</name>                    <!-- object class: person -->
        <pose>Unspecified</pose>               <!-- viewpoint -->
        <truncated>0</truncated>               <!-- truncated? 0 = fully visible -->
        <difficult>0</difficult>               <!-- hard to recognize? 0 = easy -->
        <bndbox>                               <!-- bounding box: top-left and bottom-right x/y -->
            <xmin>174</xmin>
            <ymin>101</ymin>
            <xmax>349</xmax>
            <ymax>351</ymax>
        </bndbox>
        <part>
            <name>head</name>                  <!-- head -->
            <bndbox>
                <xmin>169</xmin>
                <ymin>104</ymin>
                <xmax>209</xmax>
                <ymax>146</ymax>
            </bndbox>
        </part>
        <part>
            <name>hand</name>                  <!-- hand -->
            <bndbox>
                <xmin>278</xmin>
                <ymin>210</ymin>
                <xmax>297</xmax>
                <ymax>233</ymax>
            </bndbox>
        </part>
        <part>
            <name>foot</name>                  <!-- foot -->
            <bndbox>
                <xmin>273</xmin>
                <ymin>333</ymin>
                <xmax>297</xmax>
                <ymax>354</ymax>
            </bndbox>
        </part>
        <part>
            <name>foot</name>                  <!-- foot -->
            <bndbox>
                <xmin>319</xmin>
                <ymin>307</ymin>
                <xmax>340</xmax>
                <ymax>326</ymax>
            </bndbox>
        </part>
    </object>
</annotation>
```
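These annotation files are plain XML, so the standard library is enough to read them. The sketch below extracts the image size, the segmentation flag and, for each object, its class, difficulty, bounding box and part names. `parse_voc_annotation` is a hypothetical helper. Coordinates are parsed through `float` so that decimal values would not break it. For a real file, pass `open(path).read()`, or use `ET.parse(path).getroot()` directly.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Returns image info plus a list of objects with class name and box."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    info = {
        'filename': root.findtext('filename'),
        'width': int(size.findtext('width')),
        'height': int(size.findtext('height')),
        'segmented': root.findtext('segmented') == '1',
        'objects': [],
    }
    for obj in root.findall('object'):
        box = obj.find('bndbox')  # direct child only, so part boxes are skipped
        info['objects'].append({
            'name': obj.findtext('name'),
            'difficult': obj.findtext('difficult') == '1',
            'bbox': tuple(int(float(box.findtext(k)))
                          for k in ('xmin', 'ymin', 'xmax', 'ymax')),
            'parts': [p.findtext('name') for p in obj.findall('part')],
        })
    return info


# A shortened copy of the annotation shown above.
sample = """<annotation>
  <filename>2007_000027.jpg</filename>
  <size><width>486</width><height>500</height><depth>3</depth></size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
    <part>
      <name>head</name>
      <bndbox><xmin>169</xmin><ymin>104</ymin><xmax>209</xmax><ymax>146</ymax></bndbox>
    </part>
  </object>
</annotation>"""

info = parse_voc_annotation(sample)
print(info['filename'], info['width'], info['height'])  # 2007_000027.jpg 486 500
print(info['objects'][0]['bbox'])                        # (174, 101, 349, 351)
```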
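ImageSets: plain-text (.txt) files that list the image IDs used by each annotation task, together with the corresponding labels.

| ImageSets subfolder | Contents |
|---|---|
| Action | Person action annotations for 10 action classes: jumping, phoning, playinginstrument, reading, ridingbike, ridinghorse, running, takingphoto, usingcomputer, walking. |
| Layout | Person layout data (body parts such as head, hands and feet). |
| Main | Image-level labels for object recognition over the 20 classes, with one set of files per class: class_train.txt (training image IDs, 5,717 per class), class_val.txt (validation image IDs, 5,823 per class) and class_trainval.txt (the union of the two, 5,717 + 5,823 = 11,540 per class). |
| Segmentation | Image lists for semantic segmentation in three files: train.txt (1,464 images), val.txt (1,449 images) and trainval.txt (2,913 images). |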
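The two kinds of list files in the table can be read as follows. The files in ImageSets/Segmentation contain one image ID per line. Each class file in ImageSets/Main pairs an image ID with a label, where, under the VOC devkit convention, 1 means the class is present, -1 means it is absent and 0 means only "difficult" instances of it are present. Both helper names are hypothetical, and the sample IDs in the usage part are made up for illustration.

```python
import os
import tempfile


def read_segmentation_split(path):
    """Reads ImageSets/Segmentation/<split>.txt: one image ID per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def read_class_split(path):
    """Reads ImageSets/Main/<class>_<split>.txt, where each line is 'image_id label'."""
    labels = {}
    with open(path) as f:
        for line in f:
            parts = line.split()  # also copes with the double space before ' 1'
            if parts:
                labels[parts[0]] = int(parts[1])
    return labels


# Usage with made-up sample files (IDs are illustrative only).
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'train.txt'), 'w') as f:
    f.write('2007_000032\n2007_000039\n')
with open(os.path.join(tmp, 'person_train.txt'), 'w') as f:
    f.write('2008_000008  1\n2008_000015 -1\n')

print(read_segmentation_split(os.path.join(tmp, 'train.txt')))
# ['2007_000032', '2007_000039']
print(read_class_split(os.path.join(tmp, 'person_train.txt')))
# {'2008_000008': 1, '2008_000015': -1}
```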
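JPEGImages: all original images in JPG format, 17,125 files in total.
SegmentationClass: semantic segmentation labels (one class index per pixel), 2,913 files (train + val).
SegmentationObject: instance segmentation labels (each object instance gets its own index), 2,913 files (train + val).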
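Removing the label color map
The official label images are single-channel images with a color map (palette), and the color map must be removed before building the TFRecords.
A brief explanation of color-mapped (indexed) images: such an image is stored as two matrices, an M×N index matrix and a C×3 colormap matrix. Each index value selects one row, that is one color, of the colormap. You can run the following MATLAB code to see the effect: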
```matlab
[img, cmap] = imread('2007_000033.png');
if isempty(cmap)
    % Process data as a grayscale or RGB image
    disp('Process data as a grayscale or RGB image');
else
    % Process data as an indexed image
    disp('Process data as an indexed image');
end
imshow(img, cmap);
imwrite(img, cmap, 'outfile.png');
```
Taking the PASCAL VOC 2012 label image 2007_000033.png as an example, running the MATLAB code above yields the image's colormap matrix and index matrix.
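The same index-matrix/colormap relationship can be sketched in Python with NumPy. The colormap below is generated with the bit-interleaving scheme commonly used for the PASCAL VOC palette, which as far as we know matches the devkit's `VOClabelcolormap.m`. Treat that correspondence as an assumption and compare it against the palette stored in an actual label PNG. In the sketch, index 0 is background, 1 is aeroplane, 15 is person and 255 marks object boundaries. Colorizing a label is then a single lookup, `cmap[index]`. `voc_colormap` is a hypothetical helper name.

```python
import numpy as np


def voc_colormap(n=256):
    """Returns an (n, 3) uint8 colormap built by interleaving the bits of each index."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Bits 0, 1, 2 of c go to R, G, B, filling from the most significant bit down.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap


cmap = voc_colormap()
# A tiny 2x3 index matrix: 0 = background, 1 = aeroplane, 15 = person, 255 = boundary.
index = np.array([[0, 1, 1], [15, 255, 0]], dtype=np.uint8)
colored = cmap[index]  # each index selects one row of the colormap
print(colored.shape)   # (2, 3, 3)
print(cmap[1], cmap[15], cmap[255])
```

Removing the color map means going the other way: keep `index` and throw `cmap` away, which is exactly what the training labels need.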
Building the TFRecords
In the open-source DeepLab code, the color map is first removed from the labels with the following TensorFlow script:

```python
"""Removes the color map from segmentation annotations.

Removes the color map from the ground truth segmentation annotations and saves
the results to output_dir.
"""
import glob
import os.path

import numpy as np
from PIL import Image
import tensorflow as tf

FLAGS = tf.compat.v1.flags.FLAGS

tf.compat.v1.flags.DEFINE_string('original_gt_folder',
                                 './VOCdevkit/VOC2012/SegmentationClass',
                                 'Original ground truth annotations.')
tf.compat.v1.flags.DEFINE_string('segmentation_format', 'png', 'Segmentation format.')
tf.compat.v1.flags.DEFINE_string('output_dir',
                                 './VOCdevkit/VOC2012/SegmentationClassNoColor',
                                 'Folder to save modified ground truth annotations.')


def _remove_colormap(filename):
    """Removes the color map from the annotation.

    Args:
      filename: Ground truth annotation filename.

    Returns:
      Annotation without color map.
    """
    # For a palette ("P" mode) PNG, np.array returns the per-pixel indices,
    # not the RGB colors, so the color map is dropped.
    return np.array(Image.open(filename))


def _save_annotation(annotation, filename):
    """Saves the annotation as png file.

    Args:
      annotation: Segmentation annotation.
      filename: Output filename.
    """
    pil_image = Image.fromarray(annotation.astype(dtype=np.uint8))
    # PNG data is binary, so open the output file in binary mode.
    with tf.io.gfile.GFile(filename, mode='wb') as f:
        pil_image.save(f, 'PNG')


def main(unused_argv):
    # Create the output directory if it does not exist.
    if not tf.io.gfile.isdir(FLAGS.output_dir):
        tf.io.gfile.makedirs(FLAGS.output_dir)

    annotations = glob.glob(
        os.path.join(FLAGS.original_gt_folder, '*.' + FLAGS.segmentation_format))
    for annotation in annotations:
        raw_annotation = _remove_colormap(annotation)
        filename = os.path.splitext(os.path.basename(annotation))[0]
        _save_annotation(raw_annotation,
                         os.path.join(FLAGS.output_dir,
                                      filename + '.' + FLAGS.segmentation_format))


if __name__ == '__main__':
    tf.compat.v1.app.run()
```
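Run the script above with the output path for the color-free labels set to SegmentationClassNoColor. In the comparison figure, the left image is the original label and the right image is the result with the color map removed; the red boxes mark the path of the original labels and the path where the results are saved.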
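Why does `np.array(Image.open(filename))` in `_remove_colormap` remove the color map? For a palette-mode ("P") PNG, Pillow keeps the index matrix as the pixel data and stores the colors separately, so converting the image to an array yields the indices. The round trip below illustrates this with a made-up 2×2 label held in memory. `putpalette` turns the "L" image into a "P" image, and converting to "RGB" would apply the palette instead.

```python
import io

import numpy as np
from PIL import Image

# Made-up 2x2 label with class indices 0 (background) and 15 (person).
index = np.array([[0, 15], [15, 0]], dtype=np.uint8)

palette = [0, 0, 0] * 256                    # 256 RGB entries, all black ...
palette[15 * 3:15 * 3 + 3] = [192, 128, 128]  # ... except entry 15

label = Image.fromarray(index)  # mode "L"
label.putpalette(palette)       # attaching a palette switches the mode to "P"

buf = io.BytesIO()
label.save(buf, 'PNG')          # a color-mapped PNG, like the VOC labels
buf.seek(0)
reloaded = Image.open(buf)

print(reloaded.mode)                            # P
print(np.array(reloaded))                       # the index matrix
print(np.array(reloaded.convert('RGB'))[0, 1])  # the palette color of index 15
```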
The conversion follows the official TensorFlow DeepLab code (build_data.py and build_voc2012_data.py), and the procedure matches the one in Part 2 (What are TFRecords?). In short: first, define the fields of the Example proto; next, convert each field to a tf.Example-compatible type; finally, serialize each Example and write it to a TFRecord file. Here is build_voc2012_data.py:

```python
"""Converts PASCAL VOC 2012 data to TFRecord file format with Example protos.

PASCAL VOC 2012 dataset is expected to have the following directory structure:

  + pascal_voc_seg
    - build_data.py
    - build_voc2012_data.py (current working directory).
    + VOCdevkit
      + VOC2012
        + JPEGImages
        + SegmentationClass
        + ImageSets
          + Segmentation
    + tfrecord

Image folder:
  ./VOCdevkit/VOC2012/JPEGImages

Semantic segmentation annotations:
  ./VOCdevkit/VOC2012/SegmentationClass

list folder:
  ./VOCdevkit/VOC2012/ImageSets/Segmentation

This script converts data into sharded data files and saves them in the tfrecord folder.

The Example proto contains the following fields:

  image/encoded: encoded image content.
  image/filename: image filename.
  image/format: image file format.
  image/height: image height.
  image/width: image width.
  image/channels: image channels.
  image/segmentation/class/encoded: encoded semantic segmentation content.
  image/segmentation/class/format: semantic segmentation file format.
"""
import math
import os.path
import sys

import build_data
import tensorflow as tf

FLAGS = tf.compat.v1.flags.FLAGS

tf.compat.v1.flags.DEFINE_string('image_folder',
                                 './pascal_voc_seg/VOCdevkit/VOC2012/JPEGImages',
                                 'Folder containing images.')
# Must point at single-channel index labels without a color map, e.g. the
# SegmentationClassNoColor folder produced above. The default is the author's
# path to augmented (SegmentationClassAug) labels.
tf.compat.v1.flags.DEFINE_string(
    'semantic_segmentation_folder',
    './pascal_voc_seg/VOCdevkit/VOC2012/SegmentationClassAug/SegmentationClassAug',
    'Folder containing semantic segmentation annotations.')
tf.compat.v1.flags.DEFINE_string('list_folder',
                                 './pascal_voc_seg/VOCdevkit/VOC2012/ImageSets/Segmentation',
                                 'Folder containing lists for training and validation')
tf.compat.v1.flags.DEFINE_string('output_dir',
                                 './pascal_voc_seg/tfrecord',
                                 'Path to save converted SSTable of TensorFlow examples.')

_NUM_SHARDS = 4


def _convert_dataset(dataset_split):
    """Converts the specified dataset split to TFRecord format.

    Args:
      dataset_split: The dataset split (e.g., train, test).

    Raises:
      RuntimeError: If loaded image and label have different shape.
    """
    dataset = os.path.splitext(os.path.basename(dataset_split))[0]
    with open(dataset_split, 'r') as f:
        filenames = [x.strip('\n') for x in f]
    num_images = len(filenames)
    print('Processing %s (%d images)' % (dataset, num_images))
    num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))

    image_reader = build_data.ImageReader('jpeg', channels=3)
    label_reader = build_data.ImageReader('png', channels=1)

    for shard_id in range(_NUM_SHARDS):
        output_filename = os.path.join(
            FLAGS.output_dir,
            '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
        with tf.io.TFRecordWriter(output_filename) as tfrecord_writer:
            start_idx = shard_id * num_per_shard
            end_idx = min((shard_id + 1) * num_per_shard, num_images)
            for i in range(start_idx, end_idx):
                sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
                    i + 1, num_images, shard_id))
                sys.stdout.flush()
                # Read the image.
                image_filename = os.path.join(
                    FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
                image_data = tf.io.gfile.GFile(image_filename, 'rb').read()
                height, width = image_reader.read_image_dims(image_data)
                # Read the semantic segmentation annotation.
                seg_filename = os.path.join(
                    FLAGS.semantic_segmentation_folder,
                    filenames[i] + '.' + FLAGS.label_format)
                seg_data = tf.io.gfile.GFile(seg_filename, 'rb').read()
                seg_height, seg_width = label_reader.read_image_dims(seg_data)
                if height != seg_height or width != seg_width:
                    raise RuntimeError('Shape mismatched between image and label.')
                # Convert to tf example.
                example = build_data.image_seg_to_tfexample(
                    image_data, filenames[i], height, width, seg_data)
                tfrecord_writer.write(example.SerializeToString())
        sys.stdout.write('\n')
        sys.stdout.flush()


def main(unused_argv):
    # TFRecordWriter does not create missing directories.
    tf.io.gfile.makedirs(FLAGS.output_dir)
    dataset_splits = tf.io.gfile.glob(os.path.join(FLAGS.list_folder, '*.txt'))
    print(dataset_splits)
    for dataset_split in dataset_splits:
        _convert_dataset(dataset_split)


if __name__ == '__main__':
    tf.compat.v1.app.run()
```
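`_convert_dataset` splits each list into `_NUM_SHARDS = 4` files of `ceil(num_images / 4)` images each, and the last shard takes the remainder. The hypothetical helpers below reproduce only that arithmetic and the output file-name pattern, so you can predict what the script will write for the split sizes listed in the ImageSets table (train 1,464, val 1,449, trainval 2,913).

```python
import math


def shard_ranges(num_images, num_shards=4):
    """Returns the [start, end) image-index range of each shard, as in _convert_dataset."""
    num_per_shard = int(math.ceil(num_images / num_shards))
    return [(shard_id * num_per_shard,
             min((shard_id + 1) * num_per_shard, num_images))
            for shard_id in range(num_shards)]


def shard_filenames(dataset, num_shards=4):
    """Output names built with the same '%s-%05d-of-%05d.tfrecord' pattern."""
    return ['%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, num_shards)
            for shard_id in range(num_shards)]


for split, count in (('train', 1464), ('val', 1449), ('trainval', 2913)):
    print(split, [end - start for start, end in shard_ranges(count)])
# train [366, 366, 366, 366]
# val [363, 363, 363, 360]
# trainval [729, 729, 729, 726]

print(shard_filenames('train')[0])  # train-00000-of-00004.tfrecord
```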
Here is build_data.py:

```python
"""Contains common utility functions and classes for building dataset.

This script contains utility functions and classes to convert a dataset to
TFRecord file format with Example protos.

The Example proto contains the following fields:

  image/encoded: encoded image content.
  image/filename: image filename.
  image/format: image file format.
  image/height: image height.
  image/width: image width.
  image/channels: image channels.
  image/segmentation/class/encoded: encoded semantic segmentation content.
  image/segmentation/class/format: semantic segmentation file format.
"""
import collections.abc

import tensorflow as tf

FLAGS = tf.compat.v1.flags.FLAGS

tf.compat.v1.flags.DEFINE_enum('image_format', 'jpg', ['jpg', 'jpeg', 'png'],
                               'Image format.')
tf.compat.v1.flags.DEFINE_enum('label_format', 'png', ['png'],
                               'Segmentation label format.')

# A map from image format to expected data format.
_IMAGE_FORMAT_MAP = {
    'jpg': 'jpeg',
    'jpeg': 'jpeg',
    'png': 'png',
}


class ImageReader(object):
    """Helper class that provides TensorFlow image coding utilities."""

    def __init__(self, image_format='jpeg', channels=3):
        """Class constructor.

        Args:
          image_format: Image format. Only 'jpeg', 'jpg', or 'png' are supported.
          channels: Image channels.
        """
        # TF1-style graph and session; the tf.compat.v1 APIs keep this working in TF 2.x.
        with tf.Graph().as_default():
            self._decode_data = tf.compat.v1.placeholder(dtype=tf.string)
            self._image_format = image_format
            self._session = tf.compat.v1.Session()
            if self._image_format in ('jpeg', 'jpg'):
                self._decode = tf.image.decode_jpeg(self._decode_data,
                                                    channels=channels)
            elif self._image_format == 'png':
                self._decode = tf.image.decode_png(self._decode_data,
                                                   channels=channels)

    def read_image_dims(self, image_data):
        """Reads the image dimensions.

        Args:
          image_data: string of image data.

        Returns:
          image_height and image_width.
        """
        image = self.decode_image(image_data)
        return image.shape[:2]

    def decode_image(self, image_data):
        """Decodes the image data string.

        Args:
          image_data: string of image data.

        Returns:
          Decoded image data.

        Raises:
          ValueError: Value of image channels not supported.
        """
        image = self._session.run(self._decode,
                                  feed_dict={self._decode_data: image_data})
        if len(image.shape) != 3 or image.shape[2] not in (1, 3):
            raise ValueError('The image channels not supported.')
        return image


def _int64_list_feature(values):
    """Returns a TF-Feature of int64_list.

    Args:
      values: A scalar or list of values.

    Returns:
      A TF-Feature.
    """
    # collections.Iterable was removed in Python 3.10; use collections.abc.
    if not isinstance(values, collections.abc.Iterable):
        values = [values]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_list_feature(values):
    """Returns a TF-Feature of bytes.

    Args:
      values: A string.

    Returns:
      A TF-Feature.
    """
    def norm2bytes(value):
        # BytesList needs bytes, so encode Python str values.
        return value.encode() if isinstance(value, str) else value

    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))


def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
    """Converts one image/segmentation pair to tf example.

    Args:
      image_data: string of image data.
      filename: image filename.
      height: image height.
      width: image width.
      seg_data: string of semantic segmentation data.

    Returns:
      tf example of one image/segmentation pair.
    """
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_list_feature(image_data),
        'image/filename': _bytes_list_feature(filename),
        'image/format': _bytes_list_feature(_IMAGE_FORMAT_MAP[FLAGS.image_format]),
        'image/height': _int64_list_feature(height),
        'image/width': _int64_list_feature(width),
        'image/channels': _int64_list_feature(3),
        'image/segmentation/class/encoded': _bytes_list_feature(seg_data),
        'image/segmentation/class/format': _bytes_list_feature(FLAGS.label_format),
    }))
```
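Run the code above, paying attention to the path settings. It produces one set of sharded TFRecord files per list file, for example train-00000-of-00004.tfrecord through train-00003-of-00004.tfrecord.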
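To check the output, the records can be parsed back using the same keys that `image_seg_to_tfexample` writes. Below is a minimal reader sketch. It assumes TF 2.x eager execution and the file names produced by build_voc2012_data.py; for real data, you would pass a pattern such as `./pascal_voc_seg/tfrecord/train-*.tfrecord`. It decodes the JPEG image to H×W×3 and the color-free PNG label to H×W×1. So that it runs without the dataset, the demo writes one synthetic 4×5 record first. `parse_voc_record` and `load_voc_tfrecords` are hypothetical names.

```python
import os
import tempfile

import tensorflow as tf

# Keys match those written by build_data.image_seg_to_tfexample.
_FEATURES = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/format': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    'image/channels': tf.io.FixedLenFeature([], tf.int64),
    'image/segmentation/class/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/segmentation/class/format': tf.io.FixedLenFeature([], tf.string),
}


def parse_voc_record(serialized):
    """Parses one record into (image uint8 HxWx3, label uint8 HxWx1)."""
    parsed = tf.io.parse_single_example(serialized, _FEATURES)
    image = tf.io.decode_jpeg(parsed['image/encoded'], channels=3)
    label = tf.io.decode_png(parsed['image/segmentation/class/encoded'], channels=1)
    return image, label


def load_voc_tfrecords(pattern):
    """Builds a dataset of (image, label) pairs from all files matching pattern."""
    return tf.data.TFRecordDataset(tf.io.gfile.glob(pattern)).map(parse_voc_record)


# --- Demo: write one synthetic record with the same keys, then read it back. ---
def _bytes(v):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))


def _int64(v):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))


jpeg = tf.io.encode_jpeg(tf.zeros([4, 5, 3], tf.uint8)).numpy()
png = tf.io.encode_png(tf.fill([4, 5, 1], tf.constant(15, tf.uint8))).numpy()
example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes(jpeg),
    'image/filename': _bytes(b'synthetic'),
    'image/format': _bytes(b'jpeg'),
    'image/height': _int64(4),
    'image/width': _int64(5),
    'image/channels': _int64(3),
    'image/segmentation/class/encoded': _bytes(png),
    'image/segmentation/class/format': _bytes(b'png'),
}))
out_dir = tempfile.mkdtemp()
with tf.io.TFRecordWriter(os.path.join(out_dir, 'train-00000-of-00004.tfrecord')) as writer:
    writer.write(example.SerializeToString())

for image, label in load_voc_tfrecords(os.path.join(out_dir, 'train-*.tfrecord')):
    print(tuple(image.shape), tuple(label.shape))  # (4, 5, 3) (4, 5, 1)
```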