Tasks:
1. Package the data in VOC format
2. Convert the packaged VOC dataset into TFRecord files
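Note: the models repository I released in part (1) is too old, and some of its code still targets Python 2, while I work in Python 3. I ran into Python 2/3 incompatibilities during training, so I switched to a newer version of models for training. The steps from part (1) work the same way in the newer models. Here is the Baidu Netdisk link to my already-configured models: link: https://pan.baidu.com/s/1PCk3rkrB8c-YacpbefKCUQ extraction code: tssj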
1. Download the WIDER FACE dataset. My Baidu Netdisk link: https://pan.baidu.com/s/1AqRdjUpOA0-QLLCwNTe82Q extraction code: 9w7s
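2. There are four archives: the training set, the test set, the validation set, and the face annotations. After extraction you get WIDER_train, WIDER_val, WIDER_test and wider_face_split.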
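3. In the dataset directory, next to the extracted folders, I created five folders:
JPEGImages: stores your image files
Annotations: stores the annotations for every image; each image's annotation must be an XML file
ImageSets/Main: holds train.txt and val.txt, the lists of image IDs for each split
TF-record: holds the TFRecord files converted from the VOC data
fit-model: holds the trained models and the TensorBoard logs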
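If you would rather not create these folders by hand, a minimal Python sketch like the one below does it. The folder names come from the list above; `make_voc_dirs` is a hypothetical helper, not part of the author's scripts.

```python
import os

# The five working folders listed above, relative to the dataset root.
SUBDIRS = ["JPEGImages", "Annotations", os.path.join("ImageSets", "Main"),
           "TF-record", "fit-model"]


def make_voc_dirs(root):
    """Create the five folders under root and return their paths.

    exist_ok=True means folders that already exist are left untouched,
    so running it twice is harmless.
    """
    created = []
    for sub in SUBDIRS:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

For example, `make_voc_dirs("/home/hyb/models/dataset/widerface")` would create them under the same root the conversion script uses as `rootdir`.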
4. Convert the data to VOC format
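The only things you need to change are the paths below:
rootdir is the parent directory of the five folders you created
gtfile is the annotation txt file; the one below is for the validation set
im_folder is the path to the validation images
fwrite is the list file that the image IDs are written to; as mentioned above, you must create ImageSets/Main first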
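Note that in my #1 code, wider_face_val_bbx_gt.txt, WIDER_val/images and ImageSets/Main/val.txt all package the validation set. In practice you also need the training set (see #2). Adjust the paths to your own layout; for the training set, change val to train.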
To run it, I work from the research directory and run python3 object_detection/to_voc.py twice, once with the train paths and once with the val paths.
```python
#1: validation set
rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_val_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_val/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/val.txt", "w")
```

```python
#2: training set
rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_train_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_train/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/train.txt", "w")
```
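Instead of editing three paths by hand for each run, you could derive them from the split name. The sketch below assumes the standard layout after extraction (wider_face_split/ and WIDER_<split>/images under rootdir); `split_paths` is a hypothetical helper, not part of to_voc.py.

```python
import os


def split_paths(rootdir, split):
    """Return (gtfile, im_folder, list_file) for split "train" or "val"."""
    if split not in ("train", "val"):
        raise ValueError("split must be 'train' or 'val'")
    # Annotation file shipped in wider_face_split.zip
    gtfile = os.path.join(rootdir, "wider_face_split",
                          "wider_face_{}_bbx_gt.txt".format(split))
    # Images extracted from WIDER_train.zip / WIDER_val.zip
    im_folder = os.path.join(rootdir, "WIDER_{}".format(split), "images")
    # List of image IDs written by the conversion script
    list_file = os.path.join(rootdir, "ImageSets", "Main", "{}.txt".format(split))
    return gtfile, im_folder, list_file
```

In to_voc.py this would replace the hard-coded block with something like `gtfile, im_folder, list_path = split_paths(rootdir, "train")` followed by `fwrite = open(list_path, "w")`.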
```python
# to_voc.py: converts the WIDER FACE annotations to PASCAL VOC format
import cv2
from xml.dom.minidom import Document


def writexml(filename, saveimg, bboxes, xmlpath):
    """Write one VOC-style XML annotation for an image and its face boxes."""
    doc = Document()
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    folder = doc.createElement('folder')
    folder_name = doc.createTextNode('widerface')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)
    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)
    source = doc.createElement('source')
    annotation.appendChild(source)
    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('wider face Database'))
    source.appendChild(database)
    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)
    image = doc.createElement('image')
    image.appendChild(doc.createTextNode('flickr'))
    source.appendChild(image)
    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)
    owner = doc.createElement('owner')
    annotation.appendChild(owner)
    flickrid_o = doc.createElement('flickrid')
    flickrid_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(flickrid_o)
    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(name_o)
    # Image size; OpenCV's shape is (height, width, channels)
    size = doc.createElement('size')
    annotation.appendChild(size)
    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
    size.appendChild(width)
    size.appendChild(height)
    size.appendChild(depth)
    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)
    for bbox in bboxes:
        # bbox is (x, y, w, h); VOC stores the corner coordinates
        objects = doc.createElement('object')
        annotation.appendChild(objects)
        object_name = doc.createElement('name')
        object_name.appendChild(doc.createTextNode('face'))
        objects.appendChild(object_name)
        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)
        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('0'))
        objects.appendChild(truncated)
        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)
        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)
        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[0] + bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[1] + bbox[3])))
        bndbox.appendChild(ymax)
    with open(xmlpath, "w") as f:
        f.write(doc.toprettyxml(indent=''))


# Paths: edit these for your setup (this is the validation-set configuration #1)
rootdir = "/home/hyb/models/dataset/widerface"
gtfile = "/home/hyb/models/dataset/widerface/wider_face_split/wider_face_val_bbx_gt.txt"
im_folder = "/home/hyb/models/dataset/widerface/WIDER_val/images"
fwrite = open("/home/hyb/models/dataset/widerface/ImageSets/Main/val.txt", "w")

# Format of wider_face_train_bbx_gt.txt:
# line 1: image path
# line 2: number of faces n
# next n lines: one face per line (x y w h, followed by attribute flags)
# Example:
# 0--Parade/0_Parade_marchingband_1_117.jpg
# 9
# 69 359 50 36 1 0 0 0 0 1
# 227 382 56 43 1 0 1 0 0 1
# 296 305 44 26 1 0 0 0 0 1
# 353 280 40 36 2 0 0 0 2 1
# 885 377 63 41 1 0 0 0 0 1
# 819 391 34 43 2 0 0 0 1 0
# 727 342 37 31 2 0 0 0 0 1
# 598 246 33 29 2 0 0 0 0 1
# 740 308 45 33 1 0 0 0 2 1
with open(gtfile, "r") as gt:
    while True:
        gt_con = gt.readline().strip()  # relative image path
        if gt_con == "":
            break  # end of file
        im_path = im_folder + "/" + gt_con
        print(im_path)
        # Read the face count and all box lines BEFORE loading the image, so the
        # reader stays aligned with the file even when an image is skipped.
        numbox = int(gt.readline())
        bboxes = []
        if numbox == 0:  # a record with 0 faces still has one placeholder line
            gt.readline()
        else:
            for i in range(numbox):
                infos = gt.readline().split()  # whitespace-separated: x y w h ...
                bbox = (int(infos[0]), int(infos[1]), int(infos[2]), int(infos[3]))
                bboxes.append(bbox)  # collect every face of this image
        im_data = cv2.imread(im_path)
        if im_data is None:
            continue
        # Optional visualization of the face boxes
        # for x, y, w, h in bboxes:
        #     cv2.rectangle(im_data, (x, y), (x + w, y + h), color=(0, 0, 255), thickness=1)
        # cv2.imshow(im_path, im_data)
        # cv2.waitKey(0)
        # Use the relative path as the image name, with slashes turned into underscores
        filename = gt_con.replace("/", "_")
        fwrite.write(filename.split(".")[0] + "\n")
        cv2.imwrite("{}/JPEGImages/{}".format(rootdir, filename), im_data)
        xmlpath = "{}/Annotations/{}.xml".format(rootdir, filename.split(".")[0])
        writexml(filename, im_data, bboxes, xmlpath)
fwrite.close()
```
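The parsing loop in to_voc.py can also be written as a standalone generator, which makes the record structure explicit and keeps the reader in step whether or not an image loads. This is a sketch, not part of the original script: `parse_wider_gt` is a hypothetical name, and the `drop_degenerate` option (skipping boxes whose width or height is 0 or less, which may occur in WIDER FACE annotations) is an extra that to_voc.py does not have.

```python
def parse_wider_gt(lines, drop_degenerate=True):
    """Yield (relative_image_path, boxes) from a wider_face_*_bbx_gt.txt file.

    Each box is (x, y, w, h), the same tuple writexml expects.
    """
    it = (line.strip() for line in lines)
    for path in it:
        if not path:
            continue  # tolerate blank lines, e.g. at the end of the file
        numbox = int(next(it))
        boxes = []
        # A record with 0 faces is still followed by exactly one placeholder line.
        for _ in range(max(numbox, 1)):
            fields = next(it).split()
            if numbox == 0:
                continue
            x, y, w, h = (int(v) for v in fields[:4])
            if drop_degenerate and (w <= 0 or h <= 0):
                continue  # zero-size box: optionally skipped
            boxes.append((x, y, w, h))
        yield path, boxes
```

Usage: `with open(gtfile) as gt:` then `for rel_path, boxes in parse_wider_gt(gt): ...`, loading the image and calling writexml inside the loop.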
5. Convert the packaged data to TFRecord files
In my models, the file /home/hyb/muke/models/research/object_detection/dataset_tools/create_face_tf_record.py packs the data into TFRecord format:
```python
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Convert a PASCAL VOC-style face dataset to TFRecord for object_detection.

Example usage:
    python object_detection/dataset_tools/create_face_tf_record.py \
        --data_dir=/home/user/dataset \
        --year=widerface \
        --set=train \
        --output_path=/home/user/train.record
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io
import logging
import os

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')
flags.DEFINE_string('set', 'train', 'Convert training set, validation set or '
                    'merged set.')
flags.DEFINE_string('annotations_dir', 'Annotations',
                    '(Relative) path to annotations directory.')
# Default changed from 'VOC2007', which is not in YEARS and would always fail.
flags.DEFINE_string('year', 'widerface', 'Dataset subdirectory under data_dir.')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
flags.DEFINE_string('label_map_path', 'object_detection/data/face_label_map.pbtxtop',
                    'Path to label map proto')
flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
                     'difficult instances')
FLAGS = flags.FLAGS

SETS = ['train', 'val', 'trainval', 'test']
YEARS = ['fddb', 'widerface']  # upstream: ['VOC2007', 'VOC2012', 'merged']


def dict_to_tf_example(data,
                       dataset_directory,
                       label_map_dict,
                       ignore_difficult_instances=False,
                       image_subdirectory='JPEGImages'):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    dataset_directory: Path to root directory holding PASCAL dataset
    label_map_dict: A map from string label names to integers ids.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset (default: False).
    image_subdirectory: String specifying subdirectory within the
      PASCAL dataset directory holding the actual image data.

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  # data['folder'] is 'widerface' (written by to_voc.py), so images are read
  # from <data_dir>/widerface/JPEGImages/<filename>.
  img_path = os.path.join(data['folder'], image_subdirectory, data['filename'])
  full_path = os.path.join(dataset_directory, img_path)
  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []
  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue

      difficult_obj.append(int(difficult))

      # Normalize pixel coordinates to [0, 1]
      xmin.append(float(obj['bndbox']['xmin']) / width)
      ymin.append(float(obj['bndbox']['ymin']) / height)
      xmax.append(float(obj['bndbox']['xmax']) / width)
      ymax.append(float(obj['bndbox']['ymax']) / height)
      classes_text.append(obj['name'].encode('utf8'))
      classes.append(label_map_dict[obj['name']])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example


def main(_):
  if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))
  if FLAGS.year not in YEARS:
    raise ValueError('year must be in : {}'.format(YEARS))

  data_dir = FLAGS.data_dir
  years = ['fddb', 'widerface']  # upstream: ['VOC2007', 'VOC2012']
  if FLAGS.year != 'merged':
    years = [FLAGS.year]

  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)

  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  for year in years:
    logging.info('Reading from PASCAL %s dataset.', year)
    # Upstream prepends 'aeroplane_' to the file name; removed here.
    examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main',
                                 FLAGS.set + '.txt')
    annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir)
    examples_list = dataset_util.read_examples_list(examples_path)
    for idx, example in enumerate(examples_list):
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples_list))
      path = os.path.join(annotations_dir, example + '.xml')
      with tf.gfile.GFile(path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']

      tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
                                      FLAGS.ignore_difficult_instances)
      writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
```
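To see what `dict_to_tf_example` computes for one annotation without installing TensorFlow, the sketch below reads a VOC XML produced by to_voc.py with the standard library's ElementTree and applies the same normalization (x divided by image width, y by image height). `voc_xml_to_normalized` is a hypothetical helper for spot-checking, not part of the script.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_normalized(xml_text):
    """Return (filename, [(name, xmin, ymin, xmax, ymax), ...]) scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename").strip()
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name").strip(),
            float(bb.findtext("xmin")) / width,   # same divisions as the script
            float(bb.findtext("ymin")) / height,
            float(bb.findtext("xmax")) / width,
            float(bb.findtext("ymax")) / height,
        ))
    return filename, boxes
```

Any value outside [0, 1] in the output points to a box that extends past the image border, which is worth inspecting before packing.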
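You need to modify this script accordingly:
1. YEARS = ['fddb', 'widerface']. In the models from GitHub this line is different: it reads YEARS = ['VOC2007', 'VOC2012', 'merged'].
2. years = ['fddb', 'widerface']: likewise, this second list further down in the script (inside main()) needs to be changed.
3. examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main', FLAGS.set + '.txt'). In my version nothing is prepended to the file name after 'Main'. The GitHub version adds an 'aeroplane_' prefix ('aeroplane_' + FLAGS.set + '.txt'); delete it (my code already has it removed). The reason: when I created ImageSets/Main, I put train.txt and val.txt directly inside it, without any other files or directories.
4. To run it, I again use a terminal in the research directory, with the commands below.
5. Packing the training set: --data_dir is the directory that contains the widerface folder (the one holding the five folders), because the script appends the --year value to it and reads <data_dir>/widerface/ImageSets/Main/train.txt and <data_dir>/widerface/Annotations. --output_path is where the packed file is written.
Also, flags.DEFINE_string('label_map_path', 'object_detection/data/face_label_map.pbtxtop', 'Path to label map proto') points the labels to face_label_map.pbtxtop.

```shell
python3 object_detection/dataset_tools/create_face_tf_record.py \
    --data_dir=/home/hyb/muke/models/dataset \
    --year=widerface \
    --output_path=/home/hyb/muke/models/dataset/widerface/TF-record/train.record \
    --set=train
```

Packing the validation set:

```shell
python3 object_detection/dataset_tools/create_face_tf_record.py \
    --data_dir=/home/hyb/muke/models/dataset \
    --year=widerface \
    --output_path=/home/hyb/muke/models/dataset/widerface/TF-record/val.record \
    --set=val
```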
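Because the script builds its input paths by joining --data_dir, --year and ImageSets/Main, a wrong --data_dir shows up only as a file-not-found error from the script. A quick pre-flight check such as the sketch below rebuilds the same paths and lists what is missing. `preflight` is a hypothetical helper; it assumes each listed ID has a matching <ID>.xml in Annotations and <ID>.jpg in JPEGImages, which is what to_voc.py produces for WIDER FACE's .jpg images.

```python
import os


def preflight(data_dir, year, split, annotations_dir="Annotations",
              image_subdir="JPEGImages"):
    """Return a list of problems; an empty list means nothing is missing."""
    base = os.path.join(data_dir, year)
    list_file = os.path.join(base, "ImageSets", "Main", split + ".txt")
    if not os.path.isfile(list_file):
        return ["missing list file: " + list_file]
    with open(list_file) as f:
        # First token of each non-empty line, similar to how the script reads the list
        ids = [line.split()[0] for line in f if line.strip()]
    problems = []
    for example_id in ids:
        xml_path = os.path.join(base, annotations_dir, example_id + ".xml")
        jpg_path = os.path.join(base, image_subdir, example_id + ".jpg")
        if not os.path.isfile(xml_path):
            problems.append("missing annotation: " + xml_path)
        if not os.path.isfile(jpg_path):
            problems.append("missing image: " + jpg_path)
    return problems
```

For example, `print(preflight("/home/hyb/muke/models/dataset", "widerface", "train"))` should print `[]` before you run the training-set command.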