In the previous article we installed TensorFlow and the TensorFlow Object Detection API. In this one we will train our own object detection model on a custom dataset. The work breaks down into roughly six steps:
1. Organize the project files
2. Organize the training dataset and annotation files
3. Convert the training set to tf_record format
4. Configure the training pipeline
5. Monitor the training process
6. Save the model parameters
The workspace folder sits inside the object-detect folder, at the same level as the models directory that holds the TensorFlow Object Detection API. The directory structure looks like this:
object-detect
├── models
│ ├── AUTHORS
│ ├── CODEOWNERS
│ ├── community
│ ├── CONTRIBUTING.md
│ ├── ISSUES.md
│ ├── LICENSE
│ ├── official
│ ├── orbit
│ ├── README.md
│ └── research
└── workspace
└── training_demo
The training_demo folder will be our training folder and will hold every file related to training our model. It is recommended to create a separate training folder each time you want to train on a different dataset. A typical training folder looks like this:
training_demo/
├─ addon
├─ annotations/
├─ exported-models/
├─ images/
│ ├─ test/
│ └─ train/
├─ models/
└─ pre-trained-models/
The purpose of each folder is as follows:
1. annotations: stores all *.csv files and the corresponding TensorFlow *.record files, which contain the annotation lists for our dataset images.
2. exported-models: stores the final exported versions of our trained model.
3. images: contains a copy of every image in the dataset, together with the *.xml annotation file generated for each image.
4. models: contains one sub-folder per training job. Each sub-folder holds the training pipeline configuration file *.config, plus all files generated while training and evaluating the model.
5. pre-trained-models: contains the downloaded pre-trained models that serve as the initial checkpoints for our training jobs.
6. addon: auxiliary tools and scripts.
We will get a more concrete feel for these folders as we write the code in the sections below.
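If you prefer to set this layout up with a script, here is a minimal sketch (my own addition; it only creates the empty folders described above):

import os

BASE = "workspace/training_demo"
SUBDIRS = [
    "addon",
    "annotations",
    "exported-models",
    "images/train",
    "images/test",
    "models",
    "pre-trained-models",
]

# Create the training_demo layout; exist_ok avoids errors on re-runs
for sub in SUBDIRS:
    os.makedirs(os.path.join(BASE, sub), exist_ok=True)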
The images are annotated with labelImg (link below). The latest Windows build is recommended: it needs no installation, but the directory you place it in must not contain any Chinese characters. Download link:
http://tzutalin.github.io/labelImg/
Once annotation is complete, the annotated images (together with their *.xml files) first need to be placed under
training_demo/
├─ images/
After annotating the image dataset, the usual convention is to use only part of it for training and keep the rest for evaluation. Typically the ratio is 9:1, i.e. 90% of the images go to training and the remaining 10% to testing, but you can choose whatever ratio suits your needs. The split can be done either by hand or with a script, and the ratio is applied to the number of files, not to the number of individual annotations.
import os
import re
from shutil import copyfile
import math
import random

def iterate_dir(source, dest, ratio, copy_xml):
    source = source.replace('\\', '/')
    dest = dest.replace('\\', '/')
    train_dir = os.path.join(dest, 'train')
    test_dir = os.path.join(dest, 'test')

    if not os.path.exists(train_dir):
        os.makedirs(train_dir)
    if not os.path.exists(test_dir):
        os.makedirs(test_dir)

    # Collect every .jpg/.jpeg/.png file in the source folder
    images = [f for f in os.listdir(source)
              if re.search(r'([a-zA-Z0-9\s_\\.\-\(\):])+(\.jpg|\.jpeg|\.png)$', f)]

    num_images = len(images)
    num_test_images = math.ceil(ratio * num_images)

    # Move a random `ratio` share of the images (and optionally their .xml files) to test/
    for i in range(num_test_images):
        idx = random.randint(0, len(images) - 1)
        filename = images[idx]
        copyfile(os.path.join(source, filename),
                 os.path.join(test_dir, filename))
        if copy_xml:
            xml_filename = os.path.splitext(filename)[0] + '.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(test_dir, xml_filename))
        images.remove(images[idx])

    # Everything that is left goes to train/
    for filename in images:
        copyfile(os.path.join(source, filename),
                 os.path.join(train_dir, filename))
        if copy_xml:
            xml_filename = os.path.splitext(filename)[0] + '.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(train_dir, xml_filename))

iterate_dir("workspace/training_demo/images", "workspace/training_demo/images", 0.1, True)
TensorFlow requires a label map, which maps each label to an integer value. Both the training and the detection process use this label map. Its format looks like this:
item {
id: 1
name: 'y'
}
The label map file has the extension .pbtxt and should be placed inside the training_demo/annotations folder under the name label_map.pbtxt.
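If you prefer not to write the file by hand, the following small sketch (my own addition; the single class name 'y' mirrors the class used in the conversion script below) generates an equivalent label_map.pbtxt:

import os

LABEL_MAP_PATH = "workspace/training_demo/annotations/label_map.pbtxt"
CLASSES = ["y"]  # one entry per object class; ids start at 1

os.makedirs(os.path.dirname(LABEL_MAP_PATH), exist_ok=True)
with open(LABEL_MAP_PATH, "w") as f:
    for idx, name in enumerate(CLASSES, start=1):
        f.write("item {\n")
        f.write("  id: %d\n" % idx)
        f.write("  name: '%s'\n" % name)
        f.write("}\n\n")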
Converting the annotated images and their *.xml files into TFRecord format can be done with the script below:
import os
import glob
import pandas as pd
import io
import xml.etree.ElementTree as ET
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util, label_map_util
from collections import namedtuple
labels_path="workspace/training_demo/annotations/label_map.pbtxt"
label_map = label_map_util.load_labelmap(labels_path)
label_map_dict = label_map_util.get_label_map_dict(label_map)
def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.

    Parameters:
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    # Since this first task is pure detection, keep only the single class 'y'
    xml_df = xml_df[xml_df['class'] == 'y']
    return xml_df
def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def class_text_to_int(row_label):
    return label_map_dict[row_label]
def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Bounding-box coordinates are normalized to [0, 1]
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
output_path="workspace/training_demo/annotations/train.record"
image_dir="workspace/training_demo/images/train"
xml_dir="workspace/training_demo/images/train"
def create_tfrecord(output_path,image_dir,xmldir):
writer = tf.python_io.TFRecordWriter(output_path)
path = os.path.join(image_dir)
examples = xml_to_csv(xml_dir)
grouped = split(examples, 'filename')
for group in grouped:
tf_example = create_tf_example(group, path)
writer.write(tf_example.SerializeToString())
writer.close()
print('Successfully created the TFRecord file: {}'.format(output_path))
create_tfrecord(output_path,image_dir,xml_dir)
Do the same for the test (evaluation) set:
output_path="workspace/training_demo/annotations/test.record"
image_dir="workspace/training_demo/images/test"
xml_dir="workspace/training_demo/images/test"
create_tfrecord(output_path,image_dir,xml_dir)
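As an optional sanity check (my own addition, not part of the original workflow), you can count the serialized examples in each .record file and compare the counts with the sizes of your train/test splits:

import tensorflow as tf  # TF2 eager mode

for record_file in ["workspace/training_demo/annotations/train.record",
                    "workspace/training_demo/annotations/test.record"]:
    # Each element of a TFRecordDataset is one serialized tf.train.Example
    count = sum(1 for _ in tf.data.TFRecordDataset(record_file))
    print(record_file, count)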
This training does not start from scratch; instead we fine-tune a model that has already been trained, so we first need to pick a pre-trained model. You can choose one from the TensorFlow 2 Detection Model Zoo (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). The main trade-off to consider is between inference speed and accuracy (the zoo lists speed and COCO mAP for every model), plus whether the input resolution fits your images.
Once you have settled on a pre-trained model, download its .tar.gz parameter archive, extract it and place it under training_demo/pre-trained-models. Here we choose SSD ResNet50 V1 FPN 640x640; the download link is http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz (a small download-and-extract sketch follows the directory listing below). The relevant directory structure then looks like this:
training_demo/
├─ ...
├─ pre-trained-models/
│ └─ ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/
│ ├─ checkpoint/
│ ├─ saved_model/
│ └─ pipeline.config
└─ ...
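If you prefer scripting the download, a minimal sketch using only the Python standard library could look like this (it assumes the URL above and the workspace-relative paths used earlier):

import os
import tarfile
import urllib.request

MODEL_URL = ("http://download.tensorflow.org/models/object_detection/tf2/20200711/"
             "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz")
DEST_DIR = "workspace/training_demo/pre-trained-models"

os.makedirs(DEST_DIR, exist_ok=True)
archive_path, _ = urllib.request.urlretrieve(MODEL_URL)
with tarfile.open(archive_path, "r:gz") as tar:
    # The archive already contains the ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/ folder
    tar.extractall(DEST_DIR)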
Of course we can also download several models and evaluate which one works best. For example, if we additionally download the EfficientDet D1 640x640 model, the directory structure becomes:
training_demo/
├─ ...
├─ pre-trained-models/
│ ├─ efficientdet_d1_coco17_tpu-32/
│ │ ├─ checkpoint/
│ │ ├─ saved_model/
│ │ └─ pipeline.config
│ └─ ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/
│ ├─ checkpoint/
│ ├─ saved_model/
│ └─ pipeline.config
└─ ...
After the pre-trained model has been downloaded, we need to create a directory for this training run. Create a new directory my_ssd_resnet50_v1_fpn under training_demo/models and copy the file training_demo/pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/pipeline.config into it. The training_demo/models directory then looks like this:
training_demo/
├─ ...
├─ models/
│ └─ my_ssd_resnet50_v1_fpn/
│ └─ pipeline.config
└─ ...
The contents of this configuration file are as follows:
model {
  ssd {
    **num_classes: 1 # Set this to the number of different label classes**
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.00039999998989515007
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029999999329447746
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019165
          scale: true
          epsilon: 0.0010000000474974513
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.00039999998989515007
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019165
            scale: true
            epsilon: 0.0010000000474974513
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.599999904632568
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  **batch_size: 8 # Increase/Decrease this value depending on the available memory (Higher values require more memory and vice-versa)**
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.03999999910593033
          total_steps: 25000
          warmup_learning_rate: 0.013333000242710114
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  **fine_tune_checkpoint: "pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0" # Path to checkpoint of pre-trained model**
  num_steps: 25000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  **fine_tune_checkpoint_type: "detection" # Set this to "detection" since we want to be training the full detection model**
  **use_bfloat16: false # Set this to false if you are not training on a TPU**
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  **label_map_path: "annotations/label_map.pbtxt" # Path to label map file**
  tf_record_input_reader {
    **input_path: "annotations/train.record" # Path to training TFRecord file**
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  **label_map_path: "annotations/label_map.pbtxt" # Path to label map file**
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    **input_path: "annotations/test.record" # Path to testing TFRecord**
  }
}
Note that the lines marked with ** above are the ones you need to modify.
For our setup they can be adjusted as follows:
num_classes: 1
batch_size: 20
fine_tune_checkpoint: "pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0" # Path to checkpoint of pre-trained model
fine_tune_checkpoint_type: "detection"
use_bfloat16: false
label_map_path: "annotations/label_map.pbtxt"
input_path: "annotations/train.record"
input_path: "annotations/test.record"
Note that all file paths are interpreted relative to the directory the training script is launched from, since the training files live under workspace/training_demo. A sketch that applies these edits to pipeline.config programmatically follows below.
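If you would rather not edit the file by hand, the following sketch (my own addition; it assumes the Object Detection API protos are importable and that it is run from inside training_demo) patches the copied pipeline.config in place:

from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

CONFIG_PATH = "models/my_ssd_resnet50_v1_fpn/pipeline.config"  # relative to training_demo/

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open(CONFIG_PATH, "r") as f:
    text_format.Merge(f.read(), pipeline)

# Override the starred fields listed above
pipeline.model.ssd.num_classes = 1
pipeline.train_config.batch_size = 20
pipeline.train_config.fine_tune_checkpoint = (
    "pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0")
pipeline.train_config.fine_tune_checkpoint_type = "detection"
pipeline.train_config.use_bfloat16 = False
pipeline.train_input_reader.label_map_path = "annotations/label_map.pbtxt"
pipeline.train_input_reader.tf_record_input_reader.input_path[:] = ["annotations/train.record"]
# eval_input_reader is a repeated field; patch the first (and only) entry
pipeline.eval_input_reader[0].label_map_path = "annotations/label_map.pbtxt"
pipeline.eval_input_reader[0].tf_record_input_reader.input_path[:] = ["annotations/test.record"]

with open(CONFIG_PATH, "w") as f:
    f.write(text_format.MessageToString(pipeline))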
Before starting the training, copy the TensorFlow/models/research/object_detection/model_main_tf2.py script and paste it directly into our training_demo folder; we need this script to train the model.
Then run the training command below; note that this is a shell command, not Python:
python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config
Once the training job has finished, you need to export the newly trained inference graph, which will later be used to perform object detection. Proceed as follows:
Copy the [object detect path]/models/research/object_detection/exporter_main_v2.py script and paste it directly into your training_demo folder.
Then change into the training_demo folder and run the following command:
python3 exporter_main_v2.py --input_type image_tensor --pipeline_config_path models/my_ssd_resnet50_v1_fpn/pipeline.config --trained_checkpoint_dir ./models/my_ssd_resnet50_v1_fpn/ --output_directory ./exported-models/my_model
When this finishes, you should find a new folder named my_model under training_demo/exported-models with the following structure:
training_demo/
├─ ...
├─ exported-models/
│ └─ my_model/
│ ├─ checkpoint/
│ ├─ saved_model/
│ └─ pipeline.config
└─ ...
This exported model can then be used to run predictions.
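As a quick smoke test of the exported model, a minimal inference sketch might look like the following (the image path is a placeholder for one of your own test images; the loading pattern is the standard TF2 SavedModel workflow):

import numpy as np
import tensorflow as tf
from PIL import Image

SAVED_MODEL_DIR = "workspace/training_demo/exported-models/my_model/saved_model"
IMAGE_PATH = "workspace/training_demo/images/test/example.jpg"  # placeholder image

# Load the exported SavedModel; calling it runs the full detection pipeline
detect_fn = tf.saved_model.load(SAVED_MODEL_DIR)

# The model expects a batched uint8 tensor of shape [1, H, W, 3]
image_np = np.array(Image.open(IMAGE_PATH).convert("RGB"))
input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]

detections = detect_fn(input_tensor)

# Boxes come back in normalized [ymin, xmin, ymax, xmax] coordinates
num = int(detections["num_detections"][0])
boxes = detections["detection_boxes"][0][:num].numpy()
scores = detections["detection_scores"][0][:num].numpy()
classes = detections["detection_classes"][0][:num].numpy().astype(np.int64)

for box, score, cls in zip(boxes, scores, classes):
    if score >= 0.5:  # keep reasonably confident detections only
        print(cls, score, box)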
A few caveats:
1. If you use the GPU build, TensorFlow >= 2.5 is needed. The official docs only require >= 2.4.0, but with 2.4 you can run into cuDNN errors. Also, the GPU build should not be the Docker image; otherwise, when the Object Detection API is installed, TensorFlow may get reinstalled as the CPU-only version.
2. From TensorFlow 2.4.1 onwards, CUDA >= 11.0 is required, and the installation has a few pitfalls of its own; the details are outside the scope of this article.