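In the earlier article Training a CNN Classifier with TensorFlow, we built a simple CNN classification model from TensorFlow's low-level functions. The tedious part was the predict function: a large share of its code only declared the weights and biases of each layer, and building the network then meant stacking convolution, activation, and pooling operations by hand, over and over. This article introduces a more convenient way to build neural network models.
We return to the 10-class task from that article. The only difference is that we want to replace the predict function with more concise code. The tf.contrib.slim module (part of TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.0) makes this possible.
In the tf.contrib.slim module, a convolutional layer is defined with the function: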
slim.conv2d(inputs,
            num_outputs,
            kernel_size,
            stride=1,
            padding='SAME',
            data_format=None,
            rate=1,
            activation_fn=nn.relu,
            normalizer_fn=None,
            normalizer_params=None,
            weights_initializer=initializers.xavier_initializer(),
            weights_regularizer=None,
            biases_initializer=init_ops.zeros_initializer(),
            biases_regularizer=None,
            reuse=None,
            variables_collections=None,
            outputs_collections=None,
            trainable=True,
            scope=None)
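As the signature shows, besides the usual kernel size kernel_size, padding mode padding, stride stride, and number of feature maps num_outputs, this function also accepts the initializers and regularizers for the weights and biases, the activation function, and more. In other words, defining a convolutional layer with slim requires no separate declaration of weight and bias variables and no explicit activation or regularization step; all of these are built into the layer function.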
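To make concrete what a call to slim.conv2d saves, the plain-Python sketch below computes the shapes of the weight and bias variables that one 2-D convolutional layer needs. The helper name conv2d_variable_shapes is hypothetical and exists only for this illustration. It assumes the usual [kernel_height, kernel_width, in_channels, num_outputs] kernel layout and one bias per output channel.

```python
def conv2d_variable_shapes(in_channels, num_outputs, kernel_size):
    """Return (weights_shape, biases_shape, parameter_count) for one conv layer.

    Hypothetical helper for illustration; it does not call TensorFlow.
    """
    kernel_h, kernel_w = kernel_size
    # One kernel_h x kernel_w filter per (input channel, output channel) pair.
    weights_shape = [kernel_h, kernel_w, in_channels, num_outputs]
    # One bias per output feature map.
    biases_shape = [num_outputs]
    count = kernel_h * kernel_w * in_channels * num_outputs + num_outputs
    return weights_shape, biases_shape, count


# A 3x3 convolution from 3 input channels (RGB) to 32 feature maps.
weights_shape, biases_shape, count = conv2d_variable_shapes(3, 32, [3, 3])
print(weights_shape, biases_shape, count)
```

With the low-level API, both variables have to be declared by hand for every layer; with slim.conv2d they are created from num_outputs and kernel_size alone.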
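Similarly, a fully connected layer is defined with slim.fully_connected. Other important operations, including pooling, batch normalization, dropout, and flattening, are wrapped as slim.max_pool2d, slim.batch_norm, slim.dropout, and slim.flatten. Better still, to stack several identical layers you can use a loop, for example three convolutional layers in a row: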
for i in range(3):
    net = slim.conv2d(net, 256, [3, 3], scope='conv1_{}'.format(i))
or the more compact function call:
net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv1')
The function:
slim.repeat(inputs, repetitions, layer, *args, **kwargs)
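makes building large neural networks more compact and convenient. Wrappers like these make building convolutional neural networks in TensorFlow far more convenient, arguably on a par with Keras.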
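The behavior of slim.repeat can be sketched in plain Python. The stand-in below is a simplified illustration, not slim's implementation: it ignores variable scopes and only mimics how the scope argument is unrolled into one sub-scope per repetition ('conv1/conv1_1', 'conv1/conv1_2', and so on). The names repeat and fake_conv2d, and the default scope 'repeat', are assumptions made for this example.

```python
def repeat(inputs, repetitions, layer, *args, **kwargs):
    """Apply `layer` to `inputs` `repetitions` times (simplified stand-in)."""
    scope = kwargs.pop('scope', 'repeat')  # default name is an assumption
    outputs = inputs
    for i in range(repetitions):
        # Each repetition gets its own sub-scope, numbered from 1.
        kwargs['scope'] = '{}/{}_{}'.format(scope, scope, i + 1)
        outputs = layer(outputs, *args, **kwargs)
    return outputs


def fake_conv2d(inputs, num_outputs, kernel_size, scope=None):
    """Toy layer: records the scope it was called with instead of convolving."""
    return inputs + [scope]


trace = repeat([], 3, fake_conv2d, 256, [3, 3], scope='conv1')
print(trace)
```

Note that the explicit loop shown earlier names its layers 'conv1_0' to 'conv1_2', so the two variants produce different variable names even though they build the same stack of layers. This matters when restoring a checkpoint saved by one variant into a graph built by the other.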
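Back to the 10-class task from Training a CNN Classifier with TensorFlow. In that article's model.py we spent a lot of code building a small CNN with 6 convolutional layers and 3 fully connected layers. With the tf.contrib.slim module we can now rewrite the model-building function predict: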
def predict(self, preprocessed_inputs):
    """Predict prediction tensors from inputs tensor.

    Outputs of this function can be passed to loss or postprocess functions.

    Args:
        preprocessed_inputs: A float32 tensor with shape [batch_size,
            height, width, num_channels] representing a batch of images.

    Returns:
        prediction_dict: A dictionary holding prediction tensors to be
            passed to the Loss or Postprocess functions.
    """
    net = preprocessed_inputs
    net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
    net = slim.flatten(net, scope='flatten')
    net = slim.dropout(net, keep_prob=0.5,
                       is_training=self._is_training)
    net = slim.repeat(net, 2, slim.fully_connected, 512, scope='fc1')
    net = slim.fully_connected(net, self.num_classes,
                               activation_fn=None, scope='fc2')
    prediction_dict = {'logits': net}
    return prediction_dict
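To follow what this network does to the tensor shape, the sketch below traces the spatial size and channel count layer by layer in plain Python. It assumes the 28 x 28 x 3 input used by the training script, the slim.conv2d defaults from the signature above (stride 1, 'SAME' padding), and the slim.max_pool2d defaults of stride 2 with 'VALID' padding. The helper names conv_same and pool_valid are hypothetical.

```python
import math


def conv_same(size, stride=1):
    """Output size of a 'SAME'-padded convolution."""
    return math.ceil(size / stride)


def pool_valid(size, kernel=2, stride=2):
    """Output size of a 'VALID'-padded pooling window."""
    return (size - kernel) // stride + 1


size, channels = 28, 3  # assumed input: 28 x 28 RGB images
shapes = []
for name, kind, out_channels in [('conv1', 'conv', 32), ('pool1', 'pool', None),
                                 ('conv2', 'conv', 64), ('pool2', 'pool', None),
                                 ('conv3', 'conv', 128)]:
    if kind == 'conv':
        size, channels = conv_same(size), out_channels
    else:
        size = pool_valid(size)
    shapes.append((name, size, size, channels))

flattened = size * size * channels  # length of each row after slim.flatten
for shape in shapes:
    print(shape)
print('flatten:', flattened)
```

Under these assumptions the feature maps shrink from 28 to 14 to 7 pixels per side, and slim.flatten hands 7 * 7 * 128 = 6272 values per image to the first fully connected layer.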
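Clearly this is not only much quicker to write but also makes the model structure easier to read, every bit as good as Keras. After this rewrite, the complete model.py is: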
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 16:54:02 2018

@author: shirhe-lyh
"""

import tensorflow as tf

slim = tf.contrib.slim


class Model(object):
    """A simple 10-classification CNN model definition."""

    def __init__(self,
                 is_training,
                 num_classes):
        """Constructor.

        Args:
            is_training: A boolean indicating whether the training version of
                computation graph should be constructed.
            num_classes: Number of classes.
        """
        self._num_classes = num_classes
        self._is_training = is_training

    @property
    def num_classes(self):
        return self._num_classes

    def preprocess(self, inputs):
        """Preprocess the input images before prediction.

        Args:
            inputs: A tensor with shape [batch_size, height, width,
                num_channels] representing a batch of images.

        Returns:
            preprocessed_inputs: A float32 tensor of the same shape, with
                pixel values shifted by 128 and then divided by 128.
        """
        preprocessed_inputs = tf.to_float(inputs)
        preprocessed_inputs = tf.subtract(preprocessed_inputs, 128.0)
        preprocessed_inputs = tf.div(preprocessed_inputs, 128.0)
        return preprocessed_inputs

    def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.

        Outputs of this function can be passed to loss or postprocess functions.

        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.

        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            activation_fn=tf.nn.relu):
            net = preprocessed_inputs
            net = slim.repeat(net, 2, slim.conv2d, 32, [3, 3], scope='conv1')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
            net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool2')
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv3')
            net = slim.flatten(net, scope='flatten')
            net = slim.dropout(net, keep_prob=0.5,
                               is_training=self._is_training)
            net = slim.fully_connected(net, 512, scope='fc1')
            net = slim.fully_connected(net, 512, scope='fc2')
            net = slim.fully_connected(net, self.num_classes,
                                       activation_fn=None, scope='fc3')
        prediction_dict = {'logits': net}
        return prediction_dict

    def postprocess(self, prediction_dict):
        """Convert predicted output tensors to final forms.

        Args:
            prediction_dict: A dictionary holding prediction tensors.

        Returns:
            A dictionary containing the postprocessed results.
        """
        logits = prediction_dict['logits']
        logits = tf.nn.softmax(logits)
        classes = tf.cast(tf.argmax(logits, axis=1), dtype=tf.int32)
        postprocessed_dict = {'classes': classes}
        return postprocessed_dict

    def loss(self, prediction_dict, groundtruth_lists):
        """Compute scalar loss tensors with respect to provided groundtruth.

        Args:
            prediction_dict: A dictionary holding prediction tensors.
            groundtruth_lists: A list of tensors holding groundtruth
                information, with one entry for each image in the batch.

        Returns:
            A dictionary mapping strings (loss names) to scalar tensors
                representing loss values.
        """
        logits = prediction_dict['logits']
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits, labels=groundtruth_lists))
        loss_dict = {'loss': loss}
        return loss_dict
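Apart from the newly added declaration slim = tf.contrib.slim and the rewritten predict function, the code above is unchanged from the earlier article. The slim.arg_scope block in predict sets activation_fn=tf.nn.relu for every slim.conv2d and slim.fully_connected call; since ReLU is already the default activation of these layers, the network is the same as in the shorter version shown before. Overall, the amount of code is greatly reduced.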
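The arithmetic in preprocess and postprocess can be checked without TensorFlow. The sketch below is a plain-Python stand-in, with hypothetical helper names, that applies the same formulas to single values: it shows the range that the pixel normalization produces, and that taking softmax before argmax does not change the predicted class.

```python
import math


def preprocess_pixel(value):
    """Same arithmetic as Model.preprocess, applied to one pixel value."""
    return (float(value) - 128.0) / 128.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


# Pixel values 0..255 are mapped into [-1, 1).
low, high = preprocess_pixel(0), preprocess_pixel(255)
print(low, high)

# Softmax is monotonic, so the predicted class is the same with or without it.
logits = [0.5, 2.0, -1.0]  # made-up example logits
probs = softmax(logits)
print(argmax(logits), argmax(probs))
```

The softmax in postprocess therefore only matters if the probabilities themselves are needed; for the class index alone, argmax over the raw logits gives the same answer.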
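A model defined with the tf.contrib.slim module can be trained in two ways. The first is the same as in Training a CNN Classifier with TensorFlow. The second stays within tf.contrib.slim and uses the slim.learning.train function for a quick setup. Here we keep the training approach of the earlier article; a later article will show how to train with the second.
The training file train.py follows, copied from the earlier article without modification: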
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 19:27:44 2018

@author: shirhe-lyh
"""

"""Train a CNN model to classify 10 digits.

Example Usage:
---------------
python3 train.py \
    --images_path: Path to the training images (directory).
    --model_output_path: Path to model.ckpt.
"""

import cv2
import glob
import numpy as np
import os
import tensorflow as tf

import model

flags = tf.app.flags

flags.DEFINE_string('images_path', None, 'Path to training images.')
flags.DEFINE_string('model_output_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def get_train_data(images_path):
    """Get the training images from images_path.

    Args:
        images_path: Path to training images.

    Returns:
        images: A list of images.
        labels: A list of integers representing the classes of images.

    Raises:
        ValueError: If images_path does not exist.
    """
    if not os.path.exists(images_path):
        raise ValueError('images_path does not exist.')

    images = []
    labels = []
    images_path = os.path.join(images_path, '*.jpg')
    count = 0
    for image_file in glob.glob(images_path):
        count += 1
        if count % 100 == 0:
            print('Load {} images.'.format(count))
        image = cv2.imread(image_file)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Assume the name of each image is imagexxx_label.jpg
        label = int(image_file.split('_')[-1].split('.')[0])
        images.append(image)
        labels.append(label)
    images = np.array(images)
    labels = np.array(labels)
    return images, labels


def next_batch_set(images, labels, batch_size=128):
    """Generate a batch of training data.

    Args:
        images: A 4-D array representing the training images.
        labels: A 1-D array representing the classes of images.
        batch_size: An integer.

    Return:
        batch_images: A batch of images.
        batch_labels: A batch of labels.
    """
    # Sample indices at random (with replacement).
    indices = np.random.choice(len(images), batch_size)
    batch_images = images[indices]
    batch_labels = labels[indices]
    return batch_images, batch_labels


def main(_):
    inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 3], name='inputs')
    labels = tf.placeholder(tf.int32, shape=[None], name='labels')

    cls_model = model.Model(is_training=True, num_classes=10)
    preprocessed_inputs = cls_model.preprocess(inputs)
    prediction_dict = cls_model.predict(preprocessed_inputs)
    loss_dict = cls_model.loss(prediction_dict, labels)
    loss = loss_dict['loss']
    postprocessed_dict = cls_model.postprocess(prediction_dict)
    classes = postprocessed_dict['classes']
    # Give the output tensor a fixed name so evaluate.py can look it up.
    classes_ = tf.identity(classes, name='classes')
    acc = tf.reduce_mean(tf.cast(tf.equal(classes, labels), 'float'))

    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(0.05, global_step, 150, 0.9)

    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_step = optimizer.minimize(loss, global_step)

    saver = tf.train.Saver()

    images, targets = get_train_data(FLAGS.images_path)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(6000):
            batch_images, batch_labels = next_batch_set(images, targets)
            train_dict = {inputs: batch_images, labels: batch_labels}

            sess.run(train_step, feed_dict=train_dict)

            loss_, acc_ = sess.run([loss, acc], feed_dict=train_dict)

            train_text = 'step: {}, loss: {}, acc: {}'.format(
                i+1, loss_, acc_)
            print(train_text)

        saver.save(sess, FLAGS.model_output_path)


if __name__ == '__main__':
    tf.app.run()
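The learning-rate schedule in main can be reproduced in plain Python. The sketch below assumes the default, non-staircase form of tf.train.exponential_decay, which computes learning_rate * decay_rate ** (global_step / decay_steps), and evaluates it for the values used above (0.05, 150, 0.9). The function here is a stand-in for illustration, not the TensorFlow op.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate):
    """Continuous (non-staircase) exponential decay of the learning rate."""
    return learning_rate * decay_rate ** (global_step / decay_steps)


# Learning rate at the start, after one decay period, halfway, and at the end.
schedule = {step: exponential_decay(0.05, step, 150, 0.9)
            for step in (0, 150, 3000, 6000)}
for step, rate in schedule.items():
    print(step, rate)
```

Over the 6000 training steps the rate is multiplied by 0.9 forty times, so it falls from 0.05 to below 0.001 by the end of training.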
From the directory containing this file, run in a terminal:
python3 train.py --images_path /home/.../datasets/images \
--model_output_path /home/.../model.ckpt
The full training log is printed, for example:
step: 1, loss: 2.302396535873413, acc: 0.1640625
step: 2, loss: 2.302823066711426, acc: 0.0859375
step: 3, loss: 2.3024234771728516, acc: 0.1171875
step: 4, loss: 2.302684783935547, acc: 0.0546875
step: 5, loss: 2.3024277687072754, acc: 0.109375
step: 6, loss: 2.3024179935455322, acc: 0.0859375
step: 7, loss: 2.302734851837158, acc: 0.0703125
step: 8, loss: 2.3025729656219482, acc: 0.0859375
step: 9, loss: 2.3026342391967773, acc: 0.1171875
step: 10, loss: 2.3026227951049805, acc: 0.1171875
step: 11, loss: 2.3024468421936035, acc: 0.0859375
step: 12, loss: 2.302351236343384, acc: 0.140625
step: 13, loss: 2.302664279937744, acc: 0.1015625
step: 14, loss: 2.302532434463501, acc: 0.1171875
step: 15, loss: 2.3025684356689453, acc: 0.1015625
step: 16, loss: 2.302473306655884, acc: 0.0703125
step: 17, loss: 2.30285382270813, acc: 0.078125
step: 18, loss: 2.302445411682129, acc: 0.0859375
step: 19, loss: 2.302391290664673, acc: 0.0859375
step: 20, loss: 2.3027210235595703, acc: 0.109375
···
step: 5981, loss: 1.4615014791488647, acc: 1.0
step: 5982, loss: 1.46712064743042, acc: 1.0
step: 5983, loss: 1.4673535823822021, acc: 1.0
step: 5984, loss: 1.46533203125, acc: 0.9921875
step: 5985, loss: 1.4692511558532715, acc: 0.9921875
step: 5986, loss: 1.4615371227264404, acc: 1.0
step: 5987, loss: 1.461196780204773, acc: 1.0
step: 5988, loss: 1.4663658142089844, acc: 1.0
step: 5989, loss: 1.467726707458496, acc: 0.9921875
step: 5990, loss: 1.4727323055267334, acc: 0.9921875
step: 5991, loss: 1.461942434310913, acc: 1.0
step: 5992, loss: 1.461172342300415, acc: 1.0
step: 5993, loss: 1.4619064331054688, acc: 1.0
step: 5994, loss: 1.466255784034729, acc: 0.9921875
step: 5995, loss: 1.4612611532211304, acc: 1.0
step: 5996, loss: 1.4613593816757202, acc: 1.0
step: 5997, loss: 1.4761428833007812, acc: 0.984375
step: 5998, loss: 1.4681826829910278, acc: 0.9921875
step: 5999, loss: 1.4703295230865479, acc: 0.9921875
step: 6000, loss: 1.4703948497772217, acc: 0.9921875
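The output shows that the accuracy on the training batches has stabilized at about 99%, while the loss has settled between 1.46 and 1.47 (compare this later with the results of training through the tf.contrib.slim module; the two should be close).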
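The two loss levels in the log can be checked by hand. The sketch below is plain Python with a hypothetical helper name, and it computes softmax cross-entropy for two inputs: ten equal scores, and a perfect one-hot probability vector fed in as if it were logits. The second case is only one possible reading of the plateau and is an inference from the numbers, not something stated in this article: model.py as listed passes raw logits to the loss, so the log may come from a variant in which softmax was applied before the loss.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


# Untrained model: all 10 classes score the same, so the loss is ln(10).
initial = softmax_cross_entropy([0.0] * 10, 0)

# A perfect probability vector fed in as logits: the loss cannot go below
# ln(1 + 9 / e), because the inputs are confined to the range [0, 1].
floor = softmax_cross_entropy([1.0] + [0.0] * 9, 0)

print(initial, floor)
```

The first value matches the 2.30 seen in the opening steps, and the second sits just under the smallest losses in the final steps (about 1.4612), which is consistent with that reading.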
The test script evaluate.py is also copied from the earlier article:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Apr  2 14:02:05 2018

@author: shirhe-lyh
"""

import numpy as np
import tensorflow as tf

from captcha.image import ImageCaptcha

flags = tf.app.flags
flags.DEFINE_string('model_ckpt_path', None, 'Path to model checkpoint.')
FLAGS = flags.FLAGS


def generate_captcha(text='1'):
    capt = ImageCaptcha(width=28, height=28, font_sizes=[24])
    image = capt.generate_image(text)
    image = np.array(image, dtype=np.uint8)
    return image


def main(_):
    with tf.Session() as sess:
        ckpt_path = FLAGS.model_ckpt_path
        # Rebuild the graph from the .meta file, then load the weights.
        saver = tf.train.import_meta_graph(ckpt_path + '.meta')
        saver.restore(sess, ckpt_path)

        # Look up the input and output tensors by the names set in train.py.
        inputs = tf.get_default_graph().get_tensor_by_name('inputs:0')
        classes = tf.get_default_graph().get_tensor_by_name('classes:0')

        for i in range(10):
            label = np.random.randint(0, 10)
            image = generate_captcha(str(label))
            image_np = np.expand_dims(image, axis=0)
            predicted_label = sess.run(classes,
                                       feed_dict={inputs: image_np})
            print(predicted_label, ' vs ', label)


if __name__ == '__main__':
    tf.app.run()
Run
python3 evaluate.py --model_ckpt_path /home/.../model.ckpt
to see the trained model in action. For example, two of my runs printed:
[0] vs 0
[6] vs 6
[7] vs 7
[8] vs 8
[2] vs 2
[4] vs 4
[3] vs 3
[5] vs 5
[7] vs 7
[8] vs 8
[3] vs 3
[6] vs 6
[2] vs 2
[2] vs 2
[2] vs 2
[7] vs 7
[9] vs 9
[6] vs 6
[4] vs 4
[5] vs 5
All 20 predictions match their labels, so the model performs well on these samples.