TensorFlow 1.x Best Practices: Dataset API + Keras Model + TF Train

Preface

Of the mainstream deep learning frameworks, I have used TensorFlow, PyTorch, and Keras to some extent. Since I am still only an entry-level player in deep learning, this article simply shares some practical framework-programming experience from an AI newcomer's perspective. As for the "best practices" in the title, that is admittedly a bit of an overstatement; to those who have mastered every framework API, this article may well seem laughable. Beyond that, this article is largely a curation: it does not explain every usage detail, but it does recommend several articles that already cover them well, and reading through them should be rewarding. Let's begin; I hope this helps.

Of the three frameworks, TensorFlow is probably the hardest to pick up: before Eager Execution, the static-graph programming model often made network debugging feel like groping in the dark. Keras is the easiest to get started with, thanks to its fixed code patterns and high level of encapsulation, but that same encapsulation makes it harder for beginners to extend. PyTorch scores well on both ease of use and extensibility; in particular, torch tensors can be inspected immediately, which makes debugging convenient.

Overall, if you want to write a deep learning project from scratch, I think PyTorch is an excellent fit. In practice, though, we usually build on other people's work, and much open-source code is implemented on TensorFlow 1.x, or sometimes on Keras, rather than PyTorch. We are rarely willing to reimplement everything in PyTorch; for academic research, that would somewhat defeat the purpose.

Our goal is to implement our own ideas quickly on top of TensorFlow 1.x and iterate through trial and error fast. Writing deep learning task code comes down to three things:

  1. Building and loading the dataset;
  2. Constructing the network model, including complex operations such as custom layers;
  3. Writing the training code, including custom loss functions, dynamic learning-rate adjustment, and so on.

Each of these three tasks has a particularly well-suited way to implement it.

Recommended Approach

a. Data loading

Our datasets are often stored on disk directly as jpg or png images, anywhere from tens of thousands to hundreds of thousands of them, with labels that may themselves be images or data stored in txt files. Of course, packing these scattered files into something like npz or TFRecord works just the same. Reading a large volume of training data from disk generally requires continuous multithreaded loading: memory is limited, so you cannot load everything at once.

Before the Dataset API, TensorFlow mostly relied on QueueRunner for this. Feel free to dig into it; here is one article on the topic. Honestly, that API is hard to use: the code is complex and error-prone, and I have certainly hit input-queue failures in my own day-to-day work. PyTorch's data loading, by contrast, is refreshingly simple and feels properly object-oriented. After version 1.3, TensorFlow introduced a brand-new input pipeline, the Dataset API. Overall it is much cleaner and far easier to code against. Again, a recommended read: TensorFlow全新的数据读取方式:Dataset API教程 (a tutorial on the Dataset API). Pay particular attention to reading large datasets from disk and to data preprocessing.
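Before the full loader below, a minimal self-contained sketch of the typical tf.data flow may help fix the pattern in mind. The file names, sizes, and buffer values here are all made up for illustration:

import tensorflow as tf

# A minimal tf.data pipeline: slice -> map -> shuffle -> batch -> repeat.
# `file_names` and `labels` are hypothetical placeholders for your own data.
file_names = ["a.jpg", "b.jpg"]
labels = [0, 1]

def _parse(name, label):
    image = tf.image.decode_jpeg(tf.read_file(name), channels=3)
    image = tf.image.resize_images(image, [224, 224])
    return tf.cast(image, tf.float32) / 255., label

dataset = tf.data.Dataset.from_tensor_slices((tf.constant(file_names), tf.constant(labels)))
dataset = dataset.map(_parse).shuffle(buffer_size=100).batch(2).repeat(1)

iterator = dataset.make_one_shot_iterator()
image_batch, label_batch = iterator.get_next()

with tf.Session() as sess:
    imgs, lbls = sess.run([image_batch, label_batch])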

Here is some code I wrote myself to read 300,000 jpg images from disk along with their txt labels.

import tensorflow as tf


class XxxDataloader:

    def __init__(self, config):
        self.config = config
        self.mode = config.mode

        # Data paths
        self.img_path = config.img_path
        self.image_names_path = config.image_names_path
        self.gt_file = config.gt_file

        # Image batches
        self.img_raw_batch = None
        self.img_aug_batch = None
        # Label batch
        self.gt_batch = None  # ground truth

        # ===========> build the input pipeline ===========>

        # Read image names and labels; image_names holds the names of all training samples
        image_names, gt = self._read_img_and_gt(self.image_names_path, self.gt_file)

        # Create the dataset; each element is (image_name, gt)
        dataset = tf.data.Dataset.from_tensor_slices((image_names, gt))
        # Load each image by name and preprocess it;
        # after map, each element is (image, image_aug, gt)
        dataset = dataset.map(self._parse_function)
        if config.shuffle:
            dataset = dataset.shuffle(config.buffersize)
        dataset = dataset.batch(config.batch_size).repeat(config.train_epoch)

        # A one-shot iterator traverses the dataset exactly once
        iterator = dataset.make_one_shot_iterator()

        # Fetch one batch from the iterator
        self.img_raw_batch, self.img_aug_batch, self.gt_batch = iterator.get_next()

    def _parse_function(self, image_name, gt):
        # Build the full image path; the file names themselves are listed in a txt file
        image_path = tf.string_join([self.img_path, image_name])

        # Read the image as 3-channel RGB
        image = self._read_image(image_path, [self.config.img_h, self.config.img_w], channels=3)

        # Data augmentation, applied with probability aug_ratio
        random_aug = tf.random_uniform([], 0, 1)
        image_aug = tf.cond(random_aug < self.config.aug_ratio, lambda: self._augment_image(image), lambda: image)

        # Normalization and other ops elided
        # ...

        return image, image_aug, gt

    def _read_img_and_gt(self, filenames_file, gt_file):
        """
        Read the image names and the ground truth.
        :param filenames_file: file listing the sample names
        :param gt_file: label file
        :return: image names, labels
        """
        # ... (plain-Python parsing of the two files elided)
        return img_array, gt_array

    def _read_image(self, image_path, out_size, channels=3):
        """
        Read an image and resize it to the given size.
        :param image_path: image path
        :param out_size: output size [height, width]
        :param channels: number of channels
        :return: float32 image tensor
        """
        image = tf.image.decode_jpeg(tf.read_file(image_path), channels=channels)
        image = tf.cast(image, tf.float32)
        image = tf.image.resize_images(image, out_size, tf.image.ResizeMethod.AREA)
        return image

    def _augment_image(self, image, min_val=0, max_val=255):
        """
        Data augmentation.
        :param image: input image tensor
        :return: augmented image
        """
        image_aug = aug(image)  # aug() is a user-defined augmentation routine
        return image_aug

The code is mainly about the overall flow; adapt and customize it as you see fit.
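As a quick consumption sketch (TrainConfig and its fields are assumptions mirroring the class above), pulling batches inside a session looks like this:

# Hedged usage sketch; `TrainConfig` and its fields are assumed to exist.
config = TrainConfig()
loader = XxxDataloader(config)

with tf.Session() as sess:
    try:
        while True:
            raw, aug_imgs, gts = sess.run([loader.img_raw_batch, loader.img_aug_batch, loader.gt_batch])
    except tf.errors.OutOfRangeError:
        print("dataset exhausted")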

b. Building models with Keras

Every framework offers convenient high-level APIs and tensor operations for model building. But we often want custom layers, and sometimes we want certain layers to share weights. With the native TensorFlow API that means juggling name scopes, so Keras is generally the better fit for assembling models, and its model-summary and plotting utilities are very handy. For building Keras models, refer directly to the official documentation on the functional and sequential APIs.
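As one concrete illustration of weight sharing with the functional API (a minimal sketch; the layer sizes are arbitrary), you simply instantiate a layer once and call it on several inputs, and all call sites share its weights:

from tensorflow.keras.layers import Input, Dense

# One Dense instance applied to two inputs: both branches share its weights.
shared_fc = Dense(64, activation='relu')

left = Input(shape=(128,))
right = Input(shape=(128,))
left_feat = shared_fc(left)    # same kernel and bias...
right_feat = shared_fc(right)  # ...reused here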

Here is a sample of a functional-API model:

import os

from tensorflow.keras.layers import Input, Concatenate, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model


class XxxNet:

    def __init__(self, config):
        self.config = config
        self.model = self.build_model()

    def build_model(self):
        left_input = Input(shape=(self.config.patch_size, self.config.patch_size, 1), name='left_input')
        right_input = Input(shape=(self.config.patch_size, self.config.patch_size, 1), name='right_input')

        # concat
        stack_input = Concatenate(axis=3)([left_input, right_input])

        # block1
        conv1_1 = Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(stack_input)
        conv1_2 = Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv1_1)
        maxpooling1 = MaxPooling2D((2, 2), strides=(2, 2))(conv1_2)

        # block2
        conv2_1 = Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(maxpooling1)
        conv2_2 = Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv2_1)
        maxpooling2 = MaxPooling2D((2, 2), strides=(2, 2))(conv2_2)

        # block3
        conv3_1 = Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(maxpooling2)
        conv3_2 = Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv3_1)
        maxpooling3 = MaxPooling2D((2, 2), strides=(2, 2))(conv3_2)

        # block4
        conv4_1 = Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(maxpooling3)
        conv4_2 = Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(conv4_1)

        # dropout1; Keras Dropout takes the fraction to *drop* (and is disabled
        # at inference anyway), so use 0.0 rather than 1.0 in test mode
        if self.config.mode == "test":
            self.config.dropout_rate = 0.0
        dropout1 = Dropout(self.config.dropout_rate)(conv4_2)

        # flatten
        flatten = Flatten()(dropout1)

        # fc and dropout2
        fc1 = Dense(1024, activation='relu', kernel_initializer='random_uniform')(flatten)
        fc1_dropout = Dropout(self.config.dropout_rate)(fc1)

        fc2 = Dense(8, activation=None, kernel_initializer='random_uniform')(fc1_dropout)
        output = fc2

        model = Model([left_input, right_input], output)
        plot_model(model, to_file=os.path.join(self.config.model_img_dir, "_model.svg"), show_shapes=True)
        return model

c. Training with the TensorFlow API

With the Dataset API + Keras model combination above, we could already write Keras-style training code and simply call model.fit() and friends. If interested, see: tensorflow的keras实现搭配dataset 之二 (on combining Keras with tf.data).

In practice, though, we often need custom loss functions and dynamically adjusted learning rates. Keras does offer extension points for both, but the amount of code does not shrink, and you lose some of the convenience that made Keras attractive in the first place. We may as well train the model with native TensorFlow, which also lets us freely decide what to show in TensorBoard.

Here is one such piece of code:

import tensorflow as tf


def train(config, dataloader, network):
    # NOTE: this snippet comes from an image-to-image project, so the
    # dataloader attributes and model signature differ slightly from above.
    gt_image = dataloader.img_gt_batch  # labels (ground-truth images)
    pred_image = network.model(dataloader.img_batch)  # predictions
    l1_loss = tf.reduce_mean(tf.abs(pred_image - gt_image))  # metric
    dssim_loss = loss_mix_v3(gt_image, pred_image)  # user-defined loss

    op = tf.train.AdamOptimizer(learning_rate=config.learning_rate).minimize(dssim_loss)

    # Log learning rate / loss values to TensorBoard
    with tf.device('/cpu:0'):
        with tf.name_scope('losses'):
            tf.summary.scalar('l1_loss', l1_loss)
            tf.summary.scalar('dssim_loss', dssim_loss)
        with tf.name_scope('images'):
            tf.summary.image('gt_image', gt_image, 1)
            tf.summary.image('pred_image', pred_image, 1)

    init = tf.global_variables_initializer()

    total_step = 0
    merged_summary_op = tf.summary.merge_all()

    with tf.Session() as sess:
        summary_writer = tf.summary.FileWriter(config.log_dir, sess.graph)

        sess.run(init)
        try:
            while True:
                total_step += 1
                _, l1_loss_output, dssim_loss_output = sess.run([op, l1_loss, dssim_loss])

                print("step: {:d}, l1_loss: {:.4f}, dssim_loss: {:.4f}".format(total_step, l1_loss_output, dssim_loss_output))

                if total_step % 100 == 0:
                    summary_str = sess.run(merged_summary_op)
                    summary_writer.add_summary(summary_str, total_step)

        except tf.errors.OutOfRangeError:  # the repeated dataset is exhausted
            print("end!")


config = TrainConfig()  # training parameter configuration
dataloader = XxxDataloader(config)
train(config, dataloader, XxxNet(config))

MNIST example

# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow.keras import Model, layers, Input
import numpy as np

# MNIST dataset parameters.
num_classes = 10  # total classes (0-9 digits).

# Training parameters.
learning_rate = 0.001
training_steps = 200
batch_size = 128

# Network parameters.
conv1_filters = 32  # number of filters for 1st conv layer.
conv2_filters = 64  # number of filters for 2nd conv layer.
fc1_units = 1024  # number of neurons for 1st fully-connected layer.

# Prepare MNIST data.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Convert to float32.
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)
x_train = np.reshape(x_train, [-1, 28, 28, 1])
x_test = np.reshape(x_test, [-1, 28, 28, 1])
# Normalize images value from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

# Use tf.data API to shuffle and batch data.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)
iterator = train_data.make_one_shot_iterator()
batch_x, batch_y = iterator.get_next()


class ConvNetModel:
    def __init__(self):
        self.model = self.build_model()

    def build_model(self):
        inputs = Input(shape=(28, 28, 1))
        conv1 = layers.Conv2D(conv1_filters, kernel_size=5, activation=tf.nn.relu)(inputs)
        maxpool1 = layers.MaxPool2D(2, strides=2)(conv1)
        conv2 = layers.Conv2D(conv2_filters, kernel_size=3, activation=tf.nn.relu)(maxpool1)
        maxpool2 = layers.MaxPool2D(2, strides=2)(conv2)
        flatten = layers.Flatten()(maxpool2)
        fc1 = layers.Dense(fc1_units)(flatten)
        dropout = layers.Dropout(rate=0.5)(fc1)
        # Output raw logits: the loss below applies softmax itself, so adding
        # a Softmax layer here would apply it twice and stall training.
        logits = layers.Dense(num_classes)(dropout)
        model = Model(inputs, logits)
        return model


# Cross-Entropy Loss.
# Note that this will apply 'softmax' to the logits.
def cross_entropy_loss(x, y):
    # Convert labels to int 64 for tf cross-entropy function.
    y = tf.cast(y, tf.int64)
    # Apply softmax to logits and compute cross-entropy.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=x)
    # Average loss across the batch.
    return tf.reduce_mean(loss)


# Accuracy metric.
def accuracy(y_pred, y_true):
    # Predicted class is the index of highest score in prediction vector (i.e. argmax).
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
    return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# Keep a handle on the model so its weights can be reused (e.g. for evaluation).
net = ConvNetModel()
pred_x = net.model(batch_x)
loss = cross_entropy_loss(pred_x, batch_y)
acc = accuracy(pred_x, batch_y)
op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    # The dataset repeats indefinitely, so bound the loop by training_steps
    # rather than waiting for tf.errors.OutOfRangeError.
    for total_step in range(1, training_steps + 1):
        _, loss_val, accu = sess.run([op, loss, acc])
        print("step: {:d}, loss: {:.4f}, accuracy: {:.4f}".format(total_step, loss_val, accu))

Output:

step: 1, loss: 2.3023, accuracy: 0.0859
step: 2, loss: 2.2705, accuracy: 0.2344
step: 3, loss: 2.2220, accuracy: 0.2578
step: 4, loss: 2.1468, accuracy: 0.5312
step: 5, loss: 2.0973, accuracy: 0.4062
step: 6, loss: 2.0508, accuracy: 0.4375
step: 7, loss: 2.0289, accuracy: 0.4453
step: 8, loss: 1.9723, accuracy: 0.5000
step: 9, loss: 2.0346, accuracy: 0.4297
step: 10, loss: 2.0422, accuracy: 0.4297
step: 11, loss: 2.0012, accuracy: 0.4609
step: 12, loss: 1.9324, accuracy: 0.5391
step: 13, loss: 1.8626, accuracy: 0.6172
step: 14, loss: 1.7589, accuracy: 0.7031
step: 15, loss: 1.8898, accuracy: 0.5703
step: 16, loss: 1.8466, accuracy: 0.6172
step: 17, loss: 1.8487, accuracy: 0.6172
step: 18, loss: 1.8434, accuracy: 0.6172
step: 19, loss: 1.7937, accuracy: 0.6875
step: 20, loss: 1.8203, accuracy: 0.6406
step: 21, loss: 1.8235, accuracy: 0.6484
step: 22, loss: 1.7536, accuracy: 0.7266
step: 23, loss: 1.7713, accuracy: 0.6953
step: 24, loss: 1.7537, accuracy: 0.7109
step: 25, loss: 1.7089, accuracy: 0.7578
step: 26, loss: 1.7639, accuracy: 0.7344
step: 27, loss: 1.7464, accuracy: 0.7344
step: 28, loss: 1.6542, accuracy: 0.8125
step: 29, loss: 1.6519, accuracy: 0.8203
step: 30, loss: 1.6816, accuracy: 0.7891
step: 31, loss: 1.6821, accuracy: 0.7891
step: 32, loss: 1.6792, accuracy: 0.8047
step: 33, loss: 1.6346, accuracy: 0.8438
step: 34, loss: 1.6700, accuracy: 0.7969
step: 35, loss: 1.6359, accuracy: 0.8281
step: 36, loss: 1.6499, accuracy: 0.8281
step: 37, loss: 1.7028, accuracy: 0.7656
step: 38, loss: 1.6700, accuracy: 0.8203
step: 39, loss: 1.6087, accuracy: 0.8672
step: 40, loss: 1.6350, accuracy: 0.8359
step: 41, loss: 1.6506, accuracy: 0.8125
step: 42, loss: 1.5777, accuracy: 0.8906
step: 43, loss: 1.7081, accuracy: 0.7578
step: 44, loss: 1.6700, accuracy: 0.8047
step: 45, loss: 1.6723, accuracy: 0.7969
step: 46, loss: 1.6433, accuracy: 0.8281
step: 47, loss: 1.6298, accuracy: 0.8359
step: 48, loss: 1.6164, accuracy: 0.8516
step: 49, loss: 1.6247, accuracy: 0.8438
step: 50, loss: 1.6171, accuracy: 0.8516
step: 51, loss: 1.6380, accuracy: 0.8281
step: 52, loss: 1.6719, accuracy: 0.7969
step: 53, loss: 1.6347, accuracy: 0.8281
step: 54, loss: 1.6705, accuracy: 0.7969
step: 55, loss: 1.6123, accuracy: 0.8516
step: 56, loss: 1.6299, accuracy: 0.8438
step: 57, loss: 1.5959, accuracy: 0.8594
step: 58, loss: 1.6347, accuracy: 0.8281
step: 59, loss: 1.6353, accuracy: 0.8359
step: 60, loss: 1.6121, accuracy: 0.8516
step: 61, loss: 1.6430, accuracy: 0.8125
step: 62, loss: 1.5951, accuracy: 0.8750
step: 63, loss: 1.5636, accuracy: 0.8984
step: 64, loss: 1.5945, accuracy: 0.8672
step: 65, loss: 1.5843, accuracy: 0.8750
step: 66, loss: 1.5627, accuracy: 0.9141
step: 67, loss: 1.5937, accuracy: 0.8750
step: 68, loss: 1.5360, accuracy: 0.9297
step: 69, loss: 1.5525, accuracy: 0.9141
step: 70, loss: 1.5528, accuracy: 0.9062
step: 71, loss: 1.5281, accuracy: 0.9453
step: 72, loss: 1.5312, accuracy: 0.9375
step: 73, loss: 1.5468, accuracy: 0.9141
step: 74, loss: 1.5668, accuracy: 0.8984
step: 75, loss: 1.5466, accuracy: 0.9219
step: 76, loss: 1.5354, accuracy: 0.9297
step: 77, loss: 1.5417, accuracy: 0.9141
step: 78, loss: 1.5656, accuracy: 0.8984
step: 79, loss: 1.5551, accuracy: 0.9062
step: 80, loss: 1.5206, accuracy: 0.9609
step: 81, loss: 1.5277, accuracy: 0.9375
step: 82, loss: 1.5434, accuracy: 0.9219
step: 83, loss: 1.5292, accuracy: 0.9297
step: 84, loss: 1.5626, accuracy: 0.8984
step: 85, loss: 1.5683, accuracy: 0.8906
step: 86, loss: 1.5686, accuracy: 0.8984
step: 87, loss: 1.5540, accuracy: 0.8984
step: 88, loss: 1.5302, accuracy: 0.9453
step: 89, loss: 1.5269, accuracy: 0.9453
step: 90, loss: 1.5437, accuracy: 0.9141
step: 91, loss: 1.5567, accuracy: 0.9062
step: 92, loss: 1.5158, accuracy: 0.9453
step: 93, loss: 1.5019, accuracy: 0.9688
step: 94, loss: 1.5278, accuracy: 0.9453
step: 95, loss: 1.5304, accuracy: 0.9453
step: 96, loss: 1.5344, accuracy: 0.9297
step: 97, loss: 1.5245, accuracy: 0.9375
step: 98, loss: 1.5200, accuracy: 0.9453
step: 99, loss: 1.5154, accuracy: 0.9531
step: 100, loss: 1.5135, accuracy: 0.9453
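To also check generalization, here is a hedged sketch of evaluating on the test split by reusing the trained weights. Calling the same Keras model instance on a new tensor shares its variables; run this inside the same session, before it closes (the slice of 1000 samples is arbitrary):

# Hypothetical evaluation snippet, reusing `net`, `accuracy`, `x_test`,
# `y_test`, and `sess` from the example above.
test_logits = net.model(tf.constant(x_test[:1000], tf.float32))
test_acc = accuracy(test_logits, tf.constant(y_test[:1000]))
print("test accuracy: {:.4f}".format(sess.run(test_acc)))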

Open issues

Dynamic learning rate
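A common TF 1.x recipe is tf.train.exponential_decay driven by a global step; a minimal sketch (the decay settings are arbitrary, and `loss` stands for your own training loss):

global_step = tf.train.get_or_create_global_step()
lr = tf.train.exponential_decay(learning_rate=1e-3, global_step=global_step,
                                decay_steps=10000, decay_rate=0.9, staircase=True)
# `loss` is your own training loss; passing global_step makes minimize() increment it.
op = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
tf.summary.scalar('learning_rate', lr)  # watch the decay in TensorBoard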

Keras train/test modes
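When a Keras model is driven by a raw TF session as above, phase-dependent layers such as Dropout and BatchNormalization need to know whether we are training or testing. One way, shown here as a small runnable sketch, is to feed the Keras learning-phase flag:

import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Dropout

x = Input(shape=(4,))
y = Dropout(0.5)(x)

with tf.Session() as sess:
    data = [[1., 1., 1., 1.]]
    # Phase 1: Dropout active (some values zeroed, the rest scaled up).
    print(sess.run(y, feed_dict={x: data, K.learning_phase(): 1}))
    # Phase 0: Dropout inactive (input passes through unchanged).
    print(sess.run(y, feed_dict={x: data, K.learning_phase(): 0}))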

Saving and loading models
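For session-based training like the above, tf.train.Saver is the natural tool; a minimal sketch (the checkpoint paths are hypothetical, and `sess`/`total_step` come from your training loop):

saver = tf.train.Saver(max_to_keep=5)

# Save inside the training loop
saver.save(sess, "./checkpoints/model.ckpt", global_step=total_step)

# Restore later, after rebuilding the same graph
ckpt = tf.train.latest_checkpoint("./checkpoints")
saver.restore(sess, ckpt)

Since the Keras model's weights are ordinary TensorFlow variables, one Saver covers everything; alternatively, model.save_weights() / model.load_weights() work on the Keras side.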
