How to Save and Load Models with TensorFlow (Part 1)

1. Background

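TensorFlow is one of the most popular open-source deep learning frameworks, and many readers have already studied it to some degree. In everyday conversations, though, I keep hearing the same questions: how do I run predictions with a model I have trained, how do I resume training from a model that has already been trained for several epochs, and how do I produce model files that can be deployed in a production setting? This series of articles on saving and loading TensorFlow models is meant to answer those questions. All examples use the TensorFlow 1.x API (tf.Session and tf.train.Saver).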

1.1 Model files

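When TensorFlow saves a model with tf.train.Saver, each checkpoint consists of three files: a meta file, an index file, and a data file. The Saver also maintains a small text file named checkpoint in the same directory, which records the most recent checkpoints.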

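meta file: describes the graph structure, including the GraphDef, the SaverDef, and so on. Note that TensorFlow stores the graph and the variables separately; the graph structure is kept mainly in the meta file.
index file: an index of the saved tensors. Each key is a tensor name and each value is a serialized BundleEntryProto.
data file: holds the values of all the variables.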

Figure: TensorFlow model files
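To make the naming concrete, here is a minimal sketch that derives the file names you should expect to find on disk from a checkpoint prefix. The helper expected_checkpoint_files is hypothetical (it is not part of TensorFlow and does not call it), and it assumes the default Saver settings, which write a meta file and a single data shard.

```python
import os


def expected_checkpoint_files(prefix, global_step=None, num_shards=1):
    """Return the file names a TF 1.x Saver is expected to write for `prefix`.

    Hypothetical helper for illustration only; it does not call TensorFlow.
    """
    if global_step is not None:
        # saver.save(..., global_step=N) appends "-N" to the prefix.
        prefix = "%s-%d" % (prefix, global_step)
    files = [prefix + ".meta", prefix + ".index"]
    for shard in range(num_shards):
        files.append("%s.data-%05d-of-%05d" % (prefix, shard, num_shards))
    # The Saver also keeps a text file named "checkpoint" in the directory.
    directory = os.path.dirname(prefix) or "."
    files.append(directory + "/checkpoint")
    return files


for name in expected_checkpoint_files("./result/model.ckpt"):
    print(name)
# ./result/model.ckpt.meta
# ./result/model.ckpt.index
# ./result/model.ckpt.data-00000-of-00001
# ./result/checkpoint
```

Passing global_step=100 gives names that start with ./result/model.ckpt-100, which is the prefix form you get when a step number is supplied to saver.save.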

1.2 Saving a model: example 1

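Here is a minimal example: it initializes two random variables, v1 and v2, and adds them to produce the tensor v3. Note that the model is saved in the directory result, not in something called model.ckpt. model.ckpt is only the prefix of the model file names, so to keep several models in the same directory you just use a different prefix for each.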

# -*- coding: utf-8 -*-
import tensorflow as tf

v1 = tf.get_variable("v1", shape=[10], initializer=tf.random_normal_initializer)
v2 = tf.get_variable("v2", shape=[10], initializer=tf.random_normal_initializer)
v3 = tf.add(v1,v2, name="v3")

init_op = tf.global_variables_initializer()

saver = tf.train.Saver()

with tf.Session() as sess:
  print("Start initializing model parameters")
  sess.run(init_op)
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())
  print("v3 : %s" % v3.eval())
  # Save the variables to disk.
  save_path = saver.save(sess, "./result/model.ckpt")
  print("Model saved in path: %s" % save_path)

1.3 Loading a model: example 1

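Once the model has been written out, we can use a Saver to load all the variables and run operations on them. Note that the path passed to restore is the model file prefix, model.ckpt, not the name of any single file and not the directory name.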

# -*- coding: utf-8 -*-
import tensorflow as tf

###2.Restore Variables###
tf.reset_default_graph()

# Create some variables.
v1 = tf.get_variable("v1", shape=[10])
v2 = tf.get_variable("v2", shape=[10])

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "./result/model.ckpt")
  print("Model restored.")
  # Check the values of the variables
  print("v1 : %s" % v1.eval())
  print("v2 : %s" % v2.eval())
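Since restore expects the prefix, a common mistake is to pass one of the actual file names instead. The sketch below shows one way to recover the prefix from such a file name. The helper to_checkpoint_prefix is hypothetical and not part of TensorFlow; it only assumes the .meta, .index and .data-XXXXX-of-XXXXX suffixes described in section 1.1.

```python
import re

# Suffixes of the files that belong to one checkpoint.
_CKPT_SUFFIX = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")


def to_checkpoint_prefix(path):
    """Strip a checkpoint file suffix so the result can be passed to restore().

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    return _CKPT_SUFFIX.sub("", path)


print(to_checkpoint_prefix("./result/model.ckpt.index"))
print(to_checkpoint_prefix("./result/model.ckpt.data-00000-of-00001"))
print(to_checkpoint_prefix("./result/model.ckpt"))
# All three lines print ./result/model.ckpt
```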

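If you only want to load some of the variables, pass just those variables when you create the Saver. Variables left out of the Saver are not restored and have to be initialized some other way. For example: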

saver = tf.train.Saver([v1,v2])
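The following sketch shows a partial restore end to end. It is only an illustration under a few assumptions of mine: it imports tensorflow.compat.v1 so that the same TF 1.x-style code also runs on TensorFlow 2.x installations, it writes to a temporary directory instead of ./result, and it uses constant initializers (ones when saving, zeros when rebuilding) so the effect is easy to see.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Assumption: a throwaway directory, so the demo does not touch ./result.
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Step 1: save v1 and v2, both filled with ones.
tf.reset_default_graph()
v1 = tf.get_variable("v1", shape=[3], initializer=tf.ones_initializer())
v2 = tf.get_variable("v2", shape=[3], initializer=tf.ones_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().save(sess, ckpt_prefix)

# Step 2: rebuild the graph with zero initializers and restore only v1.
tf.reset_default_graph()
v1 = tf.get_variable("v1", shape=[3], initializer=tf.zeros_initializer())
v2 = tf.get_variable("v2", shape=[3], initializer=tf.zeros_initializer())
saver = tf.train.Saver([v1])
with tf.Session() as sess:
    sess.run(v2.initializer)          # v2 is not restored, so initialize it
    saver.restore(sess, ckpt_prefix)  # v1 takes its value from the checkpoint
    restored_v1 = sess.run(v1)
    fresh_v2 = sess.run(v2)

print("v1 :", restored_v1)  # ones, read from the checkpoint
print("v2 :", fresh_v2)     # zeros, from its own initializer
```

If the sess.run(v2.initializer) line is dropped, evaluating v2 fails because the variable was never given a value.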

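To keep only the 3 most recent checkpoints, and additionally retain one checkpoint for every 2 hours of training, configure the Saver like this: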

saver = tf.train.Saver(max_to_keep=3, keep_checkpoint_every_n_hours=2)
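The effect of max_to_keep can be pictured as a rolling window over the saved checkpoints. The sketch below is a simplified pure-Python model of that behavior, not TensorFlow code: the function simulate_max_to_keep is hypothetical, and it deliberately ignores keep_checkpoint_every_n_hours, which can spare an old checkpoint from deletion.

```python
from collections import deque


def simulate_max_to_keep(steps, max_to_keep=3):
    """Return the checkpoint names left on disk after saving at each step.

    Simplified model for illustration: it ignores keep_checkpoint_every_n_hours.
    """
    kept = deque()
    for step in steps:
        kept.append("model.ckpt-%d" % step)
        if len(kept) > max_to_keep:
            kept.popleft()  # the oldest checkpoint is deleted
    return list(kept)


print(simulate_max_to_keep(range(10, 101, 10)))
# ['model.ckpt-80', 'model.ckpt-90', 'model.ckpt-100']
```

What counts is the number of save calls, not the number of epochs: if you save every 10 epochs, the three surviving checkpoints span the last 30 epochs.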

1.4 Saving a model: example 2

The example above is very simple, so let's use linear regression as a more realistic one.
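For reference, the model, loss and update rule that the code in this section implements can be written out as follows. This is read directly off the code (n = 50 data points, learning rate 0.02); the symbol names n and \eta are my own notation.

```latex
\[
\hat{y}_i = W x_i + b, \qquad
L(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2, \qquad n = 50
\]
\[
\frac{\partial L}{\partial W} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)
\]
\[
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}, \qquad \eta = 0.02
\]
```

Note that the training loop feeds one (x, y) pair per step, so during training the sum contains a single term while the denominator stays at 2n. The full sum is only evaluated when the cost is printed.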

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np

##1. Create the placeholders and initialize the parameters##
X = tf.placeholder("float", name="X")
Y = tf.placeholder("float", name="Y")

W = tf.Variable(np.random.randn(), name= "W")
b = tf.Variable(np.random.randn(), name= "b")

learning_rate = 0.02
epochs = 100

data_x = np.linspace(0, 50, 50)
data_y = np.linspace(0, 50, 50)

##2. Gradient descent##
y_pred = tf.add(tf.multiply(X, W), b, name="y_pred")
loss = tf.reduce_sum(tf.pow(y_pred-Y, 2)) / (2 * len(data_x))
opt = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
##Initialize the variables##
init = tf.global_variables_initializer()

##Create the Saver##
saver = tf.train.Saver()

##3. Build the TensorFlow Session##
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for (batch_x, batch_y) in zip(data_x, data_y):
            sess.run(opt, feed_dict={X: batch_x, Y: batch_y})

        if (epoch + 1) % 10 == 0:
            cost = sess.run(loss, feed_dict={X:data_x, Y:data_y})
            print("Epoch", (epoch + 1), ": cost =", cost, "W =", sess.run(W), "b =", sess.run(b))


    # Fetch the trained values we need
    training_cost = sess.run(loss, feed_dict={X:data_x, Y:data_y})
    weight = sess.run(W)
    bias = sess.run(b)

    # Predict for x = 0.01 using the trained values
    print("Training Cost =", training_cost, "Weight =", weight, "bias =", bias, '\n')
    print("Prediction:", weight * 0.01 + bias)

    # Save the model
    save_path = saver.save(sess, "./result/model.ckpt", global_step=epochs)
    print("Model saved in path: %s" % save_path)

Training the linear model produces the following output:

Epoch 10 : cost = 8.91454e-07 W = 0.99995387 b = 0.0023029544
Epoch 20 : cost = 8.285824e-07 W = 0.99995565 b = 0.0022180977
Epoch 30 : cost = 7.692257e-07 W = 0.9999573 b = 0.0021363394
Epoch 40 : cost = 7.133759e-07 W = 0.9999589 b = 0.0020576443
Epoch 50 : cost = 6.604228e-07 W = 0.9999603 b = 0.0019818414
Epoch 60 : cost = 6.1390205e-07 W = 0.99996185 b = 0.0019088342
Epoch 70 : cost = 5.691646e-07 W = 0.9999632 b = 0.0018385095
Epoch 80 : cost = 5.2784395e-07 W = 0.9999646 b = 0.0017707513
Epoch 90 : cost = 4.897609e-07 W = 0.9999659 b = 0.0017055188
Epoch 100 : cost = 4.5440865e-07 W = 0.99996716 b = 0.0016426622
Training Cost = 4.5440865e-07 Weight = 0.99996716 bias = 0.0016426622

Prediction: 0.011642333795316517
Model saved in path: ./result/model.ckpt-100


1.5 Loading a model: example 2

Read the model files from disk and run a prediction on a new sample.

# -*- coding: utf-8 -*-
import tensorflow as tf

with tf.Session() as sess:
    ##Load the graph structure from the meta file, then the weights
    ##Pass the name of the meta file here, not the directory
    saver = tf.train.import_meta_graph('./result/model.ckpt-100.meta')
    ##Pass the directory here
    saver.restore(sess, tf.train.latest_checkpoint('./result'))

    ##Get the graph
    graph = tf.get_default_graph()
    X = graph.get_tensor_by_name("X:0")
    ##Input data point
    feed_dict = {X: 0.01}

    ##Print the prediction
    y_pred = graph.get_tensor_by_name("y_pred:0")
    print("Prediction:", sess.run(y_pred, feed_dict=feed_dict))

After loading the model, the prediction is:

##Matches the value printed at the end of training
Prediction: 0.011642333
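The reason tf.train.latest_checkpoint takes a directory is that it looks up the newest checkpoint prefix in the text file named checkpoint inside that directory. The sketch below mimics roughly that lookup in plain Python. It is a simplified stand-in, not the library's implementation: the function read_latest_checkpoint is hypothetical, it only parses the model_checkpoint_path entry, and it does not verify that the checkpoint files exist.

```python
import os
import re
import tempfile


def read_latest_checkpoint(checkpoint_dir):
    """Return the newest checkpoint prefix recorded in `checkpoint_dir`, or None.

    Hypothetical stand-in for illustration; it only parses the text file and
    does not verify that the checkpoint files themselves exist.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        match = re.search(r'^model_checkpoint_path:\s*"(.*)"', f.read(), re.M)
    if match is None:
        return None
    # Relative entries are resolved against the checkpoint directory.
    return os.path.join(checkpoint_dir, match.group(1))


# Demo with a hand-written state file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "model.ckpt-100"\n')
    f.write('all_model_checkpoint_paths: "model.ckpt-100"\n')

print(read_latest_checkpoint(demo_dir))  # <demo_dir>/model.ckpt-100
```

The returned value is a prefix, which is exactly what saver.restore expects, so the two calls compose without any string handling in between.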

2. Summary

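This article covered the basic way to save and load TensorFlow models, with examples. But what if your company requires the model to be used from Java? The next article will introduce another way of saving and loading models that is simpler, more portable, and callable from Java.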