NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key v1 not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
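This error usually occurs because the variable names stored in the checkpoint file do not match the variable names the program asks for when restoring. The fix is to look up the variable names in the checkpoint file and change the names used by the program to match them. The rest of this post shows how to inspect the variable names in a checkpoint file and how to change the names the program uses.
The example below is the model persistence example from the book 《TensorFlow实战Google深度学习框架》; it also resolves the problem with the book's chapter 5 example that renames variables on load:
The code that saves the model: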
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# Save a model that computes the sum of two variables
v1 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
v2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
result = v1 + v2
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init_op)
    saver.save(sess, "Saved_model/model.ckpt")
The code that loads the model (restoring all variables):
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# Declare the same two variables and their sum as in the saved model
v1 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
v2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
result = v1 + v2
saver = tf.train.Saver()
# Restore the saved model (all variables)
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
This code runs without any problem.
The output is:
The code that loads the model while renaming variables:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# tf.reset_default_graph()
# Declare the variables
V1 = tf.Variable(tf.constant(1.0, shape=[1]), name="a1")
V2 = tf.Variable(tf.constant(2.0, shape=[1]), name="a2")
result = V1 + V2
saver = tf.train.Saver({"v1": V1, "v2": V2})
# Restore the saved model (all variables)
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
Running this code raises the following error:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key v1 not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
The error occurs because the names "v1" and "v2" specified in saver = tf.train.Saver({"v1": V1, "v2": V2}) do not match the variable names stored in the checkpoint file.
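The mismatch can be checked mechanically: every key of the dictionary passed to tf.train.Saver must exist as a name in the checkpoint. The following is a minimal pure-Python sketch of that check. The helper missing_checkpoint_keys is hypothetical (it is not a TensorFlow function), and the checkpoint names are hard-coded here purely for illustration; the real names have to be read from the checkpoint file.

```python
def missing_checkpoint_keys(saver_keys, checkpoint_names):
    """Return the Saver keys that have no matching name in the checkpoint."""
    available = set(checkpoint_names)
    return sorted(key for key in saver_keys if key not in available)


# Assumption: names hard-coded for illustration. In practice they would be
# the keys of reader.get_variable_to_shape_map() for the checkpoint.
checkpoint_names = ["Variable", "Variable_1"]

# Same keys as the dictionary passed to tf.train.Saver in the failing script.
# The values are placeholders because only the keys matter for this check.
saver_keys = {"v1": None, "v2": None}

missing = missing_checkpoint_keys(saver_keys, checkpoint_names)
print("Keys not found in checkpoint:", missing)
```

An empty list means every requested key exists in the checkpoint; any name listed is one that the restore would report as "Key ... not found in checkpoint".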
Run the code below to list the variable names in the checkpoint file (for details, see the blog post "TensorFlow中查看checkpoint文件中的变量名和对应值"):
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import os
from tensorflow.python import pywrap_tensorflow
model_dir = "Saved_model"
checkpoint_path = os.path.join(model_dir, "model.ckpt")
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()
for key in var_to_shape_map:
    print("tensor_name: ", key, end=' ')
    print(reader.get_tensor(key))
The output is:
The output shows that the variable names in the checkpoint file are Variable and Variable_1, not v1 and v2. Changing v1 and v2 to Variable and Variable_1 in saver = tf.train.Saver({"v1": V1, "v2": V2}) in the renaming load code above resolves the error.
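Why Variable and Variable_1? The saving script creates v1 and v2 without a name argument, so TensorFlow 1.x graph mode falls back to the default name Variable and appends a numeric suffix when that name is already taken. The Python identifiers v1 and v2 are never written to the checkpoint. The pure-Python sketch below only mimics that naming behaviour for illustration; the class NameScopeSketch and its method are hypothetical and are not part of TensorFlow.

```python
from collections import defaultdict


class NameScopeSketch:
    """Toy model of default-name uniquification; not TensorFlow code."""

    def __init__(self):
        # Number of times each base name has been requested so far.
        self._used = defaultdict(int)

    def unique_name(self, base="Variable"):
        # First request keeps the base name; later ones get _1, _2, ...
        count = self._used[base]
        self._used[base] += 1
        return base if count == 0 else "{}_{}".format(base, count)


graph = NameScopeSketch()

# Two variables created without name=..., as in the saving script.
default_names = [graph.unique_name() for _ in range(2)]

# Two variables created with explicit names, as in the loading script.
explicit_names = [graph.unique_name("a1"), graph.unique_name("a2")]

print(default_names)
print(explicit_names)
```

The dictionary passed to tf.train.Saver maps a name stored in the checkpoint (the key) to a variable in the current graph (the value), which is why the keys have to be the default names while V1 and V2 can keep their own names a1 and a2.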
The modified code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# tf.reset_default_graph()
# Declare the variables
V1 = tf.Variable(tf.constant(1.0, shape=[1]), name="a1")
V2 = tf.Variable(tf.constant(2.0, shape=[1]), name="a2")
result = V1 + V2
saver = tf.train.Saver({"Variable": V1, "Variable_1": V2})
# Restore the saved model (all variables)
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
The output is: