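While testing an AlexNet model recently, I ran into a problem: after training, I wanted to run detection on several images, but once the model had processed the first image, processing the second one failed (for the model training and test code, see https://github.com/stephen-v/tensorflow_alexnet_classify):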
OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key conv1_1/bias not found in checkpoint
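Following advice found online, I added tf.reset_default_graph() (this function clears the default graph stack and resets the global default graph), but the error persisted. I eventually found that it was caused by the with tf.name_scope('xxx') as scope blocks in the AlexNet network definition. After removing the with...as... structure, adding tf.reset_default_graph(), and retraining, batch testing of images worked without problems.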
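To illustrate where a key such as conv1_1/... can come from, here is a minimal sketch. It rests on several assumptions: it uses the tensorflow.compat.v1 API so that it also runs under TensorFlow 2.x, build_biases is a hypothetical helper rather than part of the original model, and explicit tf.Graph() contexts stand in for the effect of tf.reset_default_graph(). When the same name scope is entered twice in one graph, TensorFlow makes the second scope unique by appending a suffix, so the variable names no longer match those saved at training time.

```python
import tensorflow.compat.v1 as tf


def build_biases():
    # Hypothetical helper: one variable inside a name scope, as in the model.
    with tf.name_scope('conv1'):
        return tf.Variable(tf.zeros([96]), name='biases')


# Building twice in the SAME graph: the second scope is renamed to stay unique.
with tf.Graph().as_default():
    first = build_biases().name
    second = build_biases().name

# Building in a FRESH graph: the original name is available again.
with tf.Graph().as_default():
    fresh = build_biases().name

print(first)   # conv1/biases:0
print(second)  # conv1_1/biases:0
print(fresh)   # conv1/biases:0
```

A Saver created in the first graph would look for conv1_1/biases in the checkpoint, which matches the pattern of the error message above.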
Part of the original AlexNet model code:
import tensorflow as tf


def alexnet(x, keep_prob, num_classes):
    # conv1
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(x, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
    # lrn1
    with tf.name_scope('lrn1') as scope:
        lrn1 = tf.nn.local_response_normalization(conv1,
                                                  alpha=1e-4,
                                                  beta=0.75,
                                                  depth_radius=2,
                                                  bias=2.0)
    # pool1
    with tf.name_scope('pool1') as scope:
        pool1 = tf.nn.max_pool(lrn1,
                               ksize=[1, 3, 3, 1],
                               strides=[1, 2, 2, 1],
                               padding='VALID')
    # the rest is omitted...
The modified code (with every with tf.name_scope('xxx') as scope removed):
import tensorflow as tf


def alexnet(x, keep_prob, num_classes):
    # conv1
    kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 96], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(x, kernel, [1, 4, 4, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[96], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name='conv1')
    # lrn1
    # with tf.name_scope('lrn1') as scope:
    lrn1 = tf.nn.local_response_normalization(conv1,
                                              alpha=1e-4,
                                              beta=0.75,
                                              depth_radius=2,
                                              bias=2.0)
    # pool1
    # with tf.name_scope('pool1') as scope:
    pool1 = tf.nn.max_pool(lrn1,
                           ksize=[1, 3, 3, 1],
                           strides=[1, 2, 2, 1],
                           padding='VALID')
    # the rest is omitted...
The batch test code:
import tensorflow as tf
from alexnet import alexnet
import matplotlib.pyplot as plt
from os import walk, path

VGG_MEAN = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32)
class_name = ['dog', 'cat']


def test_image(path_image, num_class):
    # Load one image and resize it to the network's input size.
    img_string = tf.read_file(path_image)
    img_decoded = tf.image.decode_png(img_string, channels=3)
    img_resized = tf.image.resize_images(img_decoded, [224, 224])
    # img_centered = tf.subtract(img_resized, VGG_MEAN)
    img_resized = tf.reshape(img_resized, shape=[1, 224, 224, 3])
    # img_bgr = img_centered[:, :, ::-1]
    # Build the network; keep_prob is 1.0 because dropout is disabled at test time.
    fc8 = alexnet(img_resized, 1.0, num_class)
    score = tf.nn.softmax(fc8)
    pred = tf.argmax(score, 1)  # renamed from "max" to avoid shadowing the built-in
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, "./checkpoints/model_epoch80.ckpt")
        print(sess.run(fc8))
        prob = sess.run(pred)[0]
        output = class_name[prob]
        plt.imshow(img_decoded.eval())
        plt.title("Class:" + output)
        plt.show()
    return output


def get_path_prex(rootdir):
    # Collect the paths and base names of all .jpg/.png files under rootdir.
    data_path = []
    prefixs = []
    for root, dirs, files in walk(rootdir, topdown=True):
        for name in files:
            pre, ending = path.splitext(name)
            if ending != ".jpg" and ending != ".png":
                continue
            data_path.append(path.join(root, name))
            prefixs.append(pre)
    return data_path, prefixs


img_path, prefix = get_path_prex('./Datasets/dog_cat/test/')
for image_path in img_path:
    tf.reset_default_graph()  # required: rebuild the graph before each test
    output = test_image(image_path, num_class=2)
Note that the call to tf.reset_default_graph() is required here.
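As a possible alternative to rebuilding the graph for every image, the graph can be built once with a placeholder and the same session reused for all inputs. The sketch below is not the code from this post, only an illustration under stated assumptions: it uses tensorflow.compat.v1, tiny_model is a hypothetical stand-in for alexnet, the inputs are hand-written vectors instead of images, and no checkpoint is restored.

```python
import tensorflow.compat.v1 as tf


def tiny_model(x):
    # Hypothetical stand-in for alexnet(x, keep_prob, num_classes).
    weights = tf.Variable(tf.ones([3, 2]), name='weights')
    return tf.matmul(x, weights)


graph = tf.Graph()
with graph.as_default():
    # The graph is built exactly once; inputs arrive through the placeholder.
    x = tf.placeholder(tf.float32, shape=[None, 3])
    logits = tiny_model(x)
    init = tf.global_variables_initializer()

results = []
with tf.Session(graph=graph) as sess:
    sess.run(init)  # in the real script, saver.restore(...) would go here
    for batch in ([[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]):
        results.append(sess.run(logits, feed_dict={x: batch}))

print(results)
```

Because the variables are created only once, their names stay the same for every input, and the graph does not grow with the number of images.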
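Summary:
(1) Because the training code contained with tf.name_scope('xxx') as scope, adding tf.reset_default_graph() alone still produced the error above (see: https://blog.csdn.net/LeeGe666/article/details/85806790).
(2) After removing with tf.name_scope('xxx') as scope from the code, retraining the model, and adding tf.reset_default_graph() to the test code, the error was resolved.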
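When debugging a "Key ... not found in checkpoint" error, it can help to compare the keys stored in the checkpoint with the variable names in the current graph. The following is a minimal sketch, assuming the tensorflow.compat.v1 API; build_conv1_biases and the temporary toy.ckpt checkpoint are hypothetical and exist only for this illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf


def build_conv1_biases():
    # Hypothetical helper: a single variable inside a name scope.
    with tf.name_scope('conv1'):
        return tf.Variable(tf.zeros([96]), name='biases')


ckpt_path = os.path.join(tempfile.mkdtemp(), 'toy.ckpt')

# "Training": build the graph once and save a checkpoint.
with tf.Graph().as_default():
    build_conv1_biases()
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, ckpt_path)

# Keys actually stored in the checkpoint.
saved_keys = [name for name, _shape in tf.train.list_variables(ckpt_path)]

# "Testing": the model is accidentally built twice in the same graph.
with tf.Graph().as_default():
    build_conv1_biases()
    build_conv1_biases()
    graph_keys = [v.op.name for v in tf.global_variables()]

# Names the Saver would request but the checkpoint does not contain.
missing = [key for key in graph_keys if key not in saved_keys]
print('in checkpoint:', saved_keys)
print('in graph:', graph_keys)
print('missing:', missing)
```

Any name listed under "missing" is a key that saver.restore would fail to find.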
Reference links: