Questions about multi-GPU testing

1. First, save the model and its values:

import tensorflow as tf

input = tf.placeholder(tf.float32, [], 'input')
with tf.name_scope('hans'):

    weights = tf.get_variable('weights', [], tf.float32, tf.ones_initializer(tf.float32))  # weights.name: weights:0 (get_variable ignores name_scope)
    
    ema = tf.train.ExponentialMovingAverage(0.5)
    
#     update = tf.assign(weights, 2)
    output = tf.add(input, weights)  # output.name: hans/Add:0
    m_op = ema.apply([output])       # creates and updates the shadow variable hans/Add/ExponentialMovingAverage:0
    
    with tf.control_dependencies([m_op]):
        y = tf.identity(ema.average(output), 'y')  # y:0 reads the moving average of output; the control dependency runs m_op first
        print(y.name)  # hans/y:0

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(ema.average(output)))  # the shadow variable starts at 0
    print(sess.run(y, {input: 19}))       # output = 19 + 1 = 20, so the average becomes 0.5 * 0 + 0.5 * 20 = 10
    saver.save(sess, './checkpoint/model0.ckpt')  # saves all variables, including the shadow variable

Output:

0.0
10.0
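
(tf.train.ExponentialMovingAverage updates its shadow value as shadow = decay * shadow + (1 - decay) * value. The shadow variable starts at 0.0; running y with input = 19 gives output = 19 + 1 = 20, so the average becomes 0.5 * 0 + 0.5 * 20 = 10.0.)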

After running the code above, the value of output itself is not saved, because it depends on a placeholder; after loading the model, calling sess.run(output) directly (without a feed) will fail. The moving average of output, however, is stored in the checkpoint and can be read directly through ema.average(output).
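
A quick sketch of both claims (run it inside the session above, or after restoring in step 2):

try:
    sess.run(output)  # output depends on the placeholder 'input', so fetching it without a feed fails
except tf.errors.InvalidArgumentError as e:
    print('output needs a feed:', e.message.splitlines()[0])
print(sess.run(ema.average(output)))  # the shadow variable is an ordinary variable and can be fetched directly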

2. Then load the model, rebuilding the graph with the same code as before:

import tensorflow as tf
input = tf.placeholder(tf.float32, [], 'input')
with tf.name_scope('hans'):

    weights = tf.get_variable('weights', [], tf.float32, tf.ones_initializer(tf.float32))  # weights.name: weights:0 (get_variable ignores name_scope)
    
    ema = tf.train.ExponentialMovingAverage(0.5)
    
#     update = tf.assign(weights, 2)
    output = tf.add(input, weights)  # output.name: hans/Add:0
    m_op = ema.apply([output])       # creates and updates the shadow variable hans/Add/ExponentialMovingAverage:0
    
    with tf.control_dependencies([m_op]):
        y = tf.identity(ema.average(output), 'y')  # y:0 reads the moving average of output; the control dependency runs m_op first
        print(y.name)  # hans/y:0

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './checkpoint/model0.ckpt')
    print(sess.run(ema.average(output)))  # the restored value, i.e. 10
    print(sess.run(y, {input: 100}))      # fetching y runs m_op and updates the average
    print(sess.run(ema.average(output)))  # the average has already been updated

Result:

10.0
55.5
55.5
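
(Same update rule: the restored average is 10.0, output = 100 + 1 = 101, so the average becomes 0.5 * 10 + 0.5 * 101 = 55.5; the third fetch simply reads this already-updated value back.)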

So at test time, if you want to use the previously saved average values, read them directly through ema.average and do not run the apply op, otherwise the averages get updated (as the second and third prints above show).
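
A minimal test-time sketch of that advice (assuming the same graph and checkpoint as above): ema.apply is still called once so that a shadow variable with the matching name exists in the graph, but the update op it returns is never run, so the restored average stays at 10.0.

import tensorflow as tf

input = tf.placeholder(tf.float32, [], 'input')
with tf.name_scope('hans'):
    weights = tf.get_variable('weights', [], tf.float32, tf.ones_initializer(tf.float32))
    ema = tf.train.ExponentialMovingAverage(0.5)
    output = tf.add(input, weights)
    m_op = ema.apply([output])  # only needed to create hans/Add/ExponentialMovingAverage:0; m_op itself is never run

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './checkpoint/model0.ckpt')
    print(sess.run(ema.average(output)))   # 10.0, the saved average
    print(sess.run(output, {input: 100}))  # 101.0; fetching output alone does not touch the average
    print(sess.run(ema.average(output)))   # still 10.0, because the apply op was never run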

3. Also be careful with the name_scope. If you drop name_scope('hans') at test time, the moving average of output is now named Add/ExponentialMovingAverage:0, while the value saved in the checkpoint is named hans/Add/ExponentialMovingAverage:0. When the model is loaded, the graph variable therefore has no matching name in the checkpoint, and restoring directly raises an error. The failing code looks like this:

import tensorflow as tf
input = tf.placeholder(tf.float32, [], 'input')

weights = tf.get_variable('weights', [], tf.float32, tf.ones_initializer(tf.float32))  # weights.name: weights:0
    
ema = tf.train.ExponentialMovingAverage(0.5)
    
#     update = tf.assign(weights, 2)
output = tf.add(input, weights)  # output.name: Add:0 (no name_scope any more)
m_op = ema.apply([output])       # creates and updates Add/ExponentialMovingAverage:0
    
with tf.control_dependencies([m_op]):
    y = tf.identity(ema.average(output), 'y')  # y:0 reads the moving average of output
    print(y.name)  # y:0

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './checkpoint/model0.ckpt')  # fails here: no matching name in the checkpoint
    print(sess.run(ema.average(output)))
    print(sess.run(y, {input: 100}))
    print(sess.run(ema.average(output)))

Error message: restoring fails because the graph's shadow variable Add/ExponentialMovingAverage has no corresponding entry in the checkpoint, which only contains weights and hans/Add/ExponentialMovingAverage.

To fix this, either put back the name_scope('hans') from the beginning, or change the name of output directly, i.e. replace the corresponding line with:

output = tf.add(input, weights, 'hans/Add')  # give the op the same name it had under the name_scope

With this change, the restored values can find their corresponding variables again.
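
Another option (a sketch, not from the original post) is to keep the scope-less graph as it is and hand tf.train.Saver an explicit mapping from the names stored in the checkpoint to the variables of the new graph; this assumes the checkpoint keys are weights and hans/Add/ExponentialMovingAverage, as in the comments above.

# build the graph exactly as in the failing example (without the name_scope), then:
saver = tf.train.Saver({
    'weights': weights,                                         # same name in both graphs
    'hans/Add/ExponentialMovingAverage': ema.average(output),   # map the old checkpoint key to the new shadow variable
})
with tf.Session() as sess:
    saver.restore(sess, './checkpoint/model0.ckpt')
    print(sess.run(ema.average(output)))  # 10.0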


4. To sum up: after multi-GPU training, the network still has to be rebuilt at test time, otherwise the moving averages cannot find their corresponding variables and restoring fails. But during multi-GPU training, each tower keeps its own moving averages of the batch-normalization means and variances, and these differ from tower to tower. How should these different moving averages be handled at test time?
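
One common way to sidestep this (a sketch under assumed conventions, not necessarily the answer for the setup above): share all variables across the towers through a single variable scope, so that the batch-normalization moving_mean / moving_variance are one shared set of variables, and only run the update ops collected from the first tower. At test time there is then only one set of moving averages to rebuild and restore. build_model, the layer names, and the two-GPU layout below are all hypothetical.

import tensorflow as tf

def build_model(x, is_training):
    # hypothetical model body; tf.layers.batch_normalization creates the
    # moving_mean / moving_variance variables and registers their update ops
    h = tf.layers.dense(x, 64, activation=tf.nn.relu, name='fc1')
    h = tf.layers.batch_normalization(h, training=is_training, name='bn1')
    return tf.layers.dense(h, 10, name='fc2')

x = tf.placeholder(tf.float32, [8, 32], 'x')
splits = tf.split(x, 2, axis=0)  # assume 2 GPUs, 4 examples per tower

tower_logits = []
for i, x_i in enumerate(splits):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('tower_%d' % i), tf.variable_scope('model', reuse=(i > 0)):
            tower_logits.append(build_model(x_i, is_training=True))

# Every tower reuses the same moving_mean / moving_variance variables, so the
# update ops from the first tower alone maintain one consistent set of statistics;
# the test graph just rebuilds build_model(x, is_training=False) and restores them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope='tower_0')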
