Background:
The session.run method accepts either a single op or a list of ops. For example, to execute op1 and op2, there are two ways to write it:
sess.run(op1)
sess.run(op2)
and
sess.run([op1, op2])
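If op1 and op2 depend on a common part of the graph, the first style executes the full subgraph of op1 and then the full subgraph of op2, so the shared part runs twice. The second style runs the shared part only once within the single call.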
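The difference can be sketched with a toy evaluator in plain Python. This is not TensorFlow and not how TensorFlow is implemented; the names Node, run, and build_graph are hypothetical and exist only for this illustration. The sketch assumes that results are cached for the duration of one run call and discarded afterwards, and it counts how often the shared node is evaluated under each style.

```python
# Toy model only: this is NOT TensorFlow, just an illustration of per-call caching.

class Node:
    """A graph node: a function plus the nodes it depends on."""
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)
        self.calls = 0  # how many times this node has been evaluated


def run(fetches):
    """Evaluate one node or a list of nodes, computing each node at most once per call."""
    cache = {}  # results live only for the duration of this call

    def evaluate(node):
        if node not in cache:
            args = [evaluate(dep) for dep in node.inputs]
            node.calls += 1
            cache[node] = node.fn(*args)
        return cache[node]

    if isinstance(fetches, (list, tuple)):
        return [evaluate(node) for node in fetches]
    return evaluate(fetches)


def build_graph():
    """Two ops that both depend on one shared node."""
    shared = Node("shared", lambda: 10)
    op1 = Node("op1", lambda x: x + 1, [shared])
    op2 = Node("op2", lambda x: x * 2, [shared])
    return shared, op1, op2


# Style 1: two separate calls, so the shared node is evaluated twice.
shared, op1, op2 = build_graph()
run(op1)
run(op2)
separate_calls = shared.calls

# Style 2: one call with a list, so the shared node is evaluated once.
shared, op1, op2 = build_graph()
results = run([op1, op2])
single_call = shared.calls

print(separate_calls, single_call, results)
```

When the shared node is a pure computation, the two styles differ only in cost. When it has side effects, such as dequeuing from a queue, they also differ in results.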
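For example, the following script reads three files through a filename queue and fetches both ops in a single call: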
# coding: utf-8
# Import TensorFlow
import tensorflow as tf

with tf.Session() as sess:
    base_path = '/Users/gandalf/WorkSpace/test/inputtexts/'
    # Three input files
    filename = [base_path + 'a.txt', base_path + 'b.txt', base_path + 'c.txt']
    # Build the filename queue: no shuffling within an epoch, one epoch only
    filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=1)
    # WholeFileReader returns the entire contents of one file per read
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    # string_input_producer creates a local epoch-counter variable that must be initialized
    tf.local_variables_initializer().run()
    # The queue is only filled after start_queue_runners is called
    coord = tf.train.Coordinator()  # detects when the filename queue is exhausted
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # start the queue-filling threads
    try:
        while not coord.should_stop():
            # Fetch both ops in a single run call
            key_data, txt_data = sess.run([key, value])
            # sess.run returns byte strings; decode so this also works under Python 3
            print("key:" + key_data.decode('utf-8'))
            print("value:\n" + txt_data.decode('utf-8'))
            print("-------------------------")
    except tf.errors.OutOfRangeError:
        print('Done reading, file queue ended.')
    finally:
        coord.request_stop()
        coord.join(threads)
    # No explicit sess.close() is needed: the with block closes the session

Output:
gandalfdeMacBook-Air:test gandalf$ python sessrun.multi.test.py
2018-08-19 20:01:30.853685: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
key:/Users/gandalf/WorkSpace/test/inputtexts/a.txt
value:
1 1 1
2 2 2
3 3 3
-------------------------
key:/Users/gandalf/WorkSpace/test/inputtexts/b.txt
value:
4 4 4
5 5 5
6 6 6
-------------------------
key:/Users/gandalf/WorkSpace/test/inputtexts/c.txt
value:
7 7 7
8 8 8
9 9 9
-------------------------
Done reading, file queue ended.
As the output shows, all three input files are read and printed within the single epoch.
Now let's fetch the two ops in separate calls:
# coding: utf-8
# Import TensorFlow
import tensorflow as tf

with tf.Session() as sess:
    base_path = '/Users/gandalf/WorkSpace/test/inputtexts/'
    # Three input files
    filename = [base_path + 'a.txt', base_path + 'b.txt', base_path + 'c.txt']
    # Build the filename queue: no shuffling within an epoch, one epoch only
    filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=1)
    # WholeFileReader returns the entire contents of one file per read
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    # string_input_producer creates a local epoch-counter variable that must be initialized
    tf.local_variables_initializer().run()
    # The queue is only filled after start_queue_runners is called
    coord = tf.train.Coordinator()  # detects when the filename queue is exhausted
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # start the queue-filling threads
    try:
        while not coord.should_stop():
            # Fetching both ops in one call (disabled here)
            # key_data, txt_data = sess.run([key, value])
            # Fetching each op in its own call
            key_data = sess.run(key)
            txt_data = sess.run(value)
            # sess.run returns byte strings; decode so this also works under Python 3
            print("key:" + key_data.decode('utf-8'))
            print("value:\n" + txt_data.decode('utf-8'))
            print("-------------------------")
    except tf.errors.OutOfRangeError:
        print('Done reading, file queue ended.')
    finally:
        coord.request_stop()
        coord.join(threads)
    # No explicit sess.close() is needed: the with block closes the session

Output:
gandalfdeMacBook-Air:test gandalf$ python sessrun.multi.test.py
2018-08-19 20:03:07.207565: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
key:/Users/gandalf/WorkSpace/test/inputtexts/a.txt
value:
4 4 4
5 5 5
6 6 6
-------------------------
Done reading, file queue ended.
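How it works:
First iteration:
key_data = sess.run(key) runs the graph once, which dequeues and reads the first file, a.txt, so the printed key_data is:
/Users/gandalf/WorkSpace/test/inputtexts/a.txt
txt_data = sess.run(value) runs the graph a second time, which dequeues and reads the second file, b.txt, so the printed txt_data is the contents of b.txt.
Second iteration:
key_data = sess.run(key) dequeues and reads the third file, c.txt.
txt_data = sess.run(value) then finds the filename queue exhausted and raises OutOfRangeError. The loop exits before anything is printed for this iteration, and the script ends.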
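The walkthrough above can be reproduced with a small plain-Python stand-in for the filename queue and reader. This is only a sketch: ToyReader, OutOfRange, fetch_together, and fetch_separately are hypothetical names, the file contents are shortened, and the only assumption carried over from the TensorFlow script is that every read consumes one file from the queue.

```python
# Toy model only: plain Python standing in for the filename queue and reader.
from collections import deque


class OutOfRange(Exception):
    """Stand-in for tf.errors.OutOfRangeError in this toy model."""


class ToyReader:
    def __init__(self, files):
        self.queue = deque(files)  # (filename, content) pairs, one epoch

    def read(self):
        """Dequeue one file; every call advances the queue."""
        if not self.queue:
            raise OutOfRange()
        return self.queue.popleft()


FILES = [("a.txt", "1 1 1"), ("b.txt", "4 4 4"), ("c.txt", "7 7 7")]


def fetch_together(files):
    """Like sess.run([key, value]): one read yields both outputs."""
    reader, seen = ToyReader(files), []
    try:
        while True:
            key, value = reader.read()
            seen.append((key, value))
    except OutOfRange:
        pass
    return seen


def fetch_separately(files):
    """Like sess.run(key) followed by sess.run(value): two reads per iteration."""
    reader, seen = ToyReader(files), []
    try:
        while True:
            key, _ = reader.read()    # consumes one file, keeps only its name
            _, value = reader.read()  # consumes the next file, keeps only its content
            seen.append((key, value))
    except OutOfRange:
        pass
    return seen


together = fetch_together(FILES)
separately = fetch_separately(FILES)
print(together)
print(separately)
```

In the separate-fetch case the only pair recorded is the name of a.txt with the content of b.txt, which matches the mismatched output of the second script.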