A record of a failed TensorFlow journey (switching feed_dict to asynchronous queues)

The backstory: as everyone knows, one reason TensorFlow can be slow is this:

Feed_dict does a single-threaded memcpy of contents from Python runtime into TensorFlow runtime. If data is needed on GPU, then you'll have an additional CPU->GPU transfer. I'm used to seeing up to 10x improvement in performance when switching from feed_dict to native TensorFlow (Variable/Queue)

feed_dict performance

In other words, in some sense, switching from feed_dict to queues should make things faster.
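
For reference, here is a minimal sketch of the feed_dict pattern being replaced (TF 1.x; the model, shapes, and in-memory data below are hypothetical stand-ins of my own, not from any of the linked posts). The single-threaded memcpy mentioned above happens inside every sess.run call:

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory dataset, just to make the sketch self-contained.
data = np.random.rand(1000, 32).astype(np.float32)
labels = np.random.randint(0, 10, size=1000).astype(np.int64)

x = tf.placeholder(tf.float32, shape=[None, 32])
y = tf.placeholder(tf.int64, shape=[None])
logits = tf.layers.dense(x, 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        i = (step * 64) % (len(data) - 64)
        # The Python -> TF runtime copy happens here, once per step.
        sess.run(train_op, feed_dict={x: data[i:i + 64], y: labels[i:i + 64]})
```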

However, we usually need to validate while we train, and stop training early once the validation metric stops improving.

I had no idea how to train and validate at the same time with queues. In other words, I ran into exactly the same problem as this poster:

Train/dev setup. I believe it's a common practice to periodically train your model for some time, then evaluate on dev set, then repeat. I don't know how to do this with queues. With feed_dict, you just build two graphs with shared parameters under the same session, one for train and one for dev. When you evaluate on dev set, just feed dev data to dev graph, that's it. But for queue, output from queue is part of the graph itself. To run queue, you have to start the queue runner, create a coordinator, use this coordinator to manage the queue. When it's done, the queue is closed!!! Currently, I have no idea how to best write my code to conform the train/dev setup with queues except opening new session, build new graph for dev each time I evaluate. The same issue was raised here, and you can google for similar questions on Stackoverflow.

TensorFlow: Feeding data with queue vs with direct feeding with feed_dict

The gist: he was building an NMT model, and switching from feed_dict to queues required (1) converting the data to TFRecords and (2) converting it back. (These two steps don't actually matter much, since tf.train.batch also supports being fed from a placeholder.)

The fatal part is step three: you cannot run both (train and dev) at the same time.
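
To make the lifecycle described in that quote concrete, here is a minimal sketch of a queue-fed pipeline (TF 1.x; the in-memory arrays are hypothetical stand-ins). The dequeued batch is baked into the graph, which is exactly why there is nowhere left to "feed" a dev set:

```python
import numpy as np
import tensorflow as tf

data = np.random.rand(1000, 32).astype(np.float32)   # hypothetical train set
labels = np.random.randint(0, 10, size=1000).astype(np.int64)

# Producer: slice single examples off the in-memory arrays.
x, y = tf.train.slice_input_producer([data, labels], shuffle=True)
# Consumer: a background queue runner assembles them into batches.
x_batch, y_batch = tf.train.batch([x, y], batch_size=64, num_threads=2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        # x_batch / y_batch ARE the graph's input: there is no placeholder
        # left to point at a dev set.
        xb, yb = sess.run([x_batch, y_batch])
    finally:
        coord.request_stop()   # after this, the queues are closed
        coord.join(threads)
```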

At this point I found this link: Periodically evaluating on a validation set. It briefly mentions that you can try building control flow with tf.cond, but that goes against the original goal (namely, making things faster).
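
For the record, here is a hedged sketch of what that tf.cond workaround looks like (TF 1.x; the data and names are hypothetical). The comment marks the pitfall that makes it work against the speed goal:

```python
import numpy as np
import tensorflow as tf

train_data = np.random.rand(1000, 32).astype(np.float32)   # hypothetical
train_labels = np.random.randint(0, 10, size=1000).astype(np.int64)
dev_data = np.random.rand(256, 32).astype(np.float32)      # hypothetical
dev_labels = np.random.randint(0, 10, size=256).astype(np.int64)

x_tr, y_tr = tf.train.slice_input_producer([train_data, train_labels])
x_dv, y_dv = tf.train.slice_input_producer([dev_data, dev_labels])
train_batch = tf.train.batch([x_tr, y_tr], batch_size=64)
dev_batch = tf.train.batch([x_dv, y_dv], batch_size=64)

is_training = tf.placeholder(tf.bool, shape=[], name="is_training")

# Route one of the two queues into the shared model. Pitfall: the dequeue
# ops above were created OUTSIDE the lambdas, so BOTH queues are dequeued
# on every step regardless of which branch runs -- wasted work.
x_in, y_in = tf.cond(is_training,
                     lambda: train_batch,
                     lambda: dev_batch)
```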

Then I saw that the TensorFlow team had opened an official issue: [Enhancement] Redesigning TensorFlow's input pipelines.

The gist: we have decided to start from scratch and redesign the input pipeline API. The existing methods will be kept until TF 2.0. The new API will support re-running a pipeline (for multiple epochs) and handling multiple datasets (train and dev) at the same time.
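
That redesign later shipped as the tf.data API (TF 1.4+). A minimal sketch of the train/dev switching it enables, again over hypothetical in-memory arrays:

```python
import numpy as np
import tensorflow as tf

train_data = np.random.rand(1000, 32).astype(np.float32)   # hypothetical
dev_data = np.random.rand(256, 32).astype(np.float32)      # hypothetical

train_ds = (tf.data.Dataset.from_tensor_slices(train_data)
            .shuffle(1000).batch(64).repeat())
dev_ds = tf.data.Dataset.from_tensor_slices(dev_data).batch(64)

# One reinitializable iterator drives the same graph from either dataset.
iterator = tf.data.Iterator.from_structure(train_ds.output_types,
                                           train_ds.output_shapes)
next_batch = iterator.get_next()
train_init = iterator.make_initializer(train_ds)
dev_init = iterator.make_initializer(dev_ds)

with tf.Session() as sess:
    sess.run(train_init)
    # ... train on next_batch for a while ...
    sess.run(dev_init)    # switch to the dev set: same graph, same session
    # ... evaluate on next_batch, then re-run train_init to resume ...
```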

Well, that's the end: a failed attempt. It may have to wait until TF 2.0; for now I can only make do with feed_dict, or with some tricks (which cost too much time for me).

Finally, note that "queue" here also covers the helpers built on top of queues, such as tf.train.batch.


Below are some of the links I consulted at the time; my thanks to their authors first of all. They did speed things up with pipelines, but I found no solution to my problem in them. I'm noting them down so I can come back and leaf through them later.

tensorflow使用笔记(3)--Reading data(1)

tensorflow队列操作详解

fully_connected_preloaded_var.py (this one shows how to preload the data into memory; its program structure is worth studying, though it doesn't cover how to use a validation set)

fully_connected_preloaded.py

TensorFlow: How to optimise your input pipeline with queues and multi-threading (worth a read)

Get 10x Speedup in Tensorflow Multi-Task Learning using Python Multiprocessing (with diagrams)

Reading data with queue is even slower than using feed_dict (the OP's two programs are quite concise and to the point)


Finally, the official documentation: Importing Data
