Tesorflow问题总结

【写在前面】
记录使用Tensorflow过程中遇到的问题及解决方案,有些问题尚未解决,主要应用场景为训练模型,云端部署模型及移动端部署模型。

import tensorflow as tf

1 tf.argmax越界

【Demo】

import tensorflow as tf
v1 = tf.get_variable("v1", shape=[1, 10], initializer=tf.truncated_normal_initializer(stddev=0.1))
y = tf.argmax(v1, 1)
with tf.Session() as sess:
	init_op = tf.global_variables_initializer()
	sess.run(init_op)
	print("v1 value: {}".format(v1))
	print("v1 value: {}".format(sess.run(v1)))
	print("Compare result: {}".format(y))
	print("Compare result: {}".format(sess.run(y)))

【Problem】

InvalidArgumentError (see above for traceback): Expected dimension in the range [-1, 1), but got 1
	 [[Node: ArgMax = ArgMax[T=DT_DOUBLE, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ArgMax/input, ArgMax/dimension)]]

【Reason】
argmax函数参数越界,[-1, 1)
【Solve】
参数改为-1

1.2 测试

【Demo】

v1 = tf.get_variable("v1", shape=[1, 10], initializer=tf.truncated_normal_initializer(stddev=0.1))
y = tf.argmax(v1, -1)
with tf.Session() as sess:
	init_op = tf.global_variables_initializer()
	sess.run(init_op)
	print("v1 value: {}".format(v1))
	print("v1 value: {}".format(sess.run(v1)))
	print("Compare result: {}".format(y))
	print("Compare result: {}".format(sess.run(y)))

【Result】

v1 value: 
v1 value: [[ 0.00457896 -0.15347771 -0.02465441  0.00503899  0.03228834  0.15578857
   0.06421228  0.04700307 -0.05741185  0.0248824 ]]
Compare result: Tensor("ArgMax:0", shape=(1,), dtype=int64)
Compare result: [5]

2 Tensorflow与Python版本匹配

【Problem】

RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6

【Solve】
没办法,降低Python版本,使用Python3.5。

3 变量重复

【Problem】

Variable conv_w1 already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "", line 58, in init_conv_weights
    weights = tf.get_variable(name=name, shape=shape, dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer_conv2d())
  File "", line 101, in Inference
    cw_1 = init_conv_weights([3, 3, 3, 16], name='conv_w1')
  File "", line 42, in 
    class Inference(object):

【Resson】
变量重复,需要重建计算图结构(Graph)
【Solve】
文件开始,加入重置Tensorflow计算图的配置语句

import tensorflow as tf
tf.reset_default_graph()

4 LSTM神经网络更新

【Problem】

WARNING:tensorflow:From /home/SP-in-AI/xindq/couplets/seq2seq.py:12: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
2019-01-24 17:33:37.439645: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

【Resson】
Tensorflow版本更新,未来版本支持tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell')
【Solve】
rnn.BasicLSTMCell更换为tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell')

import tensorflow as tf
tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell')

5 seq2seq运行队列错误

【Problem】

 seq2seq can't pickle _thread.RLock objects

【Resson】
我也不清楚
【Solve】
seq2seq.py模型中添加:

setattr(tf.contrib.rnn.GRUCell, '__deepcopy__', lambda self, _: self)
setattr(tf.contrib.rnn.BasicLSTMCell, '__deepcopy__', lambda self, _: self)
setattr(tf.contrib.rnn.MultiRNNCell, '__deepcopy__', lambda self, _: self) 

6 tensorflow运行graph为空

【Problem】

RuntimeError: The Session graph is empty.  Add operations to the graph before calling run()

【Resson】

【Solve】

7 加载模型Tensor错误

【Problem】

ValueError: Fetch argument  cannot be interpreted as a Tensor. (Tensor Tensor("model_with_buckets/sequence_loss/truediv:0", shape=(), dtype=float32) is not an element of this graph.)

【Resson】

【Solve】

8 模型路径问题

【Problem】

The passed save_path is not a valid checkpoint

【Resson】
模型名称或路径不正确
【Solve】
模型路径填写到后缀以前的全部内容.如:模型名称:test_chatbot.ckpt-250.meta,使用模型名称为:test_chatbot.ckpt-250

9 载入模型无变量载入

【Problem】

 File "/home/xdq/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1139, in _build
    raise ValueError("No variables to save")
ValueError: No variables to save

【Resson】
保存命令前未设置变量.
【Solve】
保存命令前要设置变量.

10 计算出错

【Problem】

ZeroDivisionError: float division by zero
  • Reason
    被除数为0

12 后台载入模型出错

【Problem】

raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("out_2/truediv:0", shape=(?, ?, 5990), dtype=float32) is not an element of this graph

【Resson】
后台路由中载入模型,启动后台服务时会加载一次模型,当调用该服务时,加载模型失效。
【Solve】
设定全局载入模型,即:

loadmodel
route 
loadmodel

[参考文献]
[1]https://stackoverflow.com/questions/44855603/typeerror-cant-pickle-thread-lock-objects-in-seq2seq
[2]http://www.cnblogs.com/yanjj/p/8242595.html


更新ing

你可能感兴趣的:(问题总结,Tensorflow)