TensorFlow: Common Errors

 

TensorFlow SSE warning

TensorFlow wasn't compiled to use SSE (etc.) instructions, but these are available

Fix: set os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' before importing TensorFlow. This is an informational warning, not an error; the setting merely suppresses the log messages.

[TensorFlow wasn't compiled to use SSE (etc.) instructions, but these are available]
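A minimal sketch of the fix. Note that the environment variable must be set before TensorFlow is imported, or it has no effect:

```python
import os

# Suppress TensorFlow's informational build/CPU-feature messages.
# '2' hides INFO and WARNING logs; set this BEFORE `import tensorflow`.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# import tensorflow as tf  # import only after setting the variable
```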

Parameter error

beta = tf.Variable(beta0, name='beta')  # do not write it this way: the L2 regularization term immediately becomes NaN

Change it to

beta = tf.constant(beta0)

(Note the lowercase tf.constant; there is no tf.Constant. A fixed regularization weight should not be a trainable Variable, since the optimizer would then update it as well.)

 

0.12 Saver.restore broken? Unsuccessful TensorSliceReader constructor: Failed to find any matching files

Calling saver.restore(sess, checkpoint_path) on a trained model fails.

Fixes:

  1. Use a model name without the characters []; the filename of the checkpoint being restored must not contain [].
  2. When restoring, use the full relative path ./model_epoch10 rather than the bare name model_epoch10.
  3. FLAGS = tf.flags.FLAGS
    tf.flags.DEFINE_string("checkpoint_path","","...")
    A later assignment FLAGS.checkpoint_path = "./.../..." has no effect; that is, FLAGS.*** cannot be reassigned, and this causes the error.

[0.12 Saver.restore broken?]
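Point 2 above can be sketched like this (the saver and sess objects are assumed to come from the surrounding training code):

```python
import os

# A bare filename such as "model_epoch10" may fail to resolve;
# build the explicit relative path "./model_epoch10" instead.
checkpoint_name = "model_epoch10"
checkpoint_path = os.path.join(".", checkpoint_name)

# saver.restore(sess, checkpoint_path)  # hypothetical saver/sess from training
```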

tensorflow.python.framework.errors_impl.NotFoundError: FindFirstFile failed for:...

checkpoint_path = tf.train.latest_checkpoint(checkpoint_path)

Surprisingly, at restore time the checkpoint files must sit in the same directory path as at training time (you cannot rename the training directory), or this error is raised: the text-format checkpoint index file records the paths that were used during training.
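One workaround, since the text-format `checkpoint` index file stores the training-time paths, is to rewrite those paths to point at the directory's new location. This is a hypothetical helper, not part of the TensorFlow API:

```python
import os

def rewrite_checkpoint_file(ckpt_dir):
    """Hypothetical helper: re-root the paths in the text-format
    'checkpoint' index file at ckpt_dir, so tf.train.latest_checkpoint()
    works after the training directory has been moved or renamed."""
    index_path = os.path.join(ckpt_dir, "checkpoint")
    with open(index_path) as f:
        lines = f.readlines()
    fixed = []
    for line in lines:
        # Each line looks like: model_checkpoint_path: "/old/dir/model_epoch10"
        key, _, value = line.partition(": ")
        base = os.path.basename(value.strip().strip('"'))
        fixed.append('%s: "%s"\n' % (key, os.path.join(ckpt_dir, base)))
    with open(index_path, "w") as f:
        f.writelines(fixed)
```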

RuntimeError: Coordinator stopped with threads still running: Thread-4

tf.contrib.slim.learning.train(..., saver=saver)  # fails while saving intermediate checkpoints

sv.saver.save(sess, sv.save_path, global_step=sv.global_step)

Fix: use tf.reset_default_graph() to reset the graph instead of using with tf.Graph().as_default()

["Coordinator stopped with threads still running" happens when using cross-validation]

[coord.request_stop() doesn't stop the threads]

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2580,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

The error says that GPU 0 has run out of memory; on a shared machine other users may be occupying it. Check GPU usage with nvidia-smi and point your code at an idle GPU, e.g. GPU 1: os.environ["CUDA_VISIBLE_DEVICES"] = "1".
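A minimal sketch of pinning the process to another GPU. As with the logging variable above, this must happen before TensorFlow is imported:

```python
import os

# Make only physical GPU 1 visible to this process; TensorFlow will
# see it as device GPU:0. Set this BEFORE `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # import only after setting the variable
```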

from: -柚子皮-


 
