Problems encountered in the Cats vs. Dogs project

1. input_data.py

<ipython-input-11-216aefeda7db> in get_files(file_dir, ratio)
     44     print(n_train)
     45 
---> 46     tra_images = image_list[0:n_train]
     47     tra_labels = label_list[0:n_train]
     48     tra_labels = [int(float(i)) for i in tra_labels]

TypeError: slice indices must be integers or None or have an __index__ method

Fix: n_train needs to be an integer, so change the slice to
tra_images = image_list[0:int(n_train)]
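
A minimal, runnable sketch of the corrected split (the toy lists below just stand in for the shuffled image/label lists that get_files builds; variable names follow the tutorial):

import math

# toy stand-ins for the shuffled image/label lists built earlier in get_files
image_list = ['cat.0.jpg', 'dog.0.jpg', 'cat.1.jpg', 'dog.1.jpg']
label_list = ['0', '1', '0', '1']
ratio = 0.25                               # fraction of samples used for validation

n_sample = len(image_list)
n_val = int(math.ceil(n_sample * ratio))   # number of validation samples
n_train = int(n_sample - n_val)            # cast to int so it is a valid slice index
print(n_train)

tra_images = image_list[0:n_train]
tra_labels = [int(float(i)) for i in label_list[0:n_train]]
val_images = image_list[n_train:]
val_labels = [int(float(i)) for i in label_list[n_train:]]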

After changing BATCH_SIZE from 2 to 3, an error suddenly appeared:

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[469,469,3]
     [[Node: Cast_7 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"](control_dependency_9)]]
done!
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-5-5c32a338b497> in <module>()
     29     finally:
     30         coord.request_stop()
---> 31     coord.join(threads)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.pyc in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387       self._registered_threads = set()
    388       if self._exc_info_to_raise:
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:
    391         if ignore_live_threads:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.pyc in _run(self, sess, enqueue_op, coord)
    236           break
    237         try:
--> 238           enqueue_callable()
    239         except self._queue_closed_exception_types:  # pylint: disable=catching-non-exception
    240           # This exception indicates that a queue was closed.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _single_operation_run()
   1061         with errors.raise_exception_on_not_ok_status() as status:
   1062           tf_session.TF_Run(self._session, None, {}, [],
-> 1063                             target_list_as_strings, status, None)
   1064       return _single_operation_run
   1065     elif isinstance(fetches, ops.Tensor):

/usr/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
     22         if type is None:
     23             try:
---> 24                 self.gen.next()
     25             except StopIteration:
     26                 return

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.pyc in raise_exception_on_not_ok_status()
    464           None, None,
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:
    468     pywrap_tensorflow.TF_DeleteStatus(status)

ResourceExhaustedError: OOM when allocating tensor with shape[469,469,3]
     [[Node: Cast_7 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"](control_dependency_9)]]

This is an out-of-memory error, which is strange: a batch of 2 runs fine, but 3 fails. Need to figure out how to fix it.

PS: Changing the batch size to 1 also fails, and changing it back to 2 now fails as well.
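
One thing worth double-checking for an OOM on a single image of shape [469,469,3] is whether the input pipeline resizes every image to a fixed, small size before batching. A sketch of one common TF 1.x preprocessing pattern (the file path and the 208x208 target size are only illustrative):

import tensorflow as tf

IMG_H, IMG_W = 208, 208                    # illustrative fixed size; anything small and consistent works

image_raw = tf.read_file('cat.0.jpg')      # hypothetical file path
image = tf.image.decode_jpeg(image_raw, channels=3)
image = tf.image.resize_image_with_crop_or_pad(image, IMG_H, IMG_W)
image = tf.image.per_image_standardization(image)   # zero mean, unit variance, float32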

ValueError: Variable conv1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "model.py", line 14, in inference
    initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
  File "", line 1, in 
    train_logits = model.inference(train_batch, BATCH_SIZE, N_CLASSES)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

After some searching, this is related to the difference between tf.Variable() and tf.get_variable(); http://blog.csdn.net/u012436149/article/details/53696970 explains it in detail.
When debugging in Jupyter, if you modify the code and keep re-running cells without resetting the graph, the variables created by the previous run are still there and the names collide, so the notebook has to be restarted (or the default graph reset). Some variable initialization may also be missing; I will look into that later.
In short, conv1/weights was being defined a second time.
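
A minimal reproduction and workaround for the Jupyter case (the inference_stub function below just stands in for model.inference; assumes TF 1.x):

import tensorflow as tf

def inference_stub():
    # stand-in for model.inference: any tf.get_variable call with a fixed name
    with tf.variable_scope('conv1'):
        return tf.get_variable('weights', shape=[3, 3, 3, 16],
                               initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))

inference_stub()          # first cell run: creates conv1/weights
tf.reset_default_graph()  # without this line, the next call raises "Variable conv1/weights already exists"
inference_stub()          # safe to build the graph again after the reset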

Running train.py

2017-12-20 11:24:59.519868: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519910: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519915: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519919: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519923: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.639131: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-12-20 11:24:59.639524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.493
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 16.44MiB
2017-12-20 11:24:59.639542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-12-20 11:24:59.639546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-12-20 11:24:59.639554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-12-20 11:24:59.640574: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 16.44M (17235968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.676339: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 14.79M (15512576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.676925: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 13.31M (13961472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.677485: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 11.98M (12565504 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.678041: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 10.79M (11309056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.678602: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 9.71M (10178304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.679164: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 8.74M (9160704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:25:11.154864: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 108.78MiB.  Current allocation summary follows.
2017-12-20 11:25:11.154966: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.154994: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155022: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155078: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155104: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155128: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155171: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155196: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155220: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155241: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155287: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155311: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155333: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155358: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155385: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 1, Chunks in use: 0 7.06MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155405: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155425: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155445: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155468: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155490: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155515: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155542: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 108.78MiB was 64.00MiB, Chunk State: 
2017-12-20 11:25:11.155567: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00000 of size 1280
2017-12-20 11:25:11.155586: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00500 of size 256
2017-12-20 11:25:11.155603: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00600 of size 256
2017-12-20 11:25:11.155618: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00700 of size 256
2017-12-20 11:25:11.155634: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00800 of size 256
2017-12-20 11:25:11.155651: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00900 of size 256
2017-12-20 11:25:11.155668: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00a00 of size 256
2017-12-20 11:25:11.155686: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00b00 of size 256
2017-12-20 11:25:11.155703: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00c00 of size 256
2017-12-20 11:25:11.155721: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00d00 of size 512
2017-12-20 11:25:11.155738: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00f00 of size 256
2017-12-20 11:25:11.155755: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01000 of size 256
2017-12-20 11:25:11.155772: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01100 of size 512
2017-12-20 11:25:11.155789: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01300 of size 256
2017-12-20 11:25:11.155805: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01400 of size 256
2017-12-20 11:25:11.155823: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01500 of size 256
2017-12-20 11:25:11.155840: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01600 of size 256
2017-12-20 11:25:11.155856: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01700 of size 256
2017-12-20 11:25:11.155936: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01800 of size 1792
2017-12-20 11:25:11.155954: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01f00 of size 256
2017-12-20 11:25:11.155971: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a02000 of size 9216
2017-12-20 11:25:11.155987: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a04400 of size 256
2017-12-20 11:25:11.156005: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10206a04500 of size 7402752
2017-12-20 11:25:11.156022: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
2017-12-20 11:25:11.156045: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 256 totalling 4.2KiB
2017-12-20 11:25:11.156065: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 512 totalling 1.0KiB
2017-12-20 11:25:11.156084: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-12-20 11:25:11.156103: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1792 totalling 1.8KiB
2017-12-20 11:25:11.156123: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 9216 totalling 9.0KiB
2017-12-20 11:25:11.156143: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 17.2KiB
2017-12-20 11:25:11.156167: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                    17235968
InUse:                       17664
MaxInUse:                    17664
NumAllocs:                      22
MaxAllocSize:                 9216

2017-12-20 11:25:11.156198: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *___________________________________________________________________________________________________
2017-12-20 11:25:11.156248: W tensorflow/core/framework/op_kernel.cc:1148] Resource exhausted: OOM when allocating tensor of shape [222784,128] and type float
2017-12-20 11:25:11.196823: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [222784,128] and type float
     [[Node: local3/weights/Adam/Initializer/zeros = Const[_class=["loc:@local3/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [222784,128] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "train.py", line 47, in 
    sess.run(tf.global_variables_initializer()) 
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 

This again looks like a memory problem: the log above shows only 16.44MiB of the 1.95GiB GPU memory free, so most of it is likely already held by another process (probably the Jupyter session).
It seems the GPU was not being used properly; see: https://stackoverflow.com/questions/45404689/tensorflow-how-to-train-2-cnnindependent-on-2-gpus-cuda-error-out-of-memory-e

Change sess = tf.Session() to:
sess_config = tf.ConfigProto()
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.90
sess = tf.Session(config=sess_config)

After running again, another error appeared:
failed to allocate 24.38M (25559040 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
This was because no GPU had been specified; add the following at the top of the file:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
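
Putting the two changes together, the top of train.py ends up looking roughly like this (a sketch assuming TF 1.x; whether the right device index is "0" or "1" depends on which of the two GPUs you actually want, and allow_growth is an optional extra that avoids grabbing the whole fraction up front):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # select the GPU; "0" on single-GPU machines

import tensorflow as tf

sess_config = tf.ConfigProto()
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.90
sess_config.gpu_options.allow_growth = True   # optional: allocate GPU memory on demand
sess = tf.Session(config=sess_config)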

After running again it succeeded, but only the first training step (step 0) was printed. Possibly the GPU does not have enough memory to keep going, although a GTX 1050 should not struggle this much. It may also be that the wrong GPU index was selected, since my laptop has two graphics adapters.
In any case, the errors in the code itself have been fixed.

Using placeholders and feed_dict, the following error appeared:

Step 0, train loss = 0.69, train accuracy = 50.00%
Traceback (most recent call last):
  File "train.py", line 82, in <module>
    feed_dict={X:test_images, Y:test_labels})
NameError: name 'X' is not defined

This is because X and Y were never defined. See http://blog.csdn.net/m0_37324740/article/details/77803694 for details.
For example:
X = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='X_placeholder')
Y = tf.placeholder(dtype=tf.int32, shape=[None, 10], name='Y_placeholder')
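
And a tiny runnable example of feeding them through feed_dict (the zero arrays just stand in for test_images / test_labels, and reduce_mean stands in for the real loss and accuracy ops in train.py):

import numpy as np
import tensorflow as tf

X = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='X_placeholder')
Y = tf.placeholder(dtype=tf.int32, shape=[None, 10], name='Y_placeholder')

test_images = np.zeros((2, 784), dtype=np.float32)   # toy stand-in for the test batch
test_labels = np.zeros((2, 10), dtype=np.int32)

mean_pixel = tf.reduce_mean(X)                       # stand-in for the loss / accuracy ops
with tf.Session() as sess:
    print(sess.run(mean_pixel, feed_dict={X: test_images, Y: test_labels}))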
