1、input_data.py
<ipython-input-11-216aefeda7db> in get_files(file_dir, ratio)
44 print(n_train)
45
---> 46 tra_images = image_list[0:n_train]
47 tra_labels = label_list[0:n_train]
48 tra_labels = [int(float(i)) for i in tra_labels]
TypeError: slice indices must be integers or None or have an __index__ method
Fix: n_train needs to be an int, so change the slice to
tra_images = image_list[0:int(n_train)]
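The root cause is that n_train is computed from a float expression (a sample count times ratio). A minimal sketch of the fix inside get_files, assuming ratio is the validation fraction; any variable names beyond those in the traceback are illustrative:

import math

n_sample = len(label_list)
n_val = int(math.ceil(n_sample * ratio))   # float result of n_sample * ratio cast to int
n_train = n_sample - n_val                 # now an int, safe to use as a slice index

tra_images = image_list[0:n_train]
tra_labels = [int(float(i)) for i in label_list[0:n_train]]
val_images = image_list[n_train:]
val_labels = [int(float(i)) for i in label_list[n_train:]]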
After changing BATCH_SIZE from 2 to 3, an error suddenly appeared:
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[469,469,3]
[[Node: Cast_7 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"](control_dependency_9)]]
done!
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-5-5c32a338b497> in <module>()
29 finally:
30 coord.request_stop()
---> 31 coord.join(threads)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.pyc in join(self, threads, stop_grace_period_secs, ignore_live_threads)
387 self._registered_threads = set()
388 if self._exc_info_to_raise:
--> 389 six.reraise(*self._exc_info_to_raise)
390 elif stragglers:
391 if ignore_live_threads:
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.pyc in _run(self, sess, enqueue_op, coord)
236 break
237 try:
--> 238 enqueue_callable()
239 except self._queue_closed_exception_types: # pylint: disable=catching-non-exception
240 # This exception indicates that a queue was closed.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _single_operation_run()
1061 with errors.raise_exception_on_not_ok_status() as status:
1062 tf_session.TF_Run(self._session, None, {}, [],
-> 1063 target_list_as_strings, status, None)
1064 return _single_operation_run
1065 elif isinstance(fetches, ops.Tensor):
/usr/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
22 if type is None:
23 try:
---> 24 self.gen.next()
25 except StopIteration:
26 return
/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.pyc in raise_exception_on_not_ok_status()
464 None, None,
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
468 pywrap_tensorflow.TF_DeleteStatus(status)
ResourceExhaustedError: OOM when allocating tensor with shape[469,469,3]
[[Node: Cast_7 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/gpu:0"](control_dependency_9)]]
So memory was exhausted. Oddly, a batch of 2 works fine while 3 does not; I need to find out how to fix this.
PS: changing the batch size to 1 also fails now, and changing it back to 2 fails as well.
ValueError: Variable conv1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "model.py", line 14, in inference
initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
File "" , line 1, in
train_logits = model.inference(train_batch, BATCH_SIZE, N_CLASSES)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
After some searching, this turns out to be related to tf.Variable() versus tf.get_variable(); http://blog.csdn.net/u012436149/article/details/53696970 explains it in detail.
When debugging in Jupyter, if you modify the program and keep running cells without resetting the graph, the same variables get created again and their names collide, so the notebook has to be rerun from scratch. It may also be that some variable initialization is missing; I'll look at that later.
In short, conv1/weights was defined twice; a sketch of the workaround follows.
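A minimal sketch of the Jupyter workaround, assuming the graph is rebuilt in the same process; tf.reset_default_graph() is the standard TF 1.x way to clear previously created variables, and the model/train_batch names are taken from the traceback above:

import tensorflow as tf

# Discard everything created by earlier cell executions. After this, the cells
# that build the input pipeline (train_batch, etc.) and the model must be re-run,
# because their tensors belonged to the old graph.
tf.reset_default_graph()

# ... re-run the input pipeline cell here ...

train_logits = model.inference(train_batch, BATCH_SIZE, N_CLASSES)

# Alternatively, if the graph really must be built twice on purpose,
# reuse the existing variables explicitly instead of resetting:
# with tf.variable_scope(tf.get_variable_scope(), reuse=True):
#     val_logits = model.inference(val_batch, BATCH_SIZE, N_CLASSES)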
Running train.py:
2017-12-20 11:24:59.519868: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519910: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519915: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519919: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.519923: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-12-20 11:24:59.639131: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-12-20 11:24:59.639524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.493
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 16.44MiB
2017-12-20 11:24:59.639542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-12-20 11:24:59.639546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-12-20 11:24:59.639554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-12-20 11:24:59.640574: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 16.44M (17235968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.676339: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 14.79M (15512576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.676925: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 13.31M (13961472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.677485: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 11.98M (12565504 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.678041: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 10.79M (11309056 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.678602: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 9.71M (10178304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:24:59.679164: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 8.74M (9160704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2017-12-20 11:25:11.154864: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 108.78MiB. Current allocation summary follows.
2017-12-20 11:25:11.154966: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.154994: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155022: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155078: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155104: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155128: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155171: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155196: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155220: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155241: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155287: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155311: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155333: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155358: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155385: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 1, Chunks in use: 0 7.06MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155405: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155425: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155445: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155468: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155490: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155515: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-12-20 11:25:11.155542: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 108.78MiB was 64.00MiB, Chunk State:
2017-12-20 11:25:11.155567: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00000 of size 1280
2017-12-20 11:25:11.155586: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00500 of size 256
2017-12-20 11:25:11.155603: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00600 of size 256
2017-12-20 11:25:11.155618: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00700 of size 256
2017-12-20 11:25:11.155634: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00800 of size 256
2017-12-20 11:25:11.155651: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00900 of size 256
2017-12-20 11:25:11.155668: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00a00 of size 256
2017-12-20 11:25:11.155686: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00b00 of size 256
2017-12-20 11:25:11.155703: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00c00 of size 256
2017-12-20 11:25:11.155721: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00d00 of size 512
2017-12-20 11:25:11.155738: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a00f00 of size 256
2017-12-20 11:25:11.155755: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01000 of size 256
2017-12-20 11:25:11.155772: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01100 of size 512
2017-12-20 11:25:11.155789: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01300 of size 256
2017-12-20 11:25:11.155805: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01400 of size 256
2017-12-20 11:25:11.155823: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01500 of size 256
2017-12-20 11:25:11.155840: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01600 of size 256
2017-12-20 11:25:11.155856: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01700 of size 256
2017-12-20 11:25:11.155936: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01800 of size 1792
2017-12-20 11:25:11.155954: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a01f00 of size 256
2017-12-20 11:25:11.155971: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a02000 of size 9216
2017-12-20 11:25:11.155987: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10206a04400 of size 256
2017-12-20 11:25:11.156005: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10206a04500 of size 7402752
2017-12-20 11:25:11.156022: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2017-12-20 11:25:11.156045: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 256 totalling 4.2KiB
2017-12-20 11:25:11.156065: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 512 totalling 1.0KiB
2017-12-20 11:25:11.156084: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-12-20 11:25:11.156103: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1792 totalling 1.8KiB
2017-12-20 11:25:11.156123: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 9216 totalling 9.0KiB
2017-12-20 11:25:11.156143: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 17.2KiB
2017-12-20 11:25:11.156167: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 17235968
InUse: 17664
MaxInUse: 17664
NumAllocs: 22
MaxAllocSize: 9216
2017-12-20 11:25:11.156198: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *___________________________________________________________________________________________________
2017-12-20 11:25:11.156248: W tensorflow/core/framework/op_kernel.cc:1148] Resource exhausted: OOM when allocating tensor of shape [222784,128] and type float
2017-12-20 11:25:11.196823: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [222784,128] and type float
[[Node: local3/weights/Adam/Initializer/zeros = Const[_class=["loc:@local3/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [222784,128] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "train.py", line 47, in
sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError:
From this output it again looks like a GPU memory problem.
Most likely the GPU memory options were never configured; see https://stackoverflow.com/questions/45404689/tensorflow-how-to-train-2-cnnindependent-on-2-gpus-cuda-error-out-of-memory-e
Change sess = tf.Session() to:
sess_config = tf.ConfigProto()
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.90
sess = tf.Session(config=sess_config)
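This caps TensorFlow at 90% of the GPU's memory. A related TF 1.x option, assuming the rest of the session setup stays unchanged, is to let the allocator grow on demand instead of grabbing a fixed fraction up front:

sess_config = tf.ConfigProto()
sess_config.gpu_options.allow_growth = True   # allocate GPU memory only as it is actually needed
sess = tf.Session(config=sess_config)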
After running it again, this appears:
failed to allocate 24.38M (25559040 bytes) from device: CUDA_ERROR_OUT_OF_ME
This is because no specific GPU was selected; add the following at the top of the file:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
After running again it succeeds, but only the first training step (step 0) is printed. The GPU may simply not have enough free memory to keep training, although a GTX 1050 should not be this limited. It could also be that the wrong GPU index was chosen, since my laptop has two graphics cards; the sketch below shows how to check which devices TensorFlow actually sees.
In any case, the errors in the code itself are now fixed.
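A minimal sketch for checking which index to put in CUDA_VISIBLE_DEVICES (device_lib is a TF 1.x utility; the "0" here and the printed fields are only illustrative):

import os
# Must be set before TensorFlow is imported; the indices are the ones CUDA assigns,
# which do not have to match the order the OS lists the two cards in.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can use, e.g. /cpu:0 and /gpu:0, with its memory limit.
for device in device_lib.list_local_devices():
    print(device.name, device.memory_limit)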
Using placeholders and feed_dict, an error is reported:
Step 0, train loss = 0.69, train accuracy = 50.00%
Traceback (most recent call last):
File "train.py", line 82, in <module>
feed_dict={X:test_images, Y:test_labels})
NameError: name 'X' is not defined
This is because X and Y were never defined.
See http://blog.csdn.net/m0_37324740/article/details/77803694
For example:
X = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='X_placeholder')
Y = tf.placeholder(dtype=tf.int32, shape=[None, 10], name='Y_placeholder')
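A minimal, self-contained sketch of how such placeholders are fed through feed_dict at run time; the dummy arrays and the mean_pixel op stand in for the real test data and the loss/accuracy ops in train.py:

import numpy as np
import tensorflow as tf

X = tf.placeholder(dtype=tf.float32, shape=[None, 784], name='X_placeholder')
Y = tf.placeholder(dtype=tf.int32, shape=[None, 10], name='Y_placeholder')

# A trivial op built on X, standing in for the real loss/accuracy ops.
mean_pixel = tf.reduce_mean(X)

test_images = np.zeros((2, 784), dtype=np.float32)   # dummy batch, illustrative only
test_labels = np.zeros((2, 10), dtype=np.int32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The placeholders must exist in this scope (otherwise Python raises NameError),
    # and every placeholder the fetched ops depend on must appear in feed_dict.
    print(sess.run(mean_pixel, feed_dict={X: test_images, Y: test_labels}))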