TensorFlow error: ResourceExhaustedError: OOM when allocating tensor with shape * and type float *

1. Problem description

While running the code, the program exited abnormally with the following log:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10,17,17,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

[[node while/EnsAdvInceptionResnetV2/EnsAdvInceptionResnetV2/Repeat_1/block17_4/Branch_0/Conv2d_1x1/Conv2D (defined at /home/suy/.pyenv/versions/mypython3.6/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1057) = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](while/EnsAdvInceptionResnetV2/EnsAdvInceptionResnetV2/Repeat_1/block17_3/Relu, while/EnsAdvInceptionResnetV2/EnsAdvInceptionResnetV2/Repeat_1/block17_4/Branch_0/Conv2d_1x1/kernel/Regularizer/l2_regularizer/L2Loss/Enter)]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
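In TF 1.x session-based code, the option mentioned in the hint is passed through RunOptions. A minimal sketch follows; the toy tensor and session below are only an illustration, not the original model:

```python
import tensorflow as tf

# Ask TensorFlow to dump the list of live tensor allocations when an OOM
# happens, as suggested by the hint above (TF 1.x API).
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Toy graph standing in for the real network.
x = tf.random_normal([10, 17, 17, 192])
y = tf.reduce_sum(x)

with tf.Session() as sess:
    # Pass the options to every run call whose allocations you want reported.
    print(sess.run(y, options=run_options))
```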

2. Analysis

Inspecting the log shows a ResourceExhaustedError (resources exhausted). According to the related documentation, this means the tensor being allocated is too large and the GPU has run out of memory, which the log line below also shows:

fused_batch_norm_op.cc:574 : Resource exhausted: OOM when allocating tensor with shape[10,17,17,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

The log also contains the following output, which points to the same problem:

2019-08-17 10:57:14.791207: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats:
Limit: 7691812864
InUse: 7689161984
MaxInUse: 7691783424
NumAllocs: 16516
MaxAllocSize: 2151772160
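Reading these stats: InUse (7,689,161,984 bytes) is within about 2.6 MB of Limit (7,691,812,864 bytes, roughly 7.2 GiB), so the BFC allocator's memory pool on GPU:0 is effectively full at the moment the allocation fails.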

3. Solution

The shape [10,17,17,128] shows that the batch_size used in the code is 10. After reducing it to 5 and rerunning, the program completed normally and the problem was solved.
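As a rough illustration only (names such as `batch_size`, `images`, and `input_ph` are placeholders, not the original code), the change amounts to feeding smaller batches per session run:

```python
import numpy as np
import tensorflow as tf

batch_size = 5  # was 10; smaller batches lower the per-step GPU memory needed

# Placeholder-shaped input; 299x299x3 is the usual Inception-ResNet-v2 input size.
input_ph = tf.placeholder(tf.float32, [None, 299, 299, 3])
output = tf.reduce_mean(input_ph)  # stands in for the real network

# Dummy data just to make the sketch runnable.
images = np.random.rand(20, 299, 299, 3).astype(np.float32)

with tf.Session() as sess:
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        sess.run(output, feed_dict={input_ph: batch})
```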

Reference: tensorflow-issues-1993
