TensorFlow: If you want to see a list of allocated tensors when OOM happens, add report_tensor_alloca

Error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,33,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: training/Adam/gradients/AddN_3-1-TransposeNHWCToNCHW-LayoutOptimizer = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/batch_normalization_11/cond/FusedBatchNorm/Switch_grad/cond_grad, PermConstNHWCToNCHW-LayoutOptimizer)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: loss/mul/_831 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6298_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
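
The hint in the log refers to the report_tensor_allocations_upon_oom field of RunOptions. A minimal sketch of switching it on for a TensorFlow 1.x session is shown here; the helper name run_with_oom_report and its parameters are illustrative and do not come from the original code.

```python
def run_with_oom_report(sess, fetches, feed_dict=None):
    """Run `fetches`, asking TensorFlow to list allocated tensors if an OOM occurs.

    `sess` is assumed to be a TensorFlow 1.x style session (tf.Session).
    """
    import tensorflow as tf

    # On TensorFlow 2.x the 1.x session API lives under tf.compat.v1;
    # on older 1.x releases the names are available directly on `tf`.
    tf1 = getattr(tf.compat, "v1", tf)

    # This is the option named in the "Hint:" lines of the error log.
    run_options = tf1.RunOptions(report_tensor_allocations_upon_oom=True)
    return sess.run(fetches, feed_dict=feed_dict, options=run_options)
```

With this option set, the ResourceExhaustedError message should also include the tensors that were allocated at the time of failure, which helps to see what is filling the GPU. If the model is trained through Keras on the TensorFlow 1.x backend, extra keyword arguments to model.compile are, as far as I know, forwarded to the session run call, so passing options=run_options there may have the same effect.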

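Cause: batch_size was set too large, so reduce it. Also close any unnecessary windows and programs on the machine so that the code has enough memory to run. In my case, I was running two large networks at the same time, and both of them used two GPUs, which is what triggered this error.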
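
Both remedies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: restrict_to_gpus and train_with_backoff are hypothetical helper names, train_fn stands for whatever function runs the training with a given batch size, and the exception type to catch is passed in by the caller (for TensorFlow it would be tf.errors.ResourceExhaustedError).

```python
import os


def restrict_to_gpus(device_ids):
    """Limit this process to the given GPUs, e.g. "0" or "0,1".

    CUDA_VISIBLE_DEVICES must be set before TensorFlow initializes the GPUs,
    so call this at the very top of the script. Giving each training job its
    own GPU avoids two large networks competing for the same device memory.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids


def train_with_backoff(train_fn, batch_size, oom_error=MemoryError, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size each time oom_error is raised.

    Returns (batch_size_that_worked, result_of_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except oom_error:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; let the caller see the error
            batch_size = max(min_batch_size, batch_size // 2)
```

For example, one job could call restrict_to_gpus("0") and the other restrict_to_gpus("1") before importing TensorFlow, and each could then start from its original batch size and let train_with_backoff find a value that fits. Note that halving the batch size changes the training dynamics, so the learning rate may need adjusting as well.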
