tensorflow-gpu: running out of GPU resources

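I tested how tensorflow-gpu uses GPU resources on Windows 10 by starting two tensorflow-gpu processes running identical code. After the first process created a random variable, GPU usage was as follows: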

[Image 1: screenshot of GPU usage after the first process created its variable]

After the second process created its random variable, GPU usage was as follows:

[Image 2: screenshot of GPU usage after the second process created its variable]

As you can see, the GPU memory was almost used up. When I then created further variables, the following error was reported:

2019-11-27 22:19:32.105516: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 22:19:33.800747: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-11-27 22:19:33.806029: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
 in ()
----> 1 out = layer(x)

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py in call(self, inputs)
    195
    196   def call(self, inputs):
--> 197     outputs = self._convolution_op(inputs, self.kernel)
    198
    199     if self.use_bias:

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
   1132           call_from_convolution=False)
   1133     else:
-> 1134       return self.conv_op(inp, filter)
   1135     # copybara:strip_end
   1136     # copybara:insert return self.conv_op(inp, filter)

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
    637
    638   def __call__(self, inp, filter):  # pylint: disable=redefined-builtin
--> 639     return self.call(inp, filter)
    640
    641

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
    236         padding=self.padding,
    237         data_format=self.data_format,
--> 238         name=self.name)
    239
    240

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, dilations, name, filters)
   2008                            data_format=data_format,
   2009                            dilations=dilations,
-> 2010                            name=name)
   2011
   2012

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
   1029             input, filter, strides=strides, use_cudnn_on_gpu=use_cudnn_on_gpu,
   1030             padding=padding, explicit_paddings=explicit_paddings,
-> 1031             data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
   1032       except _core._SymbolicException:
   1033         pass  # Add nodes to the TensorFlow graph.

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d_eager_fallback(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name, ctx)
   1128   explicit_paddings, "data_format", data_format, "dilations", dilations)
   1129   _result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
-> 1130                              ctx=_ctx, name=name)
   1131   _execute.record_gradient(
   1132       "Conv2D", _inputs_flat, _attrs, _result, name)

D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

D:\Program\Anaconda3\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

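As you can see, by this point the GPU resources were exhausted and cuDNN could no longer create a handle (CUDNN_STATUS_ALLOC_FAILED). After closing one of the processes, GPU usage looked like this: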

[Image 3: screenshot of GPU usage after closing one process]

After that, creating the variable no longer raised an error, and GPU usage was as follows:

[Image 4: screenshot of GPU usage after creating the variable in the remaining process]
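The screenshots are not reproduced here, so as a rough substitute the same numbers can be read from the command line. The sketch below assumes an NVIDIA driver is installed and that `nvidia-smi` is on the PATH. The helper names `parse_memory_csv` and `query_gpu_memory` are made up for this example and are not part of any library.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (values in MiB) into a list of (used, total) ints."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for the used and total memory of every GPU, in MiB."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_memory_csv(out)
```

Calling `query_gpu_memory()` before and after starting the second process should show used memory climbing toward the total, which is the situation in which the error above appeared.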

Of course, for the error

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

some people say that setting --gpu_memory_fraction 0.5 works; I have not tried it myself.
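`--gpu_memory_fraction` looks like a flag defined by individual training scripts rather than by TensorFlow itself. In TensorFlow 2.x, a similar effect can probably be had by capping how much memory one process may take on a GPU. The following is only a minimal sketch under that assumption: the function names, the 0.5 fraction and the `total_mb` argument (the card's total memory, which you supply yourself) are illustrative, and I have not verified that it fixes the error above.

```python
def fraction_to_limit_mb(total_mb, fraction):
    """Convert a fraction of the card's total memory into a limit in MB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_gpu_memory(total_mb, fraction=0.5):
    """Cap this process's memory on the first GPU.

    Must be called before any op runs on the GPU, otherwise TensorFlow
    raises a RuntimeError because the device is already initialized.
    """
    # Imported here so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return None  # no GPU visible, nothing to configure

    limit = fraction_to_limit_mb(total_mb, fraction)
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit)],
    )
    return limit
```

For a hypothetical 8 GB card, `limit_gpu_memory(total_mb=8192, fraction=0.5)` would be called at the top of each process. An alternative worth trying is `tf.config.experimental.set_memory_growth(gpu, True)`, which lets allocation grow on demand instead of reserving most of the card up front.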
