本人windows10测试tensorflow-gpu的资源使用情况,开启两个tensorflow-gpu进程,两个进程的代码一致,第一个进程创建随机变量后gpu使用情况如下:
第二个进程创建随机变量时gpu使用情况如下:
可以看到已经快使用完了,这时我创建其他的变量时就报如下错误:
2019-11-27 22:19:32.105516: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-11-27 22:19:33.800747: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-11-27 22:19:33.806029: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
---------------------------------------------------------------------------
UnknownError Traceback (most recent call last)
in ()
----> 1 out = layer(x)
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
889 with base_layer_utils.autocast_context_manager(
890 self._compute_dtype):
--> 891 outputs = self.call(cast_inputs, *args, **kwargs)
892 self._handle_activity_regularization(inputs, outputs)
893 self._set_mask_metadata(inputs, outputs, input_masks)
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py in call(self, inputs)
195
196 def call(self, inputs):
--> 197 outputs = self._convolution_op(inputs, self.kernel)
198
199 if self.use_bias:
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
1132 call_from_convolution=False)
1133 else:
-> 1134 return self.conv_op(inp, filter)
1135 # copybara:strip_end
1136 # copybara:insert return self.conv_op(inp, filter)
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
637
638 def __call__(self, inp, filter): # pylint: disable=redefined-builtin
--> 639 return self.call(inp, filter)
640
641
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
236 padding=self.padding,
237 data_format=self.data_format,
--> 238 name=self.name)
239
240
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, dilations, name, filters)
2008 data_format=data_format,
2009 dilations=dilations,
-> 2010 name=name)
2011
2012
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
1029 input, filter, strides=strides, use_cudnn_on_gpu=use_cudnn_on_gpu,
1030 padding=padding, explicit_paddings=explicit_paddings,
-> 1031 data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
1032 except _core._SymbolicException:
1033 pass # Add nodes to the TensorFlow graph.
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d_eager_fallback(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name, ctx)
1128 explicit_paddings, "data_format", data_format, "dilations", dilations)
1129 _result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
-> 1130 ctx=_ctx, name=name)
1131 _execute.record_gradient(
1132 "Conv2D", _inputs_flat, _attrs, _result, name)
D:\Program\Anaconda3\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
D:\Program\Anaconda3\lib\site-packages\six.py in raise_from(value, from_value)
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
可以看到,这个时候资源已经使用完,cudnn已经处理不了了,我们关掉一个进程之后的gpu资源情况如下:
创建变量后没有在报错,gpu资源情况如下:
当然对于错误
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
有人说设置--gpu_memory_fraction 0.5 有效,我没有实践过