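Problem description
System: Ubuntu 20.04
GPU: RTX A5000
CUDA version on the host: 11.4
Installed TensorFlow version: tensorflow-gpu==1.13.1
I tried two setups: a Conda virtual environment and the official TensorFlow Docker image.
The Conda environment (Python 3.7.2) has the following packages, installed with the conda command: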
cudatoolkit 10.0.130 h8c5a6a4_10 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cudnn 7.6.5.32 ha8d7eb6_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
In this environment tensorflow-gpu==1.13.1 loads the CUDA libraries and detects both GPUs:
>>> tf.test.is_gpu_available()
2022-10-27 10:52:22.010425: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-10-27 10:52:22.059681: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3400000000 Hz
2022-10-27 10:52:22.063477: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c559aae5b0 executing computations on platform Host. Devices:
2022-10-27 10:52:22.063507: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2022-10-27 10:52:22.066212: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2022-10-27 10:52:22.399495: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c55ba1eed0 executing computations on platform CUDA. Devices:
2022-10-27 10:52:22.399552: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA RTX A5000, Compute Capability 8.6
2022-10-27 10:52:22.399570: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): NVIDIA RTX A5000, Compute Capability 8.6
2022-10-27 10:52:22.402277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA RTX A5000 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:73:00.0
2022-10-27 10:52:22.402818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: NVIDIA RTX A5000 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:a6:00.0
2022-10-27 10:52:22.403181: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-10-27 10:52:22.405501: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-10-27 10:52:22.407398: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-10-27 10:52:22.407895: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-10-27 10:52:22.410588: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-10-27 10:52:22.412655: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-10-27 10:52:22.417214: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-10-27 10:52:22.420181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1
2022-10-27 10:52:22.420233: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-10-27 10:52:22.421823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-27 10:52:22.421843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1
2022-10-27 10:52:22.421851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y
2022-10-27 10:52:22.421857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N
2022-10-27 10:52:22.423975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 22336 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX A5000, pci bus id: 0000:73:00.0, compute capability: 8.6)
2022-10-27 10:52:22.424730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:1 with 546 MB memory) -> physical GPU (device: 1, name: NVIDIA RTX A5000, pci bus id: 0000:a6:00.0, compute capability: 8.6)
True
>>>
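Note that `tf.test.is_gpu_available()` returning True only shows that the devices were found and the libraries were loaded. Judging by the error further down, it does not prove that a cuBLAS kernel can actually run on the card. A minimal sketch of a stricter check is to launch one matrix multiplication on the GPU. The function name `gemm_smoke_test` is hypothetical, the default shapes are simply copied from the error message in this post, and the code assumes the TF 1.x graph/session API:

```python
def gemm_smoke_test(m=800, k=256, n=32, device="/device:GPU:0"):
    """Run one float32 matmul on `device` and return the shape of the result.

    Hypothetical helper, assuming a TF 1.x install (tf.Session API).
    On a broken CUDA/GPU combination the sess.run call is expected to
    raise an error instead of returning.
    """
    import numpy as np
    import tensorflow as tf  # imported here so the module loads without TF

    graph = tf.Graph()
    with graph.as_default(), tf.device(device):
        # Placeholders (not constants) so the matmul is not folded away
        # at graph-optimization time and really runs on the device.
        a = tf.placeholder(tf.float32, shape=(m, k))
        b = tf.placeholder(tf.float32, shape=(k, n))
        c = tf.matmul(a, b)

    # No soft placement: fail loudly rather than fall back to the CPU.
    config = tf.ConfigProto(allow_soft_placement=False)
    with tf.Session(graph=graph, config=config) as sess:
        out = sess.run(c, feed_dict={
            a: np.ones((m, k), dtype=np.float32),
            b: np.ones((k, n), dtype=np.float32),
        })
    return out.shape
```

Calling `gemm_smoke_test()` in the Python shell right after the `is_gpu_available()` check should return the output shape on a working setup, so a failure here points at the CUDA stack rather than at the training script.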
The official Docker image (Python 3.6) is configured as follows:
ENV CUDA_VERSION=10.0.130
ARCH= CUDA=10.0 CUDNN=7.4.1.5-1 /bin/bash
Running the training code in either environment always fails with the same error:
2022-10-26 09:25:36.580487: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2022-10-26 09:27:01.922083: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "train_model.py", line 3, in <module>
train()
File "/share/speech_seg/BiLSTM/bilstm_speech_seg_train.py", line 86, in train_bilstm
validation_data=(valid_x, valid_y), shuffle=True)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(800, 256), b.shape=(256, 32), m=800, n=32, k=256
[[{{node time_distributed_1/MatMul}}]]
[[{{node metrics/acc/Mean_1}}]]
As the log shows, the CUDA library is opened successfully:
2022-10-26 09:25:36.580487: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
but the same error is reported every time:
2022-10-26 09:27:01.922083: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Solution
The fix follows the discussion in this GitHub issue:
https://github.com/qqwweee/keras-yolo3/issues/332
First, try upgrading TensorFlow to 1.14.0, which resolves the problem in some cases:
pip install tensorflow-gpu==1.14.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
If the error is still reported after the upgrade:
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(200, 256), b.shape=(256, 32), m=200, n=32, k=256
[[{{node time_distributed_1/MatMul}}]]
[[metrics/acc/Mean_1/_205]]
(1) Internal: Blas GEMM launch failed : a.shape=(200, 256), b.shape=(256, 32), m=200, n=32, k=256
[[{{node time_distributed_1/MatMul}}]]
0 successful operations.
0 derived errors ignored.
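then the cause is the GPU itself. The RTX 30 series and the RTX A5000 are Ampere cards (compute capability 8.6, as shown in the log above), and CUDA 9 and CUDA 10 do not support them; they require CUDA 11.1 or later.
So the only option is to build tensorflow-gpu==1.15.0 against CUDA 11 and install it into the Docker image you pulled.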
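The mismatch can be spotted directly from the startup log. The sketch below reads the compute capability from a TensorFlow log line and compares a CUDA toolkit version against the minimum release that can target that capability. The helper names and the small lookup table are assumptions made for illustration; the table covers only a few architectures and is not an official compatibility matrix:

```python
import re

# Minimum CUDA toolkit release able to target each compute capability.
# Illustrative subset only, not a complete or official table.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, RTX A5000)
}


def parse_compute_capability(log_line):
    """Extract (major, minor) from a TensorFlow device log line, or None."""
    m = re.search(r"compute capability:?\s*(\d+)\.(\d+)", log_line, re.IGNORECASE)
    return (int(m.group(1)), int(m.group(2))) if m else None


def cuda_supports(cc, cuda_version):
    """Return True/False, or None if `cc` is not in the table above."""
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    required = MIN_CUDA_FOR_CC.get(cc)
    if required is None:
        return None
    return (major, minor) >= required


line = "StreamExecutor device (0): NVIDIA RTX A5000, Compute Capability 8.6"
cc = parse_compute_capability(line)
print(cc, cuda_supports(cc, "10.0.130"), cuda_supports(cc, "11.4"))
# prints: (8, 6) False True
```

With the values from this post, the CUDA 10.0 toolkit bundled in both the Conda environment and the Docker image fails the check, while the CUDA 11.4 on the host passes it.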