Fixing the TensorFlow CUBLAS_STATUS_EXECUTION_FAILED error

Problem description

OS: Ubuntu 20.04

GPU: RTX A5000

Host machine CUDA version: 11.4

Installed TensorFlow version: tensorflow-gpu==1.13.1

Two environments were set up: a Conda virtual environment and the official TensorFlow Docker image.

The Conda virtual environment (Python 3.7.2) was configured with the conda command; the relevant packages are:

cudatoolkit               10.0.130            h8c5a6a4_10    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cudnn                     7.6.5.32             ha8d7eb6_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge

As shown below, tensorflow-gpu==1.13.1 does run successfully in this virtual environment:

>>> tf.test.is_gpu_available()
2022-10-27 10:52:22.010425: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-10-27 10:52:22.059681: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3400000000 Hz
2022-10-27 10:52:22.063477: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c559aae5b0 executing computations on platform Host. Devices:
2022-10-27 10:52:22.063507: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2022-10-27 10:52:22.066212: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2022-10-27 10:52:22.399495: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c55ba1eed0 executing computations on platform CUDA. Devices:
2022-10-27 10:52:22.399552: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): NVIDIA RTX A5000, Compute Capability 8.6
2022-10-27 10:52:22.399570: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): NVIDIA RTX A5000, Compute Capability 8.6
2022-10-27 10:52:22.402277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: NVIDIA RTX A5000 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:73:00.0
2022-10-27 10:52:22.402818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: 
name: NVIDIA RTX A5000 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:a6:00.0
2022-10-27 10:52:22.403181: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-10-27 10:52:22.405501: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-10-27 10:52:22.407398: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-10-27 10:52:22.407895: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-10-27 10:52:22.410588: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-10-27 10:52:22.412655: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-10-27 10:52:22.417214: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-10-27 10:52:22.420181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1
2022-10-27 10:52:22.420233: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-10-27 10:52:22.421823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-27 10:52:22.421843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 
2022-10-27 10:52:22.421851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y 
2022-10-27 10:52:22.421857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N 
2022-10-27 10:52:22.423975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 22336 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX A5000, pci bus id: 0000:73:00.0, compute capability: 8.6)
2022-10-27 10:52:22.424730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:1 with 546 MB memory) -> physical GPU (device: 1, name: NVIDIA RTX A5000, pci bus id: 0000:a6:00.0, compute capability: 8.6)
True
>>> 

The official Docker image (Python 3.6) was pulled as well; its environment configuration is:

ENV CUDA_VERSION=10.0.130
ARCH= CUDA=10.0 CUDNN=7.4.1.5-1 /bin/bash

Running the training code in either environment always fails with the same error:

2022-10-26 09:25:36.580487: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2022-10-26 09:27:01.922083: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train_model.py", line 3, in 
    train()
  File "/share/speech_seg/BiLSTM/bilstm_speech_seg_train.py", line 86, in train_bilstm
    validation_data=(valid_x, valid_y), shuffle=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(800, 256), b.shape=(256, 32), m=800, n=32, k=256
	 [[{{node time_distributed_1/MatMul}}]]
	 [[{{node metrics/acc/Mean_1}}]]

Although the log shows that the local CUDA library is opened successfully:

2022-10-26 09:25:36.580487: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally

the same error keeps appearing:

2022-10-26 09:27:01.922083: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
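
Note that tf.test.is_gpu_available() only exercises device enumeration, so it can succeed even when cuBLAS kernels cannot be launched. A quick way to confirm that the failure is in cuBLAS itself rather than in the training code is to run a bare matrix multiplication with the shapes from the traceback (a minimal sketch against the TF 1.x session API; the shapes are copied from the error above):

import numpy as np
import tensorflow as tf  # TF 1.x

# Same shapes as the failing node: a.shape=(800, 256), b.shape=(256, 32)
with tf.device("/device:GPU:0"):
    a = tf.constant(np.random.rand(800, 256), dtype=tf.float32)
    b = tf.constant(np.random.rand(256, 32), dtype=tf.float32)
    c = tf.matmul(a, b)  # lowered to a cublasSgemm_v2 call on the GPU

with tf.Session() as sess:
    # On an affected setup this raises the same InternalError:
    # "Blas GEMM launch failed"
    print(sess.run(c).shape)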

Solution

The fix follows the approach discussed in this GitHub issue:

https://github.com/qqwweee/keras-yolo3/issues/332

First, try upgrading TensorFlow to 1.14.0, which resolves some occurrences of this error:

pip install tensorflow-gpu==1.14.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
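
After the upgrade, a quick check confirms that the installed wheel is the GPU build (standard TF 1.x test APIs):

import tensorflow as tf

print(tf.__version__)                # expect 1.14.0
print(tf.test.is_built_with_cuda())  # True for the tensorflow-gpu wheel
print(tf.test.is_gpu_available())    # device enumeration only, see above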

If the error persists after the upgrade:

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(200, 256), b.shape=(256, 32), m=200, n=32, k=256
	 [[{{node time_distributed_1/MatMul}}]]
	 [[metrics/acc/Mean_1/_205]]
  (1) Internal: Blas GEMM launch failed : a.shape=(200, 256), b.shape=(256, 32), m=200, n=32, k=256
	 [[{{node time_distributed_1/MatMul}}]]
0 successful operations.
0 derived errors ignored.

then the cause is your GPU: Ampere cards such as the GeForce RTX 30 series and the RTX A5000 (compute capability 8.6) are not supported by CUDA 9 or CUDA 10 and require CUDA 11.1 or later.
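
You can read the compute capability directly from Python: in TF 1.x, device_lib reports it in each device's description string (a minimal sketch):

from tensorflow.python.client import device_lib

# physical_device_desc ends with e.g. "compute capability: 8.6".
# Compute capability >= 8.0 (Ampere) requires CUDA 11.1 or later, so a
# TensorFlow build linked against CUDA 10 cannot launch GEMM kernels on it.
for dev in device_lib.list_local_devices():
    if dev.device_type == "GPU":
        print(dev.name, dev.physical_device_desc)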

The only option is therefore to rebuild tensorflow-gpu==1.15.0 against CUDA 11 and put that build into the pulled Docker image.
