tensorflow-gpu and cuda version compatibility problems

The problem:

The tensorflow-gpu and cuda versions do not match, so tensorflow fails to find the cuda libraries and prints the errors below.

>>> import tensorflow as tf
2020-08-13 10:18:40.071778: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2020-08-13 10:18:40.078105: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> print(tf.test.is_gpu_available())
2020-08-13 10:21:08.259454: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-08-13 10:21:08.268433: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-08-13 10:21:09.368477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.335
pciBusID: 0000:01:00.0
2020-08-13 10:21:09.378233: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2020-08-13 10:21:09.388143: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_100.dll'; dlerror: cublas64_100.dll not found
2020-08-13 10:21:09.396653: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_100.dll'; dlerror: cufft64_100.dll not found
2020-08-13 10:21:09.405474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_100.dll'; dlerror: curand64_100.dll not found
2020-08-13 10:21:09.413780: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_100.dll'; dlerror: cusolver64_100.dll not found
2020-08-13 10:21:09.422508: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_100.dll'; dlerror: cusparse64_100.dll not found
2020-08-13 10:21:09.438972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 10:21:09.442562: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-13 10:21:09.577059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-13 10:21:09.581533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-08-13 10:21:09.584449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
False
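With this many warnings it is easy to miss one of the missing libraries. As a rough sketch (the function name missing_cuda_libraries is my own, not part of tensorflow), the library names can be pulled out of a saved copy of the log with a regular expression that matches the "Could not load dynamic library" message shown above:

```python
import re

# Matches the dso_loader warning text seen in the log, capturing the quoted file name.
_MISSING_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_cuda_libraries(log_text):
    """Return the library names tensorflow failed to load, in first-seen order, without duplicates."""
    names = _MISSING_RE.findall(log_text)
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(names))
```

Passing the pasted log text to missing_cuda_libraries gives the list of DLL names that need to be provided.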
 

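Background:

On Windows 10 I had already installed python3.7, cuda10.1 and tensorflow2.1. These versions match each other and worked normally.

I then needed to run someone else's code, which was written for tensorflow1.xx. Some APIs were dropped in the move from tensorflow1.xx to tensorflow2.xx, for example the contrib module: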

AttributeError: module 'tensorflow' has no attribute 'contrib'

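So tensorflow has to be downgraded.

Version requirements for the downgraded tensorflow:

The correspondence between cuda and tensorflow versions is listed on the TensorFlow site (Build from source  |  TensorFlow).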

[Image 1: table of tensorflow and cuda version compatibility]

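tensorflow1.13 and later support python3.7, so tensorflow1.14.0 can be installed.

No tensorflow1.xx release supports cuda10.1. Reinstalling cuda is a hassle, though, and after reinstalling it everything that depends on it, such as pytorch, would have to be reinstalled as well.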
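To make the mismatch concrete, here is a small lookup sketch. The table values are an assumption on my part, recalled from TensorFlow's tested build configurations and covering only a few releases, so check them against the official table before relying on them. The names TESTED_CUDA, required_cuda and is_compatible are hypothetical helpers, not a tensorflow API.

```python
# CUDA version each prebuilt tensorflow-gpu release expects.
# Assumption: recalled from TensorFlow's "tested build configurations" table; verify before use.
TESTED_CUDA = {
    "1.13": "10.0",
    "1.14": "10.0",
    "1.15": "10.0",
    "2.0": "10.0",
    "2.1": "10.1",
}


def required_cuda(tf_version):
    """Return the CUDA version for a release like '1.14.0', or None if it is not in the table."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA.get(major_minor)


def is_compatible(tf_version, cuda_version):
    """True only when the installed CUDA version is the one the release expects."""
    return required_cuda(tf_version) == cuda_version
```

For the setup described here, is_compatible("2.1.0", "10.1") holds while is_compatible("1.14.0", "10.1") does not, which is why tensorflow1.14.0 goes looking for the _100 libraries.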

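Solution:

Simply make the cuda10.1 libraries available under the cuda10.0 names that tensorflow tries to load. I have not found any problems with this so far.

Go to the cuda installation path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin, make a copy of each of the files below, and rename the copy to xxx_100.

(Note: whenever the log says xxx.dll cannot be found, look for the matching file in this directory, make a copy of it, and rename the copy to the name tensorflow asks for.)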
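The copy-and-rename step can also be scripted. The following is only a sketch under some assumptions: the helper name copy_as_cuda100 is mine, and it assumes each cuda10.1 library in the bin directory is named like <lib>64_<something>.dll, which should be confirmed by looking at the directory first. Writing into Program Files normally requires an administrator prompt.

```python
import shutil
from pathlib import Path

# Library prefixes taken from the "Could not load dynamic library" warnings.
LIBS = ["cudart", "cublas", "cufft", "curand", "cusolver", "cusparse"]


def copy_as_cuda100(bin_dir, libs=LIBS):
    """Copy each existing <lib>64_*.dll to <lib>64_100.dll; return the names of the files created."""
    bin_dir = Path(bin_dir)
    created = []
    for lib in libs:
        target = bin_dir / f"{lib}64_100.dll"
        if target.exists():
            continue  # already provided, leave it untouched
        # Assumption: the installed library follows the <lib>64_<suffix>.dll pattern.
        candidates = sorted(bin_dir.glob(f"{lib}64_*.dll"))
        if not candidates:
            print(f"no source file found for {lib}, skipping")
            continue
        shutil.copy2(candidates[0], target)  # copy, so the original file stays in place
        created.append(target.name)
    return created
```

It would be called as copy_as_cuda100(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin"). The originals are copied rather than renamed so that tensorflow2.1 and pytorch keep working.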

[Image 2: the DLL files that were copied and renamed]

Another problem you may run into:

F .\tensorflow/core/kernels/conv_2d_gpu.h:447] Non-OK-status: GpuLaunchKernel(ShuffleInTensor3Simple, config.block_count, config.thread_per_block, 0, d.stream(), config.virtual_thread_count, in.data(), combined_dims, out.data()) status: Internal: invalid device function

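Cause of the error: after switching cuda versions, the system can no longer locate the GPU automatically. Setting the GPU index by hand gets past the error.

The code is as follows: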

# Import the os package
import os
# Set the visible GPUs before tensorflow is imported; "-1" hides every GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

Update, 2021-01-20:

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

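This line means that even though tensorflow-gpu is installed, the code runs in CPU-only mode.

So the best solution is still to install several CUDA versions side by side, one for each tensorflow version you need. For the latest solution, see: Installing multiple versions of CUDA and cuDNN on Windows 10 (and multiple versions of tensorflow and pytorch), 博博有个大大大的Dream, CSDN blog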
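For comparison, here is a minimal sketch of how the same variable selects real GPUs instead of hiding them. The helper name set_visible_gpus is hypothetical; the only thing it relies on is that CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices and that it must be set before tensorflow is imported.

```python
import os


def set_visible_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of indices; an empty list hides all GPUs with "-1"."""
    value = ",".join(str(i) for i in gpu_ids) if gpu_ids else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU. Call this before "import tensorflow".
set_visible_gpus([0])
```

With set_visible_gpus([0]) tensorflow sees only device 0, while set_visible_gpus([]) reproduces the CPU-only behaviour of "-1".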
