Fixing the "Could not load dynamic library 'libcudart.so.10.0'" error

Symptoms and analysis

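After installing CUDA, cuDNN, and the latest official TensorFlow 2.0 release, I ran into the following problem when writing and running TF code in PyCharm.

The main symptom is a series of warnings that dynamic library files cannot be found, after which TensorFlow skips registering the GPU. A glance at the file names shows they are all CUDA libraries, so why would they be reported as missing?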

2019-10-15 19:19:41.440285: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-15 19:19:41.465433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.465758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
2019-10-15 19:19:41.465809: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465841: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465870: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465900: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465930: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.465959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.468179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-15 19:19:41.468189: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-15 19:19:41.468361: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-15 19:19:41.490938: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2019-10-15 19:19:41.492057: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x520eba0 executing computations on platform Host. Devices:
2019-10-15 19:19:41.492085: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-10-15 19:19:41.559665: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.560029: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5241a20 executing computations on platform CUDA. Devices:
2019-10-15 19:19:41.560040: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-10-15 19:19:41.560084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-15 19:19:41.560088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
2019-10-15 19:19:41.562457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-15 19:19:41.562855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.665
pciBusID: 0000:01:00.0
2019-10-15 19:19:41.562913: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.562945: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.562975: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563032: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563062: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-10-15 19:19:41.563069: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-15 19:19:41.563073: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-15 19:19:41.563080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-15 19:19:41.563083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2019-10-15 19:19:41.563086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2019-10-15 19:19:41.563504: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

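At first I assumed my environment variables were misconfigured, but a quick check showed that CUDA and cuDNN were where they should be and ran normally from the terminal.

It turned out that the cause is a version mismatch: the current TensorFlow release only supports CUDA 10.0 (the log shows it looking for the .so.10.0 files), while NVIDIA has already moved CUDA on to 10.1. The fix is to keep both versions installed and switch to whichever one is needed.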
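Before reinstalling anything, it can help to confirm which of the libraries named in the warnings the dynamic loader can actually open. The following is a minimal sketch, assuming it is run with the same interpreter and environment as the failing script; the helper names can_load and report are made up for illustration, and ctypes.CDLL is used here only as a rough stand-in for the loading TensorFlow does internally.

```python
import ctypes

# Library names copied from the warnings in the log above.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def can_load(name):
    """Return True if the dynamic loader can open the library by name."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False


def report(names=CUDA_10_0_LIBS):
    """Map each library name to whether it could be loaded (hypothetical helper)."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If every entry is reported missing while the 10.1 libraries are installed, that points to the version mismatch described above rather than a broken installation.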

 

Solution steps

Download CUDA 10.0. My machine already had version 10.1 configured.

wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux

Make the installer executable

sudo chmod +x cuda_10.0.130_410.48_linux

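Run the installer with sudo (sudo ./cuda_10.0.130_410.48_linux). Be sure to install only the CUDA 10.0 toolkit and not the driver; the options I chose are shown in the screenshots.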

[Screenshot 1]

[Screenshot 2]

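The installation then starts. Once it finishes, edit the environment variables with vim ~/.bashrc (sudo is not needed for a file in your own home directory). Delete the CUDA settings configured earlier and replace them with the following, then save the file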

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH=/usr/local/cuda/bin:$PATH
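A misspelled directory in LD_LIBRARY_PATH is silently ignored by the loader, so it is worth checking that every entry really exists. Here is a small sketch of such a check; missing_dirs is a hypothetical helper name, not part of any CUDA or TensorFlow tooling.

```python
import os


def missing_dirs(path_value):
    """Return the entries of a colon-separated path list that are not existing directories."""
    entries = [p for p in path_value.split(":") if p]
    return [p for p in entries if not os.path.isdir(p)]


if __name__ == "__main__":
    value = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in missing_dirs(value):
        print(f"not a directory: {entry}")
```

Run it from a new terminal after saving ~/.bashrc; no output means every listed directory exists.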

Apply the configuration with source ~/.bashrc. If everything is in place, nvcc --version should now report CUDA 10.0

arenascat@TensorSystem:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
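The same check can be scripted, which is handy when switching between 10.0 and 10.1 often. This is a sketch under the assumption that the nvcc output keeps the "release X.Y" wording shown above; parse_release and nvcc_release are illustrative names.

```python
import re
import subprocess


def parse_release(nvcc_output):
    """Extract the 'release X.Y' version string from nvcc --version output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def nvcc_release():
    """Run nvcc --version and return the release string; None if nvcc cannot be run."""
    try:
        completed = subprocess.run(
            ["nvcc", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_release(completed.stdout)


if __name__ == "__main__":
    print("active CUDA toolkit:", nvcc_release())
```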

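If Python still fails when run from PyCharm, delete /usr/local/cuda first. It is really just a symbolic link, so a plain rm is enough (and, unlike rm -rf, it will refuse to delete a real directory). Then recreate the link

sudo rm /usr/local/cuda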

sudo ln -s /usr/local/cuda-10.0/ /usr/local/cuda
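To see which toolkit the link currently points to, the target can be resolved from Python. A minimal sketch, where link_target is a hypothetical helper and /usr/local/cuda is the default location used in this post:

```python
import os


def link_target(path="/usr/local/cuda"):
    """Return the resolved target if path is a symbolic link, otherwise None."""
    if os.path.islink(path):
        return os.path.realpath(path)
    return None


if __name__ == "__main__":
    target = link_target()
    if target:
        print("/usr/local/cuda ->", target)
    else:
        print("/usr/local/cuda is not a symbolic link (or does not exist)")
```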

Run the code that uses tensorflow-gpu again, and you should see that the libraries are now loaded successfully
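A quick way to confirm the result from Python itself is to ask TensorFlow which GPUs it registered. The sketch below assumes TensorFlow 2.0, where the call lives under tf.config.experimental; visible_gpus is an illustrative wrapper that returns None when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow registered, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # In TensorFlow 2.0 this API is still under the experimental namespace.
    devices = tf.config.experimental.list_physical_devices("GPU")
    return [device.name for device in devices]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this interpreter")
    elif gpus:
        print("GPUs registered:", gpus)
    else:
        print("No GPU registered; check the dso_loader warnings in the log")
```

Run it with the same interpreter that PyCharm uses for the project, since that is the environment where the libraries have to be found.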

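To switch to CUDA 10.1, just change 10.0 in the commands above to 10.1.

If that still does not solve the problem, try setting the environment in a few more places, which apply system-wide rather than only to shells that read ~/.bashrc. Create a new file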

sudo vi /etc/profile.d/cuda.sh

Copy the following into it and save

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda

Create a file in this location as well. If you installed 10.1, the directory may already contain a cuda10.1.conf; leave it alone

sudo vi /etc/ld.so.conf.d/cuda.conf

The file should contain this line

/usr/local/cuda/lib64

Apply the settings

sudo ldconfig
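After running ldconfig, the loader cache can be inspected to confirm that the 10.0 runtime is now listed. The sketch below parses the output of ldconfig -p; find_in_cache and loader_cache are illustrative names, and the parsing assumes the usual "name (flags) => path" layout of that output.

```python
import subprocess


def find_in_cache(cache_text, lib_name):
    """Return the paths listed for lib_name in `ldconfig -p` output."""
    paths = []
    for line in cache_text.splitlines():
        # Entries look like: "libfoo.so.1 (libc6,x86-64) => /path/to/libfoo.so.1"
        name, sep, path = line.strip().partition(" => ")
        if sep and name.split(" ")[0] == lib_name:
            paths.append(path)
    return paths


def loader_cache():
    """Return the text printed by `ldconfig -p`, or an empty string if it cannot be run."""
    try:
        completed = subprocess.run(
            ["ldconfig", "-p"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return completed.stdout


if __name__ == "__main__":
    hits = find_in_cache(loader_cache(), "libcudart.so.10.0")
    print(hits if hits else "libcudart.so.10.0 is not in the loader cache")
```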

 
