[Notes] TensorFlow 2.2 GPU error: Could not load dynamic library 'libcudart.so.10.1'

A colleague upgraded CUDA from 10.1 to 10.2. I assumed that updating TensorFlow would be enough, but I ran into some problems, which I record here.

Environment

CUDA:10.2
tensorflow:2.2.0
python:3.6

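I assumed these versions met the requirements on the official site. (Note that the tested configuration listed for the prebuilt tensorflow 2.2.0 package is CUDA 10.1, not 10.2, which matters below.)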

Run:

import tensorflow as tf
tf.test.is_gpu_available()

This returns False.
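tf.test.is_gpu_available() is deprecated in TensorFlow 2.x in favor of tf.config.list_physical_devices('GPU'). A minimal sketch of the same check with the newer call follows; the helper name list_gpus is my own, and it returns None when TensorFlow is not installed at all.

```python
def list_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list corresponds to is_gpu_available() returning False.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(list_gpus())
```

An empty list here means TensorFlow was imported but no GPU device was registered, which is the situation described in this note.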

Error message

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64

This in turn leads to:

W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

So the library libcudart.so.10.1 is missing, and that is why the GPU cannot be enabled.
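The warning is the result of a failed dlopen call. As a rough way to reproduce the same check outside TensorFlow, the sketch below tries to load a shared library by name with Python's ctypes module. The helper name try_load is hypothetical and is not part of TensorFlow.

```python
import ctypes


def try_load(library_name):
    """Try to load a shared library by name; return (ok, error_message)."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        # The message carries the dynamic linker's error text,
        # e.g. "cannot open shared object file".
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    for name in ("libcudart.so.10.1", "libcudart.so.10.2"):
        ok, err = try_load(name)
        print(name, "loaded" if ok else "failed: " + err)
```

The dynamic linker reads LD_LIBRARY_PATH when the process starts, so the variable has to be set in the shell before Python is launched for this check to reflect it.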

Solution

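The error shows that libcudart.so.10.1 cannot be found in /usr/local/cuda-10.1/lib64. Since CUDA 10.2 is already installed, I edited .bashrc to point to /usr/local/cuda-10.2/lib64 instead (remember to run source ~/.bashrc).
The error persists: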

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64

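The library that ought to be loaded is libcudart.so.10.2, and it is indeed present in /usr/local/cuda-10.2/lib64. According to a TensorFlow issue, it turns out that TensorFlow 2.2.0, for some reason, is not linked against libcudart.so.10.2 and still looks for libcudart.so.10.1.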
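To see which versions of the library are actually reachable, one option is to scan every directory on LD_LIBRARY_PATH for files named libcudart.so*. This is only a sketch: the function name find_cudart is my own, and it does not look at the system default directories that the dynamic linker also searches.

```python
import glob
import os


def find_cudart(search_path=None):
    """Return sorted libcudart.so* paths found in the directories of a search path.

    search_path uses the same separator-delimited format as LD_LIBRARY_PATH
    and defaults to the current value of that variable.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a leading or trailing separator
        found.extend(glob.glob(os.path.join(directory, "libcudart.so*")))
    return sorted(found)


if __name__ == "__main__":
    for path in find_cudart():
        print(path)
```

If the output lists libcudart.so.10.2 but no libcudart.so.10.1, the setup matches the situation described above.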

Solution 1:

Build TensorFlow from source against CUDA 10.2, but that is too much trouble.

Solution 2:

Create a symbolic link:

sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2  /usr/lib/x86_64-linux-gnu/libcudart.so.10.1

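This works because libcudart.so.10.2 is compatible with libcudart.so.10.1, so 10.2 can stand in for 10.1.
If you do not have root privileges, create a directory of your own instead, for example: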

mkdir ~/lib64

Then create the link inside that user directory:

ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2  ~/lib64/libcudart.so.10.1

Do not forget to update .bashrc so that the new directory is on the library search path:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-10.2/lib64:/usr/local/cuda/lib64:~/lib64

Apply the change with source ~/.bashrc
If all goes well, tf.test.is_gpu_available() should now return True.
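For reference, the no-root variant of the link step can also be scripted. The sketch below creates the directory if needed and then creates (or refreshes) the symbolic link; the function name link_compat_library is hypothetical, and the paths in the usage comment are the ones used in this note.

```python
import os


def link_compat_library(real_library, link_dir, link_name):
    """Create link_dir/link_name as a symlink to real_library and return the link path."""
    os.makedirs(link_dir, exist_ok=True)
    link_path = os.path.join(link_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by an earlier attempt
    # Raises FileExistsError if a regular file already occupies link_path.
    os.symlink(real_library, link_path)
    return link_path


# Usage with the paths from this note (not executed here):
# link_compat_library(
#     "/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2",
#     os.path.expanduser("~/lib64"),
#     "libcudart.so.10.1",
# )
```

The link directory still has to be added to LD_LIBRARY_PATH before Python starts, exactly as with the shell commands.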

Solution 3:

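Use the cudatoolkit package, which lets a virtual environment use a CUDA version different from the one installed system-wide. See the tutorial on using different CUDA versions listed in the references.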

References

tensorflow-gpu: Could not load dynamic library ‘libcudart.so.10.1’
Pytorch 使用不同版本的 cuda (Using different CUDA versions with PyTorch)
