ubuntu16.04+nvidia440+cuda10.2+cudnn 7.6.5+tf2.1 (pitfalls)

System information:

  1. Ubuntu 16.04
  2. NVIDIA 1080 Ti cards
  3. CUDA versions already installed: cuda 9.0 + cudnn 7.0.5; cuda 8.0 + cudnn 5.1.10
  4. Current nvidia driver: 384.90
  5. tensorflow-gpu: 2.1.0

Installation:

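I followed this 机器之心 (Synced) article exactly. The only things that need to change are the nvidia driver version, the cuda version, and the cudnn version that matches that cuda version.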

I installed nvidia440+cuda10.2+cudnn 7.6.5+tf-gpu 2.1 and ran into the following three problems.

Pitfall 1 (it feels like visiting the Terracotta Army at the Mausoleum of Qin Shi Huang):

2020-03-06 19:54:56.304782: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
2020-03-06 19:54:56.304857: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
2020-03-06 19:54:56.304869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

· I ignored the warnings above. They appear because TensorRT is not installed.
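To see which of the libraries named in these warnings are really missing, one option is to try opening them directly from Python. The following is a minimal sketch, not part of the original setup: the helper name find_missing_libs is made up for illustration, and it relies on ctypes.CDLL, which goes through dlopen() on Linux and therefore honours LD_LIBRARY_PATH in a similar way to the loader in the log above.

```python
import ctypes


def find_missing_libs(names):
    """Return the entries of `names` that the dynamic loader cannot open.

    Hypothetical helper for illustration. Note that a successful check
    actually loads the library into the current Python process.
    """
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be opened
        except OSError:
            missing.append(name)
    return missing


# Library names taken from the warnings in this post.
libs = [
    "libnvinfer.so.6",
    "libnvinfer_plugin.so.6",
    "libcudart.so.10.1",
    "libcudnn.so.7",
]
print("missing:", find_missing_libs(libs))
```

If only the two libnvinfer entries are reported, the remaining problem is limited to TensorRT, which matches the decision to ignore it here.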


Pitfall 2

Could not load dynamic library 'libcudart.so.10.1'; dlerror:libcudart.so.10.1 cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64

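· This one cost me a lot of time, because until it is fixed tf.config.list_physical_devices('GPU') returns no GPUs at all, meaning the program cannot see them. The prebuilt tensorflow-gpu 2.1.0 package is linked against CUDA 10.1, which is why it looks for libcudart.so.10.1, while a CUDA 10.2 install only ships libcudart.so.10.2. On Stack Overflow, many people say that cuda 10.1 does not work with tf2.1 and that you need to switch to cuda 10.0. I did not do that, because it would mean reinstalling the GPU driver and CUDA again, which is too much trouble. My fix was to create a symlink named libcudart.so.10.1 in /usr/local/cuda/lib64 (the directory that contains libcudart.so.10.2) pointing to libcudart.so.10.2, which solved the problem.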

ln -s libcudart.so.10.2 libcudart.so.10.1
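The same symlink step can be sketched in Python, which makes it easier to refuse to overwrite anything that already exists. This is only an illustration under assumptions: ensure_symlink is a hypothetical helper, not something from the original setup, and writing into /usr/local/cuda/lib64 will normally require root.

```python
import os


def ensure_symlink(lib_dir, target_name, link_name):
    """Create lib_dir/link_name -> target_name unless link_name already exists.

    Returns True if a new link was created, False if nothing was changed.
    Hypothetical helper for illustration.
    """
    target = os.path.join(lib_dir, target_name)
    link = os.path.join(lib_dir, link_name)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        return False  # never overwrite an existing file or link
    # Relative link, equivalent to running `ln -s target_name link_name` inside lib_dir.
    os.symlink(target_name, link)
    return True


# Usage for the case in this post (run with sufficient permissions):
# ensure_symlink("/usr/local/cuda/lib64", "libcudart.so.10.2", "libcudart.so.10.1")
```

The link is created with a relative target so that it keeps working if the whole CUDA directory is moved or reached through another path.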

Pitfall 3

Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64

· With the experience from pitfall 2, this one was much simpler: I found libcudnn.so.7 in the lib64 directory of cuda-9.0 and copied it into the lib64 directory of cuda-10.2.
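When several CUDA installs live side by side, it helps to list where copies of a library actually are before copying anything. Below is a rough sketch, assuming the installs sit under /usr/local/cuda*/lib64 as on this machine. The function find_library_copies and its default pattern are illustrative assumptions.

```python
import glob
import os


def find_library_copies(lib_name, roots_pattern="/usr/local/cuda*/lib64"):
    """List (path, resolved_path) for files named lib_name* under matching dirs.

    Hypothetical helper for illustration. The resolved path shows which
    versioned file a name such as libcudnn.so.7 really points to.
    """
    hits = []
    for lib_dir in sorted(glob.glob(roots_pattern)):
        for path in sorted(glob.glob(os.path.join(lib_dir, lib_name + "*"))):
            hits.append((path, os.path.realpath(path)))
    return hits


for path, real in find_library_copies("libcudnn.so.7"):
    print(path, "->", real)
```

Checking the resolved file name is worthwhile here, since it shows which cuDNN release a given libcudnn.so.7 really is.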


Test

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

Output: perfect.

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]

Edited: 2020-03-06 20:16:15
