Check the operating system of the Linux server
lxb@root1-PowerEdge-R740:~$ cat /etc/issue
Ubuntu 18.04.5 LTS \n \l
Check the kernel version
lxb@root1-PowerEdge-R740:~$ cat /proc/version
Linux version 5.4.0-73-generic (buildd@lgw01-amd64-038) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #82~18.04.1-Ubuntu SMP Fri Apr 16 15:10:02 UTC 2021
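The same two facts, the kernel release and the GCC version used to build the kernel, can also be read from /proc/version in a script. This is a minimal Python sketch. The function name parse_proc_version is hypothetical, and the regular expressions assume the line format shown above:

```python
import os
import re

def parse_proc_version(text: str) -> dict:
    """Extract the kernel release and the compiler version from /proc/version."""
    kernel = re.search(r"Linux version (\S+)", text)
    gcc = re.search(r"gcc version (\S+)", text)
    return {
        "kernel": kernel.group(1) if kernel else None,
        "gcc": gcc.group(1) if gcc else None,
    }

if __name__ == "__main__" and os.path.exists("/proc/version"):
    # Only runs on Linux, where /proc/version exists.
    with open("/proc/version") as f:
        print(parse_proc_version(f.read()))
```

If you only need the kernel release, the standard library's platform.release() returns the same string (5.4.0-73-generic on this server).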
Check the CPU model (uniq -c prefixes the model with the number of logical processors that report it)
lxb@root1-PowerEdge-R740:~$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
     40  Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
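The pipeline above does three things. It keeps the "model name" lines, drops everything up to the first colon, and counts identical lines. The count of 40 logical processors fits two 10-core, 20-thread Silver 4210R sockets. Here is a rough Python equivalent, offered as a sketch; count_cpu_models is a hypothetical helper name:

```python
from collections import Counter

def count_cpu_models(cpuinfo_text: str) -> Counter:
    """Count logical processors per CPU model, like
    grep name /proc/cpuinfo | cut -f2 -d: | uniq -c."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return models

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(count_cpu_models(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo is only available on Linux")
```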
Check memory (free -m reports sizes in MiB)
lxb@root1-PowerEdge-R740:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:         128419        9938       97429          43       21051      117400
Swap:          2047           0        2047
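To use these numbers in a script, you can parse the table into a dictionary. This is a minimal sketch that assumes the column layout shown above; parse_free is a hypothetical helper name:

```python
def parse_free(output: str) -> dict:
    """Turn the text printed by `free -m` into {row_name: {column: MiB}}."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header = rows[0]  # total used free shared buff/cache available
    table = {}
    for row in rows[1:]:
        name = row[0].rstrip(":")                # "Mem:" -> "Mem"
        values = [int(v) for v in row[1:]]
        table[name] = dict(zip(header, values))  # Swap has only 3 columns
    return table
```

Watch the available column rather than free. It estimates how much memory new programs can get, and that estimate includes page cache the kernel can reclaim: 117400 MiB here, against 97429 MiB free. To parse live output, pass subprocess.run(["free", "-m"], capture_output=True, text=True).stdout to parse_free.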
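Check the GPU
(The "CUDA Version: 11.2" in the nvidia-smi header is the newest CUDA version that driver 460.73.01 supports. It is not necessarily the installed CUDA toolkit: the libraries under /usr/local/cuda used below are CUDA 11.1.)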
lxb@root1-PowerEdge-R740:~$ nvidia-smi
Wed Jun  9 16:35:04 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   66C    P0   208W / 250W |  14177MiB / 22698MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3758      C   /home/wzw/train                  4541MiB |
|    0   N/A  N/A     34026      C   python                           9633MiB |
+-----------------------------------------------------------------------------+
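For scripts, nvidia-smi can print selected fields as CSV instead of the table above. This sketch uses the --query-gpu and --format options. With nounits, memory values are plain MiB and utilization is a plain percentage. parse_gpu_csv and query_gpus are hypothetical helper names:

```python
import csv
import io
import subprocess

FIELDS = "index,name,memory.used,memory.total,utilization.gpu"

def parse_gpu_csv(text: str) -> list:
    """Parse `nvidia-smi --query-gpu=<FIELDS> --format=csv,noheader,nounits` output."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, used, total, util = row
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return gpus

def query_gpus() -> list:
    """Run nvidia-smi (must be on PATH) and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

nvidia-smi --help-query-gpu lists every field that --query-gpu accepts.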
lxb@root1-PowerEdge-R740:~$
Download the Linux installer from https://www.anaconda.com/products/individual#Downloads .
Upload the downloaded installer to the server, then run it:
lxb@root1-PowerEdge-R740:~$ chmod 775 Anaconda3-2021.05-Linux-x86_64.sh
lxb@root1-PowerEdge-R740:~$ bash Anaconda3-2021.05-Linux-x86_64.sh
Anaconda reserves all rights not expressly granted to you in this Agreement.
Do you accept the license terms? [yes|no] // type yes to accept the license
[no] >>> yes
Anaconda3 will now be installed into this location:
/home/lxb/anaconda3 // installation location
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
...
// press Enter to start the installation
lxb@root1-PowerEdge-R740:~$ conda
conda: command not found
lxb@root1-PowerEdge-R740:~$ source .bashrc  # reload .bashrc so the changes take effect
(base) lxb@root1-PowerEdge-R740:~$ conda
usage: conda [-h] [-V] command ...
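Right after installation, "conda: command not found" usually means only that ~/.bashrc has been edited but the running shell has not read it yet. This assumes the installer's conda init prompt, hidden behind the "..." above, was answered yes. This sketch tells "not installed" apart from "installed but not on PATH". locate is a hypothetical helper, and ~/anaconda3 is the install location chosen above:

```python
import os
import shutil

def locate(program, candidate_dirs=()):
    """Return (path_on_PATH, path_in_candidate_dirs) for an executable.

    (None, path) means the program is installed but its directory is not on PATH.
    """
    on_path = shutil.which(program)
    if on_path:
        return on_path, None
    for d in candidate_dirs:
        found = shutil.which(program, path=os.path.expanduser(d))
        if found:
            return None, found
    return None, None

if __name__ == "__main__":
    print(locate("conda", ["~/anaconda3/bin", "~/anaconda3/condabin"]))
```

If only the second element is set, run source ~/.bashrc as above, or run ~/anaconda3/bin/conda init bash once to add the initialization block.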
lxb@root1-PowerEdge-R740:~$ conda create -n pytorch_1.8.0_py_3.8 python=3.8
(base) lxb@root1-PowerEdge-R740:~$ conda activate pytorch_1.8.0_py_3.8
For the matching versions of torchvision, torchaudio and cudatoolkit, check the PyTorch website.
(pytorch_1.8.0_py_3.8) lxb@root1-PowerEdge-R740:~$ conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
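After the install, you can confirm that the three packages are at the requested versions. This sketch uses importlib.metadata (Python 3.8+). It assumes the package metadata is visible to importlib.metadata, which is normally the case for both conda- and pip-installed packages. It ignores local version suffixes such as +cu111 that pip wheels use. check_versions is a hypothetical helper name:

```python
from importlib import metadata

# The combination installed by the conda command above.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0", "torchaudio": "0.8.0"}

def check_versions(expected, get_version=metadata.version):
    """Return {package: (expected, installed)} for each package that does not match.

    installed is None when the package is missing. Local version labels such as
    '+cu111' are dropped before comparing.
    """
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = get_version(pkg).split("+")[0]
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches

if __name__ == "__main__":
    print(check_versions(EXPECTED) or "all versions match")
```

An empty result means every package matches.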
(pytorch_1.8.0_py_3.8) lxb@root1-PowerEdge-R740:~$ python
Python 3.8.10 (default, May 19 2021, 18:05:58)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True  # the GPU is available
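torch.cuda.is_available() only answers yes or no. This sketch also reports the CUDA version PyTorch was built with and the cuDNN version it loads. With the conda cudatoolkit package, these libraries come from inside the conda environment. That is presumably why PyTorch found the GPU here without any LD_LIBRARY_PATH setup, unlike the pip-installed TensorFlow below. torch_cuda_report is a hypothetical helper name:

```python
def torch_cuda_report() -> dict:
    """Summarise what PyTorch sees; safe to call even when torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,            # None for CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),  # integer such as 8005 for 8.0.5
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["devices"] = [torch.cuda.get_device_name(i)
                             for i in range(torch.cuda.device_count())]
    return report

if __name__ == "__main__":
    print(torch_cuda_report())
```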
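Reference (the official tested build configurations, listing compatible Python, CUDA and cuDNN versions for each TensorFlow release):
Linux: https://tensorflow.google.cn/install/source#linux
Windows: https://tensorflow.google.cn/install/source_windows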
(base) lxb@root1-PowerEdge-R740:~$ conda create -n tensorflow_2.5.0_py_3.8 python=3.8
// create a new environment for TensorFlow
// once it has been created, switch to it
(base) lxb@root1-PowerEdge-R740:~$ conda activate tensorflow_2.5.0_py_3.8
// install TensorFlow
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ pip install tensorflow==2.5.0
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ python
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-06-14 15:11:26.560061: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-06-14 15:11:26.560107: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
// The warning above, Could not load dynamic library 'libcudart.so.11.0', means the CUDA runtime library could not be found
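TensorFlow resolves these libraries through the system dynamic loader (dlopen). You can reproduce the check outside TensorFlow with ctypes, which calls dlopen on Linux. This is a minimal sketch; can_dlopen is a hypothetical helper name.

The loader reads LD_LIBRARY_PATH when the process starts. Set the variable in the shell before launching python, as done later in this walkthrough. Changing os.environ inside a running interpreter does not affect this lookup.

```python
import ctypes

def can_dlopen(soname: str) -> bool:
    """Return True if the dynamic loader can find and load `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True

if __name__ == "__main__":
    for lib in ("libcudart.so.11.0", "libcudnn.so.8"):
        print(lib, can_dlopen(lib))
```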
Fix:
First check whether this CUDA runtime library exists:
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr$ cd ~
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ ls
MNIST_ReLu Pytorch-UNet-master anaconda3 cuda cudnn-11.3-linux-x64-v8.2.0.53.tgz examples.desktop
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ cd /usr/local/
bin/ cuda-10.1/ etc/ include/ man/ share/
cuda/ cuda-11.1/ games/ lib/ sbin/ src/
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ cd /usr/local/cuda
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr/local/cuda$ ls
DOCS bin include nsight-compute-2020.2.0 nvml share tools
EULA.txt compute-sanitizer lib64 nsight-systems-2020.3.4 nvvm src
README extras libnvvp nsightee_plugins samples targets
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr/local/cuda$ cd lib64/
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr/local/cuda/lib64$ ll
total 3746652
drwxr-xr-x 3 root root 4096 Apr 28 22:02 ./
drwxr-xr-x 4 root root 4096 Apr 28 22:01 ../
lrwxrwxrwx 1 root root 14 Apr 28 22:01 libOpenCL.so -> libOpenCL.so.1*
lrwxrwxrwx 1 root root 16 Apr 28 22:01 libOpenCL.so.1 -> libOpenCL.so.1.0*
lrwxrwxrwx 1 root root 18 Apr 28 22:01 libOpenCL.so.1.0 -> libOpenCL.so.1.0.0*
-rwxr-xr-x 1 root root 27288 Apr 28 22:01 libOpenCL.so.1.0.0*
lrwxrwxrwx 1 root root 19 Apr 28 22:01 libaccinj64.so -> libaccinj64.so.11.1*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libaccinj64.so.11.1 -> libaccinj64.so.11.1.69*
-rwxr-xr-x 1 root root 2058016 Apr 28 22:01 libaccinj64.so.11.1.69*
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libcublas.so -> libcublas.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libcublas.so.11 -> libcublas.so.11.2.1.74*
-rwxr-xr-x 1 root root 120427896 Apr 28 22:01 libcublas.so.11.2.1.74*
lrwxrwxrwx 1 root root 17 Apr 28 22:02 libcublasLt.so -> libcublasLt.so.11*
lrwxrwxrwx 1 root root 24 Apr 28 22:01 libcublasLt.so.11 -> libcublasLt.so.11.2.1.74*
-rwxr-xr-x 1 root root 222330208 Apr 28 22:01 libcublasLt.so.11.2.1.74*
-rw-r--r-- 1 root root 280851862 Apr 28 22:02 libcublasLt_static.a
-rw-r--r-- 1 root root 137961316 Apr 28 22:02 libcublas_static.a
-rw-r--r-- 1 root root 1041380 Apr 28 22:01 libcudadevrt.a
lrwxrwxrwx 1 root root 17 Apr 28 22:01 libcudart.so -> libcudart.so.11.0*
lrwxrwxrwx 1 root root 20 Apr 28 22:01 libcudart.so.11.0 -> libcudart.so.11.1.74*
-rwxr-xr-x 1 root root 540184 Apr 28 22:01 libcudart.so.11.1.74*
-rw-r--r-- 1 root root 985566 Apr 28 22:01 libcudart_static.a
lrwxrwxrwx 1 root root 14 Apr 28 22:02 libcufft.so -> libcufft.so.10*
lrwxrwxrwx 1 root root 21 Apr 28 22:01 libcufft.so.10 -> libcufft.so.10.3.0.74*
-rwxr-xr-x 1 root root 235963680 Apr 28 22:01 libcufft.so.10.3.0.74*
-rw-r--r-- 1 root root 187398366 Apr 28 22:02 libcufft_static.a
-rw-r--r-- 1 root root 246083588 Apr 28 22:02 libcufft_static_nocallback.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libcufftw.so -> libcufftw.so.10*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libcufftw.so.10 -> libcufftw.so.10.3.0.74*
-rwxr-xr-x 1 root root 556976 Apr 28 22:01 libcufftw.so.10.3.0.74*
-rw-r--r-- 1 root root 31394 Apr 28 22:02 libcufftw_static.a
lrwxrwxrwx 1 root root 18 Apr 28 22:01 libcuinj64.so -> libcuinj64.so.11.1*
lrwxrwxrwx 1 root root 21 Apr 28 22:01 libcuinj64.so.11.1 -> libcuinj64.so.11.1.69*
-rwxr-xr-x 1 root root 2478816 Apr 28 22:01 libcuinj64.so.11.1.69*
-rw-r--r-- 1 root root 31482 Apr 28 22:01 libculibos.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libcurand.so -> libcurand.so.10*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libcurand.so.10 -> libcurand.so.10.2.2.74*
-rwxr-xr-x 1 root root 75343512 Apr 28 22:01 libcurand.so.10.2.2.74*
-rw-r--r-- 1 root root 75487596 Apr 28 22:02 libcurand_static.a
lrwxrwxrwx 1 root root 17 Apr 28 22:02 libcusolver.so -> libcusolver.so.11*
lrwxrwxrwx 1 root root 24 Apr 28 22:01 libcusolver.so.11 -> libcusolver.so.11.0.0.74*
-rwxr-xr-x 1 root root 696126952 Apr 28 22:01 libcusolver.so.11.0.0.74*
lrwxrwxrwx 1 root root 19 Apr 28 22:02 libcusolverMg.so -> libcusolverMg.so.11*
lrwxrwxrwx 1 root root 26 Apr 28 22:01 libcusolverMg.so.11 -> libcusolverMg.so.11.0.0.74*
-rwxr-xr-x 1 root root 400977624 Apr 28 22:01 libcusolverMg.so.11.0.0.74*
-rw-r--r-- 1 root root 195664684 Apr 28 22:02 libcusolver_static.a
lrwxrwxrwx 1 root root 17 Apr 28 22:02 libcusparse.so -> libcusparse.so.11*
lrwxrwxrwx 1 root root 25 Apr 28 22:02 libcusparse.so.11 -> libcusparse.so.11.2.0.275*
-rwxr-xr-x 1 root root 233295448 Apr 28 22:01 libcusparse.so.11.2.0.275*
-rw-r--r-- 1 root root 239529204 Apr 28 22:02 libcusparse_static.a
-rw-r--r-- 1 root root 9782944 Apr 28 22:02 liblapack_static.a
-rw-r--r-- 1 root root 1025290 Apr 28 22:02 libmetis_static.a
lrwxrwxrwx 1 root root 13 Apr 28 22:02 libnppc.so -> libnppc.so.11*
lrwxrwxrwx 1 root root 21 Apr 28 22:01 libnppc.so.11 -> libnppc.so.11.1.1.269*
-rwxr-xr-x 1 root root 552880 Apr 28 22:01 libnppc.so.11.1.1.269*
-rw-r--r-- 1 root root 31534 Apr 28 22:02 libnppc_static.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnppial.so -> libnppial.so.11*
lrwxrwxrwx 1 root root 23 Apr 28 22:01 libnppial.so.11 -> libnppial.so.11.1.1.269*
-rwxr-xr-x 1 root root 14242136 Apr 28 22:01 libnppial.so.11.1.1.269*
-rw-r--r-- 1 root root 16828126 Apr 28 22:02 libnppial_static.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnppicc.so -> libnppicc.so.11*
lrwxrwxrwx 1 root root 23 Apr 28 22:01 libnppicc.so.11 -> libnppicc.so.11.1.1.269*
-rwxr-xr-x 1 root root 6201720 Apr 28 22:01 libnppicc.so.11.1.1.269*
-rw-r--r-- 1 root root 6933720 Apr 28 22:02 libnppicc_static.a
lrwxrwxrwx 1 root root 16 Apr 28 22:02 libnppidei.so -> libnppidei.so.11*
lrwxrwxrwx 1 root root 24 Apr 28 22:01 libnppidei.so.11 -> libnppidei.so.11.1.1.269*
-rwxr-xr-x 1 root root 9974688 Apr 28 22:01 libnppidei.so.11.1.1.269*
-rw-r--r-- 1 root root 11626968 Apr 28 22:02 libnppidei_static.a
lrwxrwxrwx 1 root root 14 Apr 28 22:02 libnppif.so -> libnppif.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnppif.so.11 -> libnppif.so.11.1.1.269*
-rwxr-xr-x 1 root root 73001192 Apr 28 22:01 libnppif.so.11.1.1.269*
-rw-r--r-- 1 root root 76707190 Apr 28 22:02 libnppif_static.a
lrwxrwxrwx 1 root root 14 Apr 28 22:02 libnppig.so -> libnppig.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnppig.so.11 -> libnppig.so.11.1.1.269*
-rwxr-xr-x 1 root root 38253048 Apr 28 22:01 libnppig.so.11.1.1.269*
-rw-r--r-- 1 root root 40104978 Apr 28 22:02 libnppig_static.a
lrwxrwxrwx 1 root root 14 Apr 28 22:02 libnppim.so -> libnppim.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnppim.so.11 -> libnppim.so.11.1.1.269*
-rwxr-xr-x 1 root root 9130352 Apr 28 22:01 libnppim.so.11.1.1.269*
-rw-r--r-- 1 root root 9045214 Apr 28 22:02 libnppim_static.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnppist.so -> libnppist.so.11*
lrwxrwxrwx 1 root root 23 Apr 28 22:01 libnppist.so.11 -> libnppist.so.11.1.1.269*
-rwxr-xr-x 1 root root 27759480 Apr 28 22:01 libnppist.so.11.1.1.269*
-rw-r--r-- 1 root root 29491992 Apr 28 22:02 libnppist_static.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnppisu.so -> libnppisu.so.11*
lrwxrwxrwx 1 root root 23 Apr 28 22:01 libnppisu.so.11 -> libnppisu.so.11.1.1.269*
-rwxr-xr-x 1 root root 540352 Apr 28 22:01 libnppisu.so.11.1.1.269*
-rw-r--r-- 1 root root 10642 Apr 28 22:02 libnppisu_static.a
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnppitc.so -> libnppitc.so.11*
lrwxrwxrwx 1 root root 23 Apr 28 22:01 libnppitc.so.11 -> libnppitc.so.11.1.1.269*
-rwxr-xr-x 1 root root 3985432 Apr 28 22:01 libnppitc.so.11.1.1.269*
-rw-r--r-- 1 root root 3967140 Apr 28 22:02 libnppitc_static.a
lrwxrwxrwx 1 root root 13 Apr 28 22:02 libnpps.so -> libnpps.so.11*
lrwxrwxrwx 1 root root 21 Apr 28 22:01 libnpps.so.11 -> libnpps.so.11.1.1.269*
-rwxr-xr-x 1 root root 11998112 Apr 28 22:01 libnpps.so.11.1.1.269*
-rw-r--r-- 1 root root 12834198 Apr 28 22:02 libnpps_static.a
lrwxrwxrwx 1 root root 18 Apr 28 22:01 libnvToolsExt.so -> libnvToolsExt.so.1*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0*
-rwxr-xr-x 1 root root 36832 Apr 28 22:01 libnvToolsExt.so.1.0.0*
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnvblas.so -> libnvblas.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnvblas.so.11 -> libnvblas.so.11.2.1.74*
-rwxr-xr-x 1 root root 589952 Apr 28 22:01 libnvblas.so.11.2.1.74*
lrwxrwxrwx 1 root root 15 Apr 28 22:02 libnvjpeg.so -> libnvjpeg.so.11*
lrwxrwxrwx 1 root root 22 Apr 28 22:01 libnvjpeg.so.11 -> libnvjpeg.so.11.2.0.74*
-rwxr-xr-x 1 root root 4503832 Apr 28 22:01 libnvjpeg.so.11.2.0.74*
-rw-r--r-- 1 root root 5733336 Apr 28 22:02 libnvjpeg_static.a
-rw-r--r-- 1 root root 18098784 Apr 28 22:01 libnvptxcompiler_static.a
lrwxrwxrwx 1 root root 25 Apr 28 22:01 libnvrtc-builtins.so -> libnvrtc-builtins.so.11.1*
lrwxrwxrwx 1 root root 28 Apr 28 22:01 libnvrtc-builtins.so.11.1 -> libnvrtc-builtins.so.11.1.74*
-rwxr-xr-x 1 root root 5645576 Apr 28 22:01 libnvrtc-builtins.so.11.1.74*
lrwxrwxrwx 1 root root 16 Apr 28 22:01 libnvrtc.so -> libnvrtc.so.11.1*
lrwxrwxrwx 1 root root 19 Apr 28 22:01 libnvrtc.so.11.1 -> libnvrtc.so.11.1.74*
-rwxr-xr-x 1 root root 32575992 Apr 28 22:01 libnvrtc.so.11.1.74*
drwxr-xr-x 2 root root 4096 Apr 28 22:02 stubs/
Or, more directly, as below. Either way, the library does exist:
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr/local/cuda/lib64$ ls | grep libcudart.so.11.0
libcudart.so.11.0
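The library may live in several places, here /usr/local/cuda/lib64 and later ~/cuda/lib64. A short script can list every match instead of running ls in each directory. This sketch uses pathlib; find_library_files is a hypothetical helper name.

Separately, ldconfig -p prints the libraries the loader already knows about without LD_LIBRARY_PATH. So if ldconfig -p | grep libcudart returns nothing but the script below finds the file, the search path is missing rather than the file.

```python
import os
from pathlib import Path

def find_library_files(pattern, dirs):
    """Return files (including symlinks) matching a glob such as 'libcudart.so*'
    in each existing directory, in the order the directories are given."""
    hits = []
    for d in dirs:
        base = Path(os.path.expanduser(d))
        if base.is_dir():
            hits.extend(sorted(str(p) for p in base.glob(pattern)))
    return hits

if __name__ == "__main__":
    print(find_library_files("libcudart.so*", ["/usr/local/cuda/lib64", "~/cuda/lib64"]))
```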
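So the likely cause is that the environment variable has not been set.
Set the environment variable (LD_LIBRARY_PATH: the search path for shared libraries):
When a program dynamically loads a .so file that is not in the loader's default directories, such as '/lib' and '/usr/lib', that file's directory has to be listed in LD_LIBRARY_PATH.
To add a new directory to the existing value, join multiple paths with a colon (:), as follows
(reference: Baidu Baike entry "LD_LIBRARY_PATH"):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64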
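If LD_LIBRARY_PATH was empty, $LD_LIBRARY_PATH:/usr/local/cuda/lib64 produces a leading colon. The later TensorFlow log shows exactly this (LD_LIBRARY_PATH: :/usr/local/cuda/lib64). The ld.so manual page says a zero-length entry means the current working directory, so the loader also searches wherever python was started. This sketch joins paths without empty entries or duplicates; append_search_path is a hypothetical helper name:

```python
import os

def append_search_path(current: str, *new_dirs: str) -> str:
    """Append directories to a colon-separated list such as LD_LIBRARY_PATH,
    dropping empty entries (which ld.so would treat as the current directory)
    and skipping directories that are already present."""
    parts = [p for p in current.split(":") if p]
    for d in new_dirs:
        if d and d not in parts:
            parts.append(d)
    return ":".join(parts)

if __name__ == "__main__":
    # Print a value suitable for `export LD_LIBRARY_PATH=...` in the shell.
    print(append_search_path(os.environ.get("LD_LIBRARY_PATH", ""),
                             "/usr/local/cuda/lib64"))
```

In bash, you can get the same result for a single directory with export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib64. That form inserts the colon only when the variable is already non-empty. One trade-off: dropping empty entries also removes a "current directory" entry that someone added on purpose.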
Test again:
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:/usr/local/cuda/lib64$ cd ~
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ python
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-06-14 15:23:20.458545: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
// the library now loads successfully
>>> print(tf.test.is_gpu_available())  # check whether the GPU is usable
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-14 15:23:45.979697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-14 15:23:45.983501: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-14 15:23:46.014759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: Quadro RTX 6000 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 72 deviceMemorySize: 22.17GiB deviceMemoryBandwidth: 581.23GiB/s
2021-06-14 15:23:46.015026: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-14 15:23:46.026269: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-14 15:23:46.026519: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-14 15:23:46.031289: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-14 15:23:46.031959: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-14 15:23:46.040981: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-14 15:23:46.045167: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-14 15:23:46.045669: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2021-06-14 15:23:46.045754: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-14 15:23:46.335967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-14 15:23:46.336031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-06-14 15:23:46.336047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
False
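The message Could not load dynamic library 'libcudnn.so.8' means the cuDNN library is not installed.
Download it from the official archive: https://developer.nvidia.com/rdp/cudnn-archive
and choose cuDNN Library for Linux (x86_64).
Then upload it to the server and extract it with the following command: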
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ tar xvf cudnn-11.3-linux-x64-v8.2.0.53.tgz
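Note: this is a shared server and the current user has no sudo rights, so the library is kept in the user's own directory. The command above extracts it into the cuda folder in the home directory. If you do have root access, you can instead copy the files into the CUDA installation directory:
sudo cp cuda/include/* /usr/local/cuda/include
sudo cp -P cuda/lib64/* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
// -P copies the symlinks as symlinks instead of duplicating the library files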
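To confirm which cuDNN release was extracted, check include/cudnn_version.h. In cuDNN 8 it records the version as #define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL lines. This sketch reads them. cudnn_version_from_header is a hypothetical helper name, and the path ~/cuda/include/cudnn_version.h assumes the extraction location used above:

```python
import os
import re

def cudnn_version_from_header(text: str):
    """Return (major, minor, patch) from the text of cudnn_version.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if m is None:
            return None
        parts.append(int(m.group(1)))
    return tuple(parts)

if __name__ == "__main__":
    header = os.path.expanduser("~/cuda/include/cudnn_version.h")
    if os.path.exists(header):
        with open(header) as f:
            print(cudnn_version_from_header(f.read()))
```

For the archive used here (cudnn-11.3-linux-x64-v8.2.0.53.tgz), this should print (8, 2, 0). That matches the major version of the libcudnn.so.8 that TensorFlow asks for.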
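Likewise, add the cuDNN library directory to the environment variable. (~/cuda/include holds only header files, which the dynamic loader does not use, so listing it is harmless but unnecessary.)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:~/cuda/lib64:~/cuda/include
Test again: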
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:~/cuda/lib64:~/cuda/include
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ python
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-06-14 16:01:55.065701: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> print(tf.test.is_gpu_available())
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-14 16:01:59.521642: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-14 16:01:59.532787: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-14 16:01:59.563297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: Quadro RTX 6000 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 72 deviceMemorySize: 22.17GiB deviceMemoryBandwidth: 581.23GiB/s
2021-06-14 16:01:59.566596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-14 16:01:59.571464: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-14 16:01:59.574722: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-14 16:01:59.576355: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-14 16:01:59.579788: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-14 16:01:59.587985: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-14 16:01:59.591990: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-14 16:01:59.592397: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-14 16:01:59.594745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-14 16:01:59.597845: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-14 16:02:01.489575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-14 16:02:01.489640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-06-14 16:02:01.489663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-06-14 16:02:01.492965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 16773 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:3b:00.0, compute capability: 7.5)
True
// success: the GPU is available
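The deprecation warning above recommends tf.config.list_physical_devices('GPU') instead of tf.test.is_gpu_available(). This sketch runs the same check with the recommended call; tf_gpu_names is a hypothetical helper name:

```python
def tf_gpu_names():
    """List GPUs visible to TensorFlow using the non-deprecated API.

    Returns None if TensorFlow is not installed, otherwise a (possibly empty)
    list of device names such as '/physical_device:GPU:0'.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [d.name for d in tf.config.list_physical_devices("GPU")]

if __name__ == "__main__":
    print(tf_gpu_names())
```

With the setup above, this would be expected to return ['/physical_device:GPU:0']. An empty list means TensorFlow imported but could not register the GPU, for example because libcudnn.so.8 was not found.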
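The export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:~/cuda/lib64:~/cuda/include command above only affects the current shell session. It is lost after logging out or opening a new terminal. To make it take effect automatically, add it to the .bashrc file.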
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ vim ~/.bashrc
// add the following at the end of the file, after the conda initialize block
# <<< conda initialize <<<
# <<< tensorflow gpu env path set <<<
# <<< add by lxb >>>
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:~/cuda/lib64:~/cuda/include
// in vim, press i to start editing; when done, press Esc and type :wq to save and quit
// source ~/.bashrc reloads the file so the change takes effect immediately
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ source ~/.bashrc
// activate the environment again
(base) lxb@root1-PowerEdge-R740:~$ conda activate tensorflow_2.5.0_py_3.8
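Editing ~/.bashrc by hand is fine once, but a setup script that runs twice would append the export line twice. This sketch appends the line only when that exact line is not already present. ensure_line is a hypothetical helper name. The line below is a variant of the author's: it uses $HOME instead of ~ and leaves out ~/cuda/include, which contains only headers.

```python
from pathlib import Path

# Variant of the author's line: $HOME instead of ~, header directory omitted.
EXPORT_LINE = ("export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"
               "/usr/local/cuda/lib64:$HOME/cuda/lib64")

def ensure_line(rc_file, line: str) -> bool:
    """Append `line` to rc_file unless an identical line already exists.

    Returns True if the file was modified.
    """
    path = Path(rc_file).expanduser()
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    with path.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new line separate from the last one
        f.write(line + "\n")
    return True
```

Usage: call ensure_line("~/.bashrc", EXPORT_LINE), then run source ~/.bashrc as above.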
Test again:
(tensorflow_2.5.0_py_3.8) lxb@root1-PowerEdge-R740:~$ python
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-06-14 16:09:38.689021: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> print(tf.test.is_gpu_available())
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-06-14 16:09:42.876181: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-14 16:09:42.880446: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-14 16:09:42.910784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: Quadro RTX 6000 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 72 deviceMemorySize: 22.17GiB deviceMemoryBandwidth: 581.23GiB/s
2021-06-14 16:09:42.913971: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-14 16:09:42.918835: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-14 16:09:42.920128: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-14 16:09:42.921675: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-14 16:09:42.922133: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-14 16:09:42.927200: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-14 16:09:42.931599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-14 16:09:42.931923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-14 16:09:42.934156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-14 16:09:42.934246: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-14 16:09:44.805613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-14 16:09:44.805678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-06-14 16:09:44.805693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-06-14 16:09:44.809012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 16773 MB memory) -> physical GPU (device: 0, name: Quadro RTX 6000, pci bus id: 0000:3b:00.0, compute capability: 7.5)
True
The output above shows that the installation succeeded.