#软件安装过程
验证是否安装成功 python3
验证是否安装成功:
nvidia-smi
Wed Dec 4 04:35:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:02:00.0 On | N/A |
| 62% 66C P0 N/A / 95W | 4024MiB / 4039MiB | 88% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1470 G /usr/libexec/Xorg 39MiB |
| 0 1840 G /usr/bin/gnome-shell 42MiB |
| 0 6996 C .../tensorflow-gpu-1.15.0/venv/bin/python3 3925MiB |
+-----------------------------------------------------------------------------+
deviceQuery deviceQuery.cpp deviceQuery.o Makefile NsightEclipse.xml readme.txt
(venv) [root@localhost deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4040 MBytes (4235919360 bytes)
( 6) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1493 MHz (1.49 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
(venv) [root@localhost deviceQuery]# pwd
/usr/local/cuda/samples/1_Utilities/deviceQuery
(venv) [root@localhost deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
pip install tensorflow-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
或你想用比较低的版本。
pip install tensorflow-gpu=1.15.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
export PATH=/usr/local/cuda/binKaTeX parse error: Expected '}', got 'EOF' at end of input: {PATH:+:{PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:KaTeX parse error: Expected '}', got 'EOF' at end of input: …LIBRARY_PATH:+:{LD_LIBRARY_PATH}}
验证tensorflow-gpu 是否可以正常工作
source /var/server/tensorflow-gpu-1.15.0/venv/bin/activate
#“想要测试tensorflow是否可以使用GPU:”
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
然后看到输出:
2019-12-04 09:47:43.083342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-12-04 09:47:43.083399: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083440: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083478: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083516: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083553: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083593: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:
2019-12-04 09:47:43.083599: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
发现很多库都找不到.
依次搜索 每个缺失的lib
find / -name “libcublas.so."
find / -name " libcufft.so.”
…
然后做软链接到/usr/local/cuda/lib64/
ln -s libcudart.so.10.2 libcudart.so.10.0
ln -s libcublas.so.10.2 libcublas.so.10.0
ln -s libcufft.so.10.2 libcufft.so.10.0
ln -s libcurand.so.10.2 libcurand.so.10.0
ln -s libcusolver.so.10.2 libcusolver.so.10.0
ln -s libcusparse.so.10.2 libcusparse.so.10.0
ln -s libcudnn.so.7 libcudnn.so.7
特别注意的是:libcublas.so.10 并不在/usr/local/cuda 目录下,
cudnn 需要手动安装
https://developer.nvidia.com/rdp/cudnn-download
安装方法 https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
然后再测试tensorflow-gpu 是否可以正常运行,
source venv/bin/activate
python3
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
如果还是出错,
find / -name " libcudnn.so*"
找到实际路径后,一样做个软链接到/usr/local/cuda/lib64 里
至此,终于可以跑起来了。
我运行的show and Tell (看图说话)的模型训练,CPU是 3.7GHZ,8核心的,每秒处理是1.45秒/步 迭代。
用GPU计算后,0.30秒/步,速度提升了大概5倍。