Linux system: 16.04.1-Ubuntu x86_64
GPU: GTX 1060
gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
cuda: https://developer.nvidia.com/cuda-90-download-archive
cudnn: https://developer.nvidia.com/cudnn
Tips:
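CUDA 9.2 is too new: even the latest tensorflow-gpu, 1.8, does not support it.
If you force-install CUDA 9.2 anyway, import tensorflow fails with an error about a missing cuda9.0*** library.
So I had to downgrade, which is why the first step below exists... I nearly cried my eyes out T×T
If you would like to build from source yourself, try the guide below. I went through it once and chose to give up.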
How to install Tensorflow GPU with CUDA 9.2 for python on Ubuntu
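Before downgrading, it can help to see which CUDA runtime libraries the dynamic loader can actually find. The sketch below is only an illustration: it assumes the CUDA 9.0 and 9.2 runtimes use the sonames libcudart.so.9.0 and libcudart.so.9.2, and the helper name can_load is made up for this example.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


# Sonames assumed for the CUDA 9.0 and 9.2 runtime libraries.
for soname in ("libcudart.so.9.0", "libcudart.so.9.2"):
    print(soname, "loadable" if can_load(soname) else "not found")
```

If only the 9.2 library is reported as loadable, a tensorflow-gpu 1.8 wheel would be expected to fail at import in the way described in the tips above.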
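sudo apt-get remove --purge 'cuda*'
I could not find cuda**/bin/uninstall.pl, so I fell back on this rather brute-force approach. It works, and that is good enough.
sudo apt-get remove --purge 'nvidia*'
This step does not really matter much, since keeping several versions on the machine does no harm. It was mainly that I had too many versions, and at boot the system loaded a manually installed version that could not be recognized, which was driving me crazy. In short, use this command with caution.
As for cuDNN, there is nothing to worry about: its shared libraries are installed and configured under /usr/local/cuda by default, so they are all removed when CUDA is uninstalled.
Blacklist the open-source nouveau graphics driver and disable the nouveau kernel module
Edit /etc/modprobe.d/blacklist.conf, the file that controls which modules are loaded at boot: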
# this one might not be required for x86 32 bit users.
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
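To double-check the edit, a small script can confirm that nouveau really appears in the blacklist file. This is a minimal sketch, not part of the original procedure: the function name blacklisted_modules is hypothetical, and it only understands simple "blacklist <module>" lines.

```python
import os


def blacklisted_modules(conf_text):
    """Collect module names from 'blacklist <module>' lines, ignoring comments."""
    modules = set()
    for line in conf_text.splitlines():
        # Drop anything after '#', then split the rest into words.
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[0] == "blacklist":
            modules.add(parts[1])
    return modules


CONF_PATH = "/etc/modprobe.d/blacklist.conf"
if os.path.exists(CONF_PATH):
    with open(CONF_PATH, errors="replace") as conf:
        found = blacklisted_modules(conf.read())
    print("nouveau blacklisted:", "nouveau" in found)
```

On Ubuntu the change usually takes effect only after regenerating the initramfs (sudo update-initramfs -u) and rebooting.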
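Downloading and installing the driver directly is optional. We are going to install CUDA anyway, and CUDA itself bundles a particular version of the NVIDIA driver (CUDA 9.0, for example, includes nvidia-384), so there is little point in downloading the driver separately unless you know you need the latest version: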
nvidia-367_384.81-0ubuntu1_amd64.deb
nvidia-367-dev_384.81-0ubuntu1_amd64.deb
nvidia-384_384.81-0ubuntu1_amd64.deb
nvidia-384-dev_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-367_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-384_384.81-0ubuntu1_amd64.deb
nvidia-modprobe_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-367_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-384_384.81-0ubuntu1_amd64.deb
nvidia-settings_384.81-0ubuntu1_amd64.deb
To install: sudo apt-get install nvidia-384
nvidia-smi
Sat Jun 23 12:18:04 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 37%   34C    P8     6W / 130W |    359MiB /  6075MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1091      G   /usr/lib/xorg/Xorg                           159MiB |
|    0      1866      G   compiz                                        73MiB |
|    0      2207      G   fcitx-qimpanel                                 7MiB |
|    0      2417      G   ...-token=3D04A15463A2BBB366CB13528973ED05   117MiB |
+-----------------------------------------------------------------------------+
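The driver version in the nvidia-smi header is the number that matters for CUDA compatibility. As a rough sketch, the snippet below pulls it out of the header text and compares it with 384.81, the driver version bundled with CUDA 9.0 according to the package names in this post. The function names are hypothetical, and treating 384.81 as the minimum is an assumption made for illustration.

```python
import re


def driver_version(smi_text):
    """Extract the 'Driver Version' field from nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*([0-9.]+)", smi_text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn '396.26' into (396, 26) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(version, minimum):
    return version_tuple(version) >= version_tuple(minimum)


header = "| NVIDIA-SMI 396.26 Driver Version: 396.26 |"
found = driver_version(header)
print(found, meets_minimum(found, "384.81"))
```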
Base Installer package
sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
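These two commands effectively set up a trusted local repository at /var/cuda-repo-9-0-local.
The repository contains the CUDA installer packages, the CUDA development packages, the CUDA shared libraries, the NVIDIA driver packages, and so on: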
7fa2af80.pub
cuda_9.0.176-1_amd64.deb
cuda-9-0_9.0.176-1_amd64.deb
cuda-command-line-tools-9-0_9.0.176-1_amd64.deb
cuda-core-9-0_9.0.176-1_amd64.deb
cuda-cublas-9-0_9.0.176-1_amd64.deb
cuda-cublas-dev-9-0_9.0.176-1_amd64.deb
cuda-cudart-9-0_9.0.176-1_amd64.deb
cuda-cudart-dev-9-0_9.0.176-1_amd64.deb
cuda-cufft-9-0_9.0.176-1_amd64.deb
cuda-cufft-dev-9-0_9.0.176-1_amd64.deb
cuda-curand-9-0_9.0.176-1_amd64.deb
cuda-curand-dev-9-0_9.0.176-1_amd64.deb
cuda-cusolver-9-0_9.0.176-1_amd64.deb
cuda-cusolver-dev-9-0_9.0.176-1_amd64.deb
cuda-cusparse-9-0_9.0.176-1_amd64.deb
cuda-cusparse-dev-9-0_9.0.176-1_amd64.deb
cuda-demo-suite-9-0_9.0.176-1_amd64.deb
cuda-documentation-9-0_9.0.176-1_amd64.deb
cuda-driver-dev-9-0_9.0.176-1_amd64.deb
cuda-drivers_384.81-1_amd64.deb
cuda-gdb-src-9-0_9.0.176-1_amd64.deb
cuda-libraries-9-0_9.0.176-1_amd64.deb
cuda-libraries-dev-9-0_9.0.176-1_amd64.deb
cuda-license-9-0_9.0.176-1_amd64.deb
cuda-minimal-build-9-0_9.0.176-1_amd64.deb
cuda-misc-headers-9-0_9.0.176-1_amd64.deb
cuda-npp-9-0_9.0.176-1_amd64.deb
cuda-npp-dev-9-0_9.0.176-1_amd64.deb
cuda-nvgraph-9-0_9.0.176-1_amd64.deb
cuda-nvgraph-dev-9-0_9.0.176-1_amd64.deb
cuda-nvml-dev-9-0_9.0.176-1_amd64.deb
cuda-nvrtc-9-0_9.0.176-1_amd64.deb
cuda-nvrtc-dev-9-0_9.0.176-1_amd64.deb
cuda-runtime-9-0_9.0.176-1_amd64.deb
cuda-samples-9-0_9.0.176-1_amd64.deb
cuda-toolkit-9-0_9.0.176-1_amd64.deb
cuda-visual-tools-9-0_9.0.176-1_amd64.deb
libcuda1-367_384.81-0ubuntu1_amd64.deb
libcuda1-384_384.81-0ubuntu1_amd64.deb
libxnvctrl0_384.81-0ubuntu1_amd64.deb
libxnvctrl-dev_384.81-0ubuntu1_amd64.deb
nvidia-367_384.81-0ubuntu1_amd64.deb
nvidia-367-dev_384.81-0ubuntu1_amd64.deb
nvidia-384_384.81-0ubuntu1_amd64.deb
nvidia-384-dev_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-367_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-384_384.81-0ubuntu1_amd64.deb
nvidia-modprobe_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-367_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-384_384.81-0ubuntu1_amd64.deb
nvidia-settings_384.81-0ubuntu1_amd64.deb
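The file names in the listing follow the usual Debian pattern <package>_<version>_<architecture>.deb, which makes it easy to see which driver version each package carries. The helper below is a hypothetical illustration of that pattern, not a tool from the CUDA installer.

```python
def parse_deb_filename(filename):
    """Split '<package>_<version>_<arch>.deb' into its three fields."""
    if not filename.endswith(".deb"):
        raise ValueError("not a .deb file name: " + filename)
    # Unpacking raises ValueError if there are not exactly three fields.
    package, version, arch = filename[: -len(".deb")].split("_")
    return package, version, arch


for name in (
    "cuda-9-0_9.0.176-1_amd64.deb",
    "cuda-drivers_384.81-1_amd64.deb",
    "nvidia-367_384.81-0ubuntu1_amd64.deb",
):
    print(parse_deb_filename(name))
```

Read this way, the package named nvidia-367 carries version 384.81, the same driver version as nvidia-384.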
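Only the last command in the official instructions actually performs the installation. Note: if you have local repositories for several versions, as I do: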
cuda-repo-9-0-local
cuda-repo-9-2-local
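then the correct install command becomes:
sudo apt-get install cuda-9-0
The official documentation also suggests installing cuda-libraries-9-0. Note that doing only that does not actually complete the installation: you will find that /usr/local/cuda has not been created (this directory holds part of the header files and shared libraries).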
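A quick way to check the point about /usr/local/cuda is to test whether the directory exists and see what it resolves to. This is a minimal sketch: the function name is hypothetical, and it assumes /usr/local/cuda is normally a symbolic link to a versioned directory such as /usr/local/cuda-9.0.

```python
import os


def resolve_cuda_dir(path="/usr/local/cuda"):
    """Return the real toolkit directory behind path, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    # realpath follows symbolic links, e.g. to a versioned directory.
    return os.path.realpath(path)


print("CUDA toolkit directory:", resolve_cuda_dir())
```

A result of None after the install suggests that only part of the toolkit was installed.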
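Patch packages
These are straightforward:
sudo dpkg -i \
  cuda-repo-ubuntu1604-9-0-local-cublas-performance-update_1.0-1_amd64.deb \
  cuda-repo-ubuntu1604-9-0-local-cublas-performance-update-2_1.0-1_amd64.deb \
  cuda-repo-ubuntu1604-9-0-local-cublas-performance-update-3_1.0-1_amd64.deb
This also creates three local repositories. I recommend setting up all the local repositories first and only then installing CUDA.
For cuDNN, simply follow the official documentation: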
$ tar -xzvf cudnn-9.0-linux-x64-v7.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Once these commands have run, cuDNN is installed, and the cudnn.h header file now appears in the /usr/local/cuda/include directory.
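Beyond checking that cudnn.h exists, the header can also tell you which cuDNN version was copied. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros, as the cuDNN 7 headers do. The function name is hypothetical.

```python
import os
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h text, or None."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


HEADER = "/usr/local/cuda/include/cudnn.h"
if os.path.exists(HEADER):
    with open(HEADER, errors="replace") as header:
        print("cuDNN version:", cudnn_version(header.read()))
```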
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
This first check, nvcc -V above, verifies that the NVIDIA CUDA Compiler (NVCC) is installed correctly.
Next, verify that the samples compile correctly:
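If you want to script this check, the release number can be read from the last line of the nvcc -V output. A minimal sketch, with a hypothetical function name, using the output line shown above as sample input:

```python
import re


def nvcc_release(nvcc_output):
    """Return (major, minor, full_version) from 'nvcc -V' text, or None."""
    match = re.search(r"release (\d+)\.(\d+), V([\d.]+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(nvcc_release(sample))
```

On a real machine the input text could come from subprocess.check_output(["nvcc", "-V"]), decoded to a string.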
/usr/local/cuda-9.0/samples$ ls
0_Simple 2_Graphics 4_Finance 6_Advanced common Makefile
1_Utilities 3_Imaging 5_Simulations 7_CUDALibraries EULA.txt
Copy the whole directory to a directory of your own, /home/test:
>make
make[1]: Entering directory '/home/test/0_Simple/simpleSurfaceWrite'
/usr/local/cuda-9.0/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode
arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode
arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode
arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode
arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o simpleSurfaceWrite.o -c
simpleSurfaceWrite.cu
...
mkdir -p ../../bin/x86_64/linux/release
cp freeImageInteropNPP ../../bin/x86_64/linux/release
make[1]: Leaving directory '/home/test/7_CUDALibraries/freeImageInteropNPP'
Finished building CUDA samples
>./bin/x86_64/linux/release/deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 9.2 / 9.0
CUDA Capability Major/Minor version number: 6.1
...
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
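The summary line of deviceQuery is worth a second look: the driver reports CUDA 9.2 while the runtime is 9.0, and the result is still PASS, because the driver only needs to be at least as new as the runtime. The sketch below extracts both versions from that line and makes the comparison explicit. The function name is hypothetical.

```python
import re


def driver_and_runtime(summary_line):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor)) or None."""
    pattern = (
        r"CUDA Driver Version = (\d+)\.(\d+), "
        r"CUDA Runtime Version = (\d+)\.(\d+)"
    )
    match = re.search(pattern, summary_line)
    if match is None:
        return None
    d_major, d_minor, r_major, r_minor = (int(g) for g in match.groups())
    return (d_major, d_minor), (r_major, r_minor)


line = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, "
    "CUDA Runtime Version = 9.0, NumDevs = 1"
)
driver, runtime = driver_and_runtime(line)
print("driver supports runtime:", driver >= runtime)
```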
pip install tensorflow-gpu
At the time of writing, this command installs version 1.8 by default.
To verify:
import tensorflow as tf
hello = tf.constant("Hello, TensorFlow!")
sess = tf.Session()
print(sess.run(hello))
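The import fails with: libcudnn.so.7: cannot open shared object file: No such file or directory
The cause is that the directory holding the cuDNN shared libraries has not been added to Ubuntu's environment variables. The fix: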
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
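The ${VAR:+:${VAR}} form in the two export lines appends the old value only when the variable is already set and non-empty, which avoids a trailing colon. An empty entry in LD_LIBRARY_PATH is treated as the current directory, so the trailing colon is worth avoiding. The same logic written out as a small sketch, with a hypothetical function name:

```python
def prepend_path(new_dirs, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' plus the old value only if it is non-empty."""
    joined = ":".join(new_dirs)
    if current:
        return joined + ":" + current
    return joined


dirs = ["/usr/local/cuda/lib64", "/usr/local/cuda-9.0/lib64"]
print(prepend_path(dirs, ""))
print(prepend_path(dirs, "/opt/lib"))
```

To make the setting survive new shell sessions, the two export lines are commonly placed in ~/.bashrc.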
The output should be the Hello, TensorFlow! string, accompanied by information about the GPU.
That completes the installation.
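For a more direct answer than reading the log output, TensorFlow can be asked whether it sees a GPU. This is a hedged sketch: it assumes tf.test.is_gpu_available() is present in the installed TensorFlow 1.x release, and it catches ImportError so that a missing CUDA or cuDNN library is reported instead of crashing. The wrapper name gpu_available is made up for this example.

```python
def gpu_available():
    """Return True or False from TensorFlow, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        # A missing libcudnn or libcudart typically surfaces here.
        print("TensorFlow import failed:", exc)
        return None
    # TensorFlow 1.x API; assumed to exist in the installed version.
    return bool(tf.test.is_gpu_available())


print("GPU available:", gpu_available())
```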