Ubuntu 16.04 + GTX 1060 + CUDA 9.0 + cuDNN 7.1 + TensorFlow 1.8

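Contents

  • Preparation
    • Base environment
    • Package downloads
    • Installation steps
  • Cleaning up the environment
  • Installing the NVIDIA driver
    • Verification
  • Installing CUDA & cuDNN
    • Cuda
    • cudnn
    • Verifying the CUDA installation
  • Installing TensorFlow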


Preparation

Base environment

Linux system: Ubuntu 16.04.1, x86_64

Graphics card: GTX 1060

gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Package downloads

cuda: https://developer.nvidia.com/cuda-90-download-archive

cudnn: https://developer.nvidia.com/cudnn

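Installation steps

  1. Clean up the environment
  2. Install the NVIDIA driver
  3. Install CUDA & cuDNN
  4. Install TensorFlow

Tips:
CUDA 9.2 is too new: even tensorflow-gpu 1.8, the latest release at the time, does not support it.
If you force-install CUDA 9.2 anyway, import tensorflow fails with an exception saying cuda9.0*** is missing,
so I had no choice but to downgrade, which is where step 1 comes from... I nearly cried my eyes out T×T
If you are interested in compiling from source yourself, you can try the guide below. I went through it once and chose to give up.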
How to install Tensorflow GPU with CUDA 9.2 for python on Ubuntu
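The version trap described in the tips can be checked before installing anything. The sketch below is a hypothetical helper, not part of TensorFlow or CUDA: its lookup table holds only the pairing this post relies on (tensorflow-gpu 1.8 built against CUDA 9.0), and you would have to extend it yourself for other releases.

```python
# Hypothetical lookup table: only the pairing used in this post is filled in.
# Prebuilt tensorflow-gpu wheels link against versioned CUDA libraries, so the
# major.minor toolkit version has to match exactly; "newer" is not good enough.
TF_TO_CUDA = {
    "1.8": "9.0",
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '9.0.176' to '9.0'."""
    return ".".join(version.split(".")[:2])


def cuda_matches(tf_version: str, cuda_version: str) -> bool:
    """Return True if cuda_version is the toolkit the TensorFlow wheel expects."""
    key = major_minor(tf_version)
    if key not in TF_TO_CUDA:
        raise KeyError("no CUDA entry for TensorFlow {}".format(tf_version))
    return major_minor(cuda_version) == TF_TO_CUDA[key]


print(cuda_matches("1.8.0", "9.0.176"))  # True
print(cuda_matches("1.8.0", "9.2"))      # False: 9.2 is too new for the 1.8 wheel
```

An exact match is used on purpose: a strictly higher toolkit version is exactly the situation that broke the import here.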

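Cleaning up the environment

sudo apt-get remove --purge "cuda*"

I could not find cuda**/bin/uninstall.pl anywhere, so I fell back on this rather brute-force approach; it works, and that is all I needed.

sudo apt-get remove --purge "nvidia*"

Strictly speaking, this step is not that important: keeping several versions on one machine does no harm. My problem was that I had too many versions installed, and at boot the system loaded a manually installed version it could not recognize, which drove me up the wall. In short, use this command with caution.

As for cuDNN, there is little to worry about: with the install method used below, its files are copied into /usr/local/cuda, so they go away together with the CUDA installation directory. (apt may leave these manually copied files behind because no package owns them; in that case delete the leftover directory by hand.)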

Blacklist the open-source nouveau graphics driver and disable the nouveau kernel module
by editing the boot-time module blacklist file /etc/modprobe.d/blacklist.conf:

# this one might not be required for x86 32 bit users.
blacklist amd76x_edac 

blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
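On Ubuntu, changes under /etc/modprobe.d usually only take effect after regenerating the initramfs (sudo update-initramfs -u) and rebooting; afterwards lsmod | grep nouveau should print nothing. The Python sketch below performs the same two checks: it parses the blacklist lines from the config text and looks for nouveau in /proc/modules, the file lsmod reads. The function names are my own, purely for illustration.

```python
import os


def blacklisted_modules(conf_text: str) -> set:
    """Collect module names from 'blacklist <module>' lines, ignoring comments."""
    modules = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            modules.add(parts[1])
    return modules


def module_loaded(name: str, proc_modules: str = "/proc/modules") -> bool:
    """Return True if the kernel module is listed in /proc/modules (what lsmod shows)."""
    if not os.path.exists(proc_modules):
        return False
    with open(proc_modules) as f:
        return any(line.split()[0] == name for line in f if line.strip())


sample = """# this one might not be required for x86 32 bit users.
blacklist amd76x_edac

blacklist vga16fb
blacklist nouveau
"""
print("nouveau" in blacklisted_modules(sample))  # True
print("nouveau currently loaded:", module_loaded("nouveau"))
```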

Installing the NVIDIA driver

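Downloading and installing the driver directly is optional. Since we are going to install CUDA anyway, and CUDA bundles a specific NVIDIA driver version (CUDA 9.0, for example, includes nvidia-384), there is little point in installing the driver separately unless you know you need the latest version. The driver packages that ship with CUDA 9.0 are: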

nvidia-367_384.81-0ubuntu1_amd64.deb
nvidia-367-dev_384.81-0ubuntu1_amd64.deb
nvidia-384_384.81-0ubuntu1_amd64.deb
nvidia-384-dev_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-367_384.81-0ubuntu1_amd64.deb
nvidia-libopencl1-384_384.81-0ubuntu1_amd64.deb
nvidia-modprobe_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-367_384.81-0ubuntu1_amd64.deb
nvidia-opencl-icd-384_384.81-0ubuntu1_amd64.deb
nvidia-settings_384.81-0ubuntu1_amd64.deb

Install with: sudo apt-get install nvidia-384

Verification

nvidia-smi
Sat Jun 23 12:18:04 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 37%   34C    P8     6W / 130W |    359MiB /  6075MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1091      G   /usr/lib/xorg/Xorg                           159MiB |
|    0      1866      G   compiz                                        73MiB |
|    0      2207      G   fcitx-qimpanel                                 7MiB |
|    0      2417      G   ...-token=3D04A15463A2BBB366CB13528973ED05   117MiB |
+-----------------------------------------------------------------------------+
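The driver version in the header line is what matters for CUDA. As far as I know, NVIDIA's CUDA 9.0 release notes list 384.81 as the minimum Linux driver, which is why the 396-series driver shown here, newer than the nvidia-384 bundled with CUDA 9.0, still works. The sketch below parses that header and, when nvidia-smi is available, asks it for the version directly via --query-gpu=driver_version --format=csv,noheader. The threshold constant is an assumption to double-check against NVIDIA's documentation, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumed minimum Linux driver for CUDA 9.0 (verify against NVIDIA's release notes).
MIN_DRIVER_FOR_CUDA_9_0 = (384, 81)


def parse_driver_version(smi_text: str) -> tuple:
    """Extract 'Driver Version: 396.26' from nvidia-smi output as (396, 26)."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_text)
    if match is None:
        raise ValueError("driver version not found")
    return tuple(int(part) for part in match.group(1).split("."))


def query_driver_version() -> str:
    """Ask nvidia-smi for the driver version; returns '' if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE, universal_newlines=True, check=False)
    return result.stdout.strip()


header = "| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |"
version = parse_driver_version(header)
print(version, version >= MIN_DRIVER_FOR_CUDA_9_0)  # (396, 26) True

reported = query_driver_version()
if reported:
    print("nvidia-smi reports driver", reported)
```

Versions are compared as integer tuples so that, for example, 396.26 correctly sorts above 384.81.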

Installing CUDA & cuDNN

Cuda

Download the 4 installation packages you need (the base installer and three patches):

  1. Base Installer package

    sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

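    These two commands effectively set up a trusted local repository at /var/cuda-repo-9-0-local.
    The repository contains the CUDA install packages, CUDA development packages, CUDA dynamic libraries, NVIDIA driver packages, and so on: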

    7fa2af80.pub
    cuda_9.0.176-1_amd64.deb
    cuda-9-0_9.0.176-1_amd64.deb
    cuda-command-line-tools-9-0_9.0.176-1_amd64.deb
    cuda-core-9-0_9.0.176-1_amd64.deb
    cuda-cublas-9-0_9.0.176-1_amd64.deb
    cuda-cublas-dev-9-0_9.0.176-1_amd64.deb
    cuda-cudart-9-0_9.0.176-1_amd64.deb
    cuda-cudart-dev-9-0_9.0.176-1_amd64.deb
    cuda-cufft-9-0_9.0.176-1_amd64.deb
    cuda-cufft-dev-9-0_9.0.176-1_amd64.deb
    cuda-curand-9-0_9.0.176-1_amd64.deb
    cuda-curand-dev-9-0_9.0.176-1_amd64.deb
    cuda-cusolver-9-0_9.0.176-1_amd64.deb
    cuda-cusolver-dev-9-0_9.0.176-1_amd64.deb
    cuda-cusparse-9-0_9.0.176-1_amd64.deb
    cuda-cusparse-dev-9-0_9.0.176-1_amd64.deb
    cuda-demo-suite-9-0_9.0.176-1_amd64.deb
    cuda-documentation-9-0_9.0.176-1_amd64.deb
    cuda-driver-dev-9-0_9.0.176-1_amd64.deb
    cuda-drivers_384.81-1_amd64.deb
    cuda-gdb-src-9-0_9.0.176-1_amd64.deb
    cuda-libraries-9-0_9.0.176-1_amd64.deb
    cuda-libraries-dev-9-0_9.0.176-1_amd64.deb
    cuda-license-9-0_9.0.176-1_amd64.deb
    cuda-minimal-build-9-0_9.0.176-1_amd64.deb
    cuda-misc-headers-9-0_9.0.176-1_amd64.deb
    cuda-npp-9-0_9.0.176-1_amd64.deb
    cuda-npp-dev-9-0_9.0.176-1_amd64.deb
    cuda-nvgraph-9-0_9.0.176-1_amd64.deb
    cuda-nvgraph-dev-9-0_9.0.176-1_amd64.deb
    cuda-nvml-dev-9-0_9.0.176-1_amd64.deb
    cuda-nvrtc-9-0_9.0.176-1_amd64.deb
    cuda-nvrtc-dev-9-0_9.0.176-1_amd64.deb
    cuda-runtime-9-0_9.0.176-1_amd64.deb
    cuda-samples-9-0_9.0.176-1_amd64.deb
    cuda-toolkit-9-0_9.0.176-1_amd64.deb
    cuda-visual-tools-9-0_9.0.176-1_amd64.deb
    libcuda1-367_384.81-0ubuntu1_amd64.deb
    libcuda1-384_384.81-0ubuntu1_amd64.deb
    libxnvctrl0_384.81-0ubuntu1_amd64.deb
    libxnvctrl-dev_384.81-0ubuntu1_amd64.deb
    nvidia-367_384.81-0ubuntu1_amd64.deb
    nvidia-367-dev_384.81-0ubuntu1_amd64.deb
    nvidia-384_384.81-0ubuntu1_amd64.deb
    nvidia-384-dev_384.81-0ubuntu1_amd64.deb
    nvidia-libopencl1-367_384.81-0ubuntu1_amd64.deb
    nvidia-libopencl1-384_384.81-0ubuntu1_amd64.deb
    nvidia-modprobe_384.81-0ubuntu1_amd64.deb
    nvidia-opencl-icd-367_384.81-0ubuntu1_amd64.deb
    nvidia-opencl-icd-384_384.81-0ubuntu1_amd64.deb
    nvidia-settings_384.81-0ubuntu1_amd64.deb
    

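    In NVIDIA's instructions, the commands that actually install CUDA come last:

    sudo apt-get update
    sudo apt-get install cuda

    Attention: if you have several local repositories, as I do: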

    cuda-repo-9-0-local
    cuda-repo-9-2-local

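    then installing the plain cuda meta-package would pull in the newest toolkit available (9.2 in my case), so the correct command becomes: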

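    sudo apt-get install cuda-9-0
    The official documentation suggests installing cuda-libraries-9-0. Note that this option does not actually complete the installation:
    you will find that /usr/local/cuda has not been created (this directory holds some of the header files and dynamic libraries).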
  2. Patch packages

    These are simpler:
    sudo dpkg -i \
      cuda-repo-ubuntu1604-9-0-local-cublas-performance-update_1.0-1_amd64.deb \
      cuda-repo-ubuntu1604-9-0-local-cublas-performance-update-2_1.0-1_amd64.deb \
      cuda-repo-ubuntu1604-9-0-local-cublas-performance-update-3_1.0-1_amd64.deb

    This likewise creates 3 local repositories, so I recommend setting up all the local repositories before installing CUDA.
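Because the versioned meta-package names use hyphens (cuda-9-0, matching cuda-9-0_9.0.176-1_amd64.deb in the repository listing above) while the toolkit calls itself 9.0, it is easy to type the wrong name. The sketch below is a hypothetical helper: it lists /var/cuda-repo-* directories and maps each repository back to the apt install command for that toolkit version. The naming pattern is inferred from the repository names in this post, not from any NVIDIA specification.

```python
import os
import re

# Assumed naming scheme, inferred from cuda-repo-9-0-local / cuda-repo-9-2-local.
REPO_PATTERN = re.compile(r"cuda-repo-(\d+)-(\d+)-local")


def local_cuda_repos(var_dir: str = "/var") -> list:
    """Return directory names such as 'cuda-repo-9-0-local' found under var_dir."""
    if not os.path.isdir(var_dir):
        return []
    return sorted(name for name in os.listdir(var_dir) if name.startswith("cuda-repo-"))


def repo_version(repo_dir: str):
    """Map 'cuda-repo-9-2-local' to '9.2'; return None for names that don't match."""
    match = REPO_PATTERN.match(repo_dir)
    return "{}.{}".format(match.group(1), match.group(2)) if match else None


def meta_package(cuda_version: str) -> str:
    """Map a toolkit version such as '9.0' to the apt meta-package 'cuda-9-0'."""
    major, minor = cuda_version.split(".")[:2]
    return "cuda-{}-{}".format(major, minor)


# Fall back to the two repositories from this post when none exist on this machine.
for repo in local_cuda_repos() or ["cuda-repo-9-0-local", "cuda-repo-9-2-local"]:
    version = repo_version(repo)
    if version is not None:
        print("{}: sudo apt-get install {}".format(repo, meta_package(version)))
```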

cudnn

For this, just follow the official documentation:

$ tar -xzvf cudnn-9.0-linux-x64-v7.tgz

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

After running the commands above, cuDNN is installed, and the /usr/local/cuda/include directory now contains the cudnn.h header.
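To confirm which cuDNN version actually landed in /usr/local/cuda, you can read the version macros from the header. In cuDNN 7 these are the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL #defines in cudnn.h (later releases moved them to cudnn_version.h). The sketch below extracts them with a regular expression; the sample header text uses illustrative values and the function name is hypothetical.

```python
import os
import re


def cudnn_version(header_text: str) -> str:
    """Return 'major.minor.patch' from the CUDNN_* #define lines of cudnn.h."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+{}\s+(\d+)".format(macro), header_text)
        if match is None:
            raise ValueError("{} not found in header".format(macro))
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative header excerpt, not copied from a real cudnn.h.
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 1\n#define CUDNN_PATCHLEVEL 4\n"
print(cudnn_version(sample))  # 7.1.4

header_path = "/usr/local/cuda/include/cudnn.h"
if os.path.exists(header_path):
    with open(header_path) as f:
        try:
            print("installed cuDNN:", cudnn_version(f.read()))
        except ValueError as err:  # e.g. newer cuDNN keeps the macros elsewhere
            print("could not read version:", err)
```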

Verifying the CUDA installation

nvcc -V 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
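If you want to script this check, the release number can be pulled out of the last line of the nvcc -V output. Note that nvcc lives in /usr/local/cuda-9.0/bin, which is only on PATH after the export PATH=... line near the end of this post, so the sketch below tries that full path first and falls back to any nvcc on PATH. The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def nvcc_release(nvcc_output: str) -> str:
    """Extract '9.0' from 'Cuda compilation tools, release 9.0, V9.0.176'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' in nvcc output")
    return match.group(1)


def run_nvcc_version(preferred: str = "/usr/local/cuda-9.0/bin/nvcc") -> str:
    """Return the text printed by 'nvcc -V', or '' if no nvcc can be found."""
    exe = shutil.which(preferred) or shutil.which("nvcc")
    if exe is None:
        return ""
    result = subprocess.run([exe, "-V"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=False)
    return result.stdout


sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(nvcc_release(sample))  # 9.0

output = run_nvcc_version()
if output:
    print("installed toolkit:", nvcc_release(output))
```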

The nvcc -V check confirms that the NVIDIA CUDA Compiler (NVCC) is installed correctly.
Next, verify that code actually compiles:

/usr/local/cuda-9.0/samples$ ls
0_Simple     2_Graphics  4_Finance      6_Advanced       common    Makefile
1_Utilities  3_Imaging   5_Simulations  7_CUDALibraries  EULA.txt

Copy the whole directory to a directory of your choice (here /home/test) and build there:

>make
make[1]: Entering directory '/home/test/0_Simple/simpleSurfaceWrite'
/usr/local/cuda-9.0/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode 
arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode 
arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode 
arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o simpleSurfaceWrite.o -c 
simpleSurfaceWrite.cu
...
mkdir -p ../../bin/x86_64/linux/release
cp freeImageInteropNPP ../../bin/x86_64/linux/release
make[1]: Leaving directory '/home/test/7_CUDALibraries/freeImageInteropNPP'
Finished building CUDA samples
>./bin/x86_64/linux/release/deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          9.2 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  ...
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
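Note the last two lines: Result = PASS, and a driver version (9.2) higher than the runtime version (9.0). That combination is fine because a newer CUDA driver can run applications built against an older runtime, but not the other way round. As a hedged sketch (hypothetical helper names, parsing only the summary format shown above), the check can be automated like this:

```python
import re


def parse_device_query(text: str) -> dict:
    """Pull driver/runtime versions, device count and result from deviceQuery output."""
    info = {}
    for key in ("CUDA Driver Version", "CUDA Runtime Version", "NumDevs"):
        match = re.search(r"{} = ([\d.]+)".format(re.escape(key)), text)
        if match:
            info[key] = match.group(1)
    result = re.search(r"Result = (\w+)", text)
    info["Result"] = result.group(1) if result else None
    return info


def as_tuple(version: str) -> tuple:
    """Turn '9.2' into (9, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def setup_ok(info: dict) -> bool:
    """True if deviceQuery passed and the driver supports at least the runtime version."""
    driver = info.get("CUDA Driver Version")
    runtime = info.get("CUDA Runtime Version")
    if info.get("Result") != "PASS" or driver is None or runtime is None:
        return False
    return as_tuple(driver) >= as_tuple(runtime)


sample = ("deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, "
          "CUDA Runtime Version = 9.0, NumDevs = 1\nResult = PASS\n")
info = parse_device_query(sample)
print(info["CUDA Driver Version"], info["CUDA Runtime Version"], setup_ok(info))  # 9.2 9.0 True
```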

Installing TensorFlow

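pip install tensorflow-gpu==1.8.0

At the time of writing, a plain pip install tensorflow-gpu installed version 1.8 by default; pinning the version keeps it on 1.8.
Verify: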

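    import tensorflow as tf  # TensorFlow 1.x API

    hello = tf.constant("Hello, TensorFlow!")
    sess = tf.Session()  # creating the session logs the GPU devices it finds
    print(sess.run(hello))  # Python 3 prints b'Hello, TensorFlow!'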

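On import I got the exception libcudnn.so.7: cannot open shared object file: No such file or directory.
The cause is that the CUDA/cuDNN library directories were not included in the environment variables; add them (put these lines in ~/.bashrc to make them persist across shells):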

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

With these variables set, the import succeeds and the output should be Hello, TensorFlow!, accompanied by GPU-related log messages.
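Two details about the library-path fix. The ${VAR:+:${VAR}} form appends :$VAR only when VAR is already non-empty, which avoids a stray empty path entry. And the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the exports have to happen in the shell (or ~/.bashrc) before Python is launched; changing os.environ inside a running interpreter is generally too late. The sketch below mirrors the expansion in Python and tries to load libcudnn.so.7 with ctypes, roughly what happens when TensorFlow's import pulls in cuDNN. The function names are hypothetical.

```python
import ctypes
import os


def prepend_paths(new_dirs: list, existing: str) -> str:
    """Python version of 'NEW${VAR:+:${VAR}}': add ':existing' only if it is non-empty."""
    joined = ":".join(new_dirs)
    return joined + (":" + existing if existing else "")


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open the shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


cuda_dirs = ["/usr/local/cuda/lib64", "/usr/local/cuda-9.0/lib64"]
print(prepend_paths(cuda_dirs, os.environ.get("LD_LIBRARY_PATH", "")))
# The loader uses the LD_LIBRARY_PATH this interpreter was started with.
print("libcudnn.so.7 loadable:", can_load("libcudnn.so.7"))
```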

That completes the installation.
