Preface:
Installing the GPU version of TensorFlow is genuinely not easy. There is no shortage of installation tutorials online, but hardly any of them can be followed end to end successfully, and some even omit essential steps, in which case getting a working installation would be a small miracle.
After falling into plenty of pits, I searched the web extensively, combined the tensorflow-gpu installation methods described on various sites, and finally put together a procedure that actually works. I verified it myself: on Linux (Ubuntu) with Python 3.5, I installed tensorflow-gpu and successfully ran a convolutional neural network training program.
The procedure under Python 2.7 is largely the same as in this article; you can also refer to NVIDIA's official tutorial:
http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-installation.html
Prerequisites:
- An NVIDIA graphics card with compute capability 3.0 or higher
(if you do not know your NVIDIA GPU's compute capability, you can look it up here: https://developer.nvidia.com/cuda-gpus )
- A computer running Linux
(this article uses Ubuntu 16.04.2 LTS Gnome 64-bit as the example; other distributions are largely the same)
- Python 3.5
(Python 3 is recommended over 2.7; note that some Linux distributions may not ship Python 3 by default. A quick way to check both the GPU and the Python version is shown right after this list.)
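Before starting, a quick sanity check can save time. The two commands below are standard Ubuntu/Python tools, not specific to this guide: the first shows whether an NVIDIA GPU is visible on the PCI bus, the second shows which Python 3 version is installed.
- $ lspci | grep -i nvidia
- $ python3 --version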
Getting Started:
Install the NVIDIA Driver
First, check your NVIDIA VGA card model, that is, which graphics card you have:
- $ sudo lshw -numeric -C display
On my machine, the NVIDIA card shows up with a line like this:
product: GM107M [GeForce GTX 850M] [10DE:1391]
Based on that information, go to the official site, pick the matching options, and look up which driver version your card needs:
http://www.nvidia.com/Download/index.aspx
For my card, the recommended version is 375.66.
Alternatively, you can run a command in the terminal to see the required version:
- $ ubuntu-drivers devices
On my machine the output contains nvidia-375, which matches what the official site reported, so we can install it with apt-get from the terminal. Before installing, it is a good idea to update the package lists first, or switch to a faster mirror.
- $ sudo apt-get install nvidia-375
Adjust the number in nvidia-375 to the driver version your own card actually supports.
Once the installation finishes, search for nvidia in the Dash; if you see NVIDIA X Server Settings, the driver has been installed successfully. If not, try rebooting the machine and check again.
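You can also verify from the terminal whether the nvidia kernel module is loaded (a generic check, not part of the original steps); after installing and rebooting, the list should not be empty.
- $ lsmod | grep nvidia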
Install CUDA 8:
First go to the official site and download the CUDA Toolkit 8.0, choosing the options that match your machine. Make sure to pick the runfile download:
https://developer.nvidia.com/cuda-downloads
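Optionally, before running the installer you can verify the downloaded file against the MD5 checksum published on the download page (md5sum is a standard tool; this extra check is my own habit, not a required step):
- $ md5sum cuda_8.0.61_375.26_linux.run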
After the download completes, run
- $ sudo sh cuda_8.0.61_375.26_linux.run
to start the installation. The wall of text at the beginning is the End User License Agreement; you can press Ctrl+C to skip past it and then type accept to accept the agreement.
After that comes the interactive part of the installer; just follow the prompts step by step.
At the early prompt
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
answer n, because the driver is already installed.
- Using more to view the EULA.
- End User License Agreement
- --------------------------
- Preface
- -------
- The following contains specific license terms and conditions
- for four separate NVIDIA products. By accepting this
- agreement, you agree to comply with all the terms and
- conditions applicable to the specific product(s) included
- herein.
- NVIDIA CUDA Toolkit
- Description
- The NVIDIA CUDA Toolkit provides command-line and graphical
- tools for building, debugging and optimizing the performance
- of applications accelerated by NVIDIA GPUs, runtime and math
- libraries, and documentation including programming guides,
- user manuals, and API references. The NVIDIA CUDA Toolkit
- License Agreement is available in Chapter 1.
- Default Install Location of CUDA Toolkit
- Windows platform:
- Do you accept the previously read EULA?
- accept/decline/quit: accept
- Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
- (y)es/(n)o/(q)uit: n
- Install the CUDA 8.0 Toolkit?
- (y)es/(n)o/(q)uit: y
- Enter Toolkit Location
- [ default is /usr/local/cuda-8.0 ]:
- Do you want to install a symbolic link at /usr/local/cuda?
- (y)es/(n)o/(q)uit: y
- Install the CUDA 8.0 Samples?
- (y)es/(n)o/(q)uit: y
- Enter CUDA Samples Location
- [ default is /home/kinny ]:
- Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
- Missing recommended library: libXmu.so
- Installing the CUDA Samples in /home/kinny ...
- Copying samples to /home/kinny/NVIDIA_CUDA-8.0_Samples now...
- Finished copying samples.
- ===========
- = Summary =
- ===========
- Driver: Not Selected
- Toolkit: Installed in /usr/local/cuda-8.0
- Samples: Installed in /home/kinny, but missing recommended libraries
- Please make sure that
- - PATH includes /usr/local/cuda-8.0/bin
- - LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
- To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
- Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
- ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
- To install the driver using this installer, run the following command, replacing with the name of this run file:
- sudo .run -silent -driver
- Logfile is /tmp/cuda_install_17494.log
Configure the CUDA Environment Variables:
Add the following at the end of ~/.bashrc:
- export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
- export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
- export CUDA_HOME=/usr/local/cuda
Here,
the first two (PATH, LD_LIBRARY_PATH) are the variables recommended in the official CUDA installation guide.
The third one (CUDA_HOME) is required by the GPU version of TensorFlow.
After setting the variables, be sure to reload the file, otherwise the changes will not take effect immediately. Rebooting the machine also makes them take effect.
- $ source ~/.bashrc
Many other tutorials online omit this step, which is a fatal trap for beginners: you follow the tutorial exactly, yet the GPU simply cannot be used.
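To confirm that the toolkit and the variables were actually picked up in the current shell, the following quick checks should work (assuming the default install path used above); nvcc should report release 8.0, and CUDA_HOME should print /usr/local/cuda.
- $ nvcc --version
- $ echo $CUDA_HOME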
Check that the graphics driver is installed correctly
- $ nvidia-smi
- Wed Jun 7 04:27:39 2017
- +-----------------------------------------------------------------------------+
- | NVIDIA-SMI 375.66 Driver Version: 375.66 |
- |-------------------------------+----------------------+----------------------+
- | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
- | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
- |===============================+======================+======================|
- | 0 GeForce GTX 850M Off | 0000:01:00.0 Off | N/A |
- | N/A 50C P0 N/A / N/A | 316MiB / 2002MiB | 2% Default |
- +-------------------------------+----------------------+----------------------+
- +-----------------------------------------------------------------------------+
- | Processes: GPU Memory |
- | GPU PID Type Process name Usage |
- |=============================================================================|
- | 0 1221 G /usr/lib/xorg/Xorg 24MiB |
- | 0 1609 G /usr/lib/xorg/Xorg 150MiB |
- | 0 1930 G /usr/bin/gnome-shell 90MiB |
- | 0 2424 G ...el-token=3A74D277B6B5EFD197119F9AEDDC5740 24MiB |
If you need to uninstall:
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Then run apt purge nvidia* to remove the driver.
Install the Deep Learning Library cuDNN
First download cuDNN 5.1 from the official site ( https://developer.nvidia.com/cudnn ). You need to register as a developer before you can download, and the direct download can be very slow; if someone else has already downloaded it, you can simply reuse their copy, as long as the version is 5.1.
The file is named: cudnn-8.0-linux-x64-v5.1.tgz
Then unpack it:
- $ tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
Then copy the library and header files into the CUDA directory (make sure it is the directory you actually installed to, such as /usr/local/cuda-8.0); with a correct installation, Ubuntu normally already has the symbolic link /usr/local/cuda -> /usr/local/cuda-8.0/
- $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
- $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
Next, fix the file access permissions:
- $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
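To double-check which cuDNN version ended up in place, you can grep the version macros out of the header you just copied (a generic check; for cuDNN 5.1 it should print CUDNN_MAJOR 5 and CUDNN_MINOR 1):
- $ grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h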
The Last Step: Install the GPU Version of TensorFlow
In the previous post on my blog, I wrote about how to install the GPU-enabled version of TensorFlow; see that post for details.
Here I will repeat the key part.
Type the following in the terminal:
- $ sudo pip3 install tensorflow-gpu
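One caveat based on the TensorFlow release history, rather than the original post: newer tensorflow-gpu wheels are built against newer CUDA/cuDNN versions, so if the latest wheel refuses to work with CUDA 8.0 and cuDNN 5.1, pinning a release from that era (for example 1.2.x) should match this setup:
- $ sudo pip3 install tensorflow-gpu==1.2.1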
After a short wait, the installation is done. You can then use the test code from the post I mentioned above to verify that the installation succeeded and that the GPU is usable.
- $ python3
- >>> import tensorflow as tf
- >>> hello = tf.constant('Hello, TensorFlow!')
- >>> sess = tf.Session()
- >>> print(sess.run(hello))
- Hello, TensorFlow!
- >>> a = tf.constant(10)
- >>> b = tf.constant(32)
- >>> print(sess.run(a + b))
- 42
- >>>
For the GPU build, you can use the code below to test whether TensorFlow can use the GPU to accelerate computation. If it prints False, the GPU cannot be used; otherwise it can, and the GPU information will be printed as well.
- >>> tf.test.is_gpu_available()
- 2017-06-06 05:36:41.972817: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
- 2017-06-06 05:36:41.972852: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
- 2017-06-06 05:36:41.972863: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
- 2017-06-06 05:36:41.972871: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
- 2017-06-06 05:36:41.972879: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
- 2017-06-06 05:36:42.306655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
- 2017-06-06 05:36:42.306887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
- name: GeForce GTX 850M
- major: 5 minor: 0 memoryClockRate (GHz) 0.9015
- pciBusID 0000:01:00.0
- Total memory: 1.96GiB
- Free memory: 1.44GiB
- 2017-06-06 05:36:42.306905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
- 2017-06-06 05:36:42.306911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
- 2017-06-06 05:36:42.306927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 850M, pci bus id: 0000:01:00.0)
- True
At this point, if you see output like mine, congratulations: tensorflow-gpu can now use the GPU for computation.
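As one extra sanity check (my own addition, not from the original post), you can pin a small computation to the GPU explicitly and ask TensorFlow to log device placement; the log output should show the ops being assigned to /gpu:0, and the result printed should be [5. 7. 9.].
- >>> import tensorflow as tf
- >>> with tf.device('/gpu:0'):
- ...     a = tf.constant([1.0, 2.0, 3.0], name='a')
- ...     b = tf.constant([4.0, 5.0, 6.0], name='b')
- ...     c = a + b
- ...
- >>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
- >>> print(sess.run(c))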
From https://blog.ailemon.me/2017/06/06/install-tensorflow-gpu-on-ubuntu-linux/