[Deep Learning] Installing GPU-enabled TensorFlow on CentOS 7 (Part 2)


9. Install the NVIDIA driver

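Installing the NVIDIA driver is the critical step. Once it succeeds, the rest is mostly smooth sailing.
(1) Use the method from step 2 to identify your graphics card model, then look up the matching driver on the official NVIDIA website and download it.
(2) Install it with the following command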
$ sudo sh NVIDIAxxx --kernel-source-path=/usr/src/kernels/x.xx.x-xxxxx

Here NVIDIAxxx is the NVIDIA driver installer script and x.xx.x-xxxxx is the kernel version, which you can look up with the following command
[littlebei@localhost ~]$ uname -r
3.10.0-693.2.2.el7.x86_64
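
As a small illustration of how the kernel version feeds into the --kernel-source-path flag, here is a minimal Python sketch. The helper names kernel_source_path and installer_command are made up for this example, and it assumes the kernel sources live under /usr/src/kernels/ as in the command above. It only builds the command and checks that the directory exists; it does not run the installer.

```python
import os
import platform


def kernel_source_path(release=None, base="/usr/src/kernels"):
    """Return the directory to pass to --kernel-source-path."""
    # platform.release() reports the same string as `uname -r`.
    release = release or platform.release()
    return base.rstrip("/") + "/" + release


def installer_command(run_file, release=None):
    """Build the installer argument list (hypothetical helper, nothing is executed)."""
    return ["sudo", "sh", run_file,
            "--kernel-source-path=" + kernel_source_path(release)]


if __name__ == "__main__":
    path = kernel_source_path()
    # A missing directory usually means the kernel sources are not installed.
    print(path, "exists" if os.path.isdir(path) else "is missing")
```

If the directory is reported as missing, the installer will most likely stop with the first error described below.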

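Two errors may come up during installation.
Error 1:
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are
installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.

Solution:
$ sudo yum install epel-release
$ sudo yum install --enablerepo=epel dkms

If the error persists, check that a kernel-devel package matching the running kernel (the uname -r output) is installed, since that package is what provides the directory under /usr/src/kernels/.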

Error 2:
ERROR: Unable to load the 'nvidia-drm' kernel module.

Solution:
One probable cause is that the system boots via UEFI with the Secure Boot option turned on in the BIOS settings.
Turn Secure Boot off and the problem should go away.
This is why I asked you to turn off UEFI back in step 1.
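
To find out whether the machine booted through UEFI at all, you can look for the EFI directory that the Linux kernel exposes under /sys/firmware. The sketch below is only a quick check under that assumption. The function name booted_via_uefi is made up for this example, and it does not read the Secure Boot state itself, which still has to be checked in the firmware menu.

```python
import os


def booted_via_uefi(sysfs_root="/sys"):
    """True if the kernel exposes EFI firmware data, i.e. the boot was UEFI."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


if __name__ == "__main__":
    if booted_via_uefi():
        print("UEFI boot: check the Secure Boot setting in the firmware menu")
    else:
        print("Legacy BIOS boot: Secure Boot does not apply")
```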


(3) Choices during the installation
On the license page choose Accept, on the 32-bit page choose No, and on the X configuration page choose Yes.


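10. Install CUDA
On the CUDA download page, pick the CUDA release that matches your system version and download it. I recommend against the newest CUDA release, because it may well not be supported by your TensorFlow version. I fell into this pit myself.
Install command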
$ sudo sh cuda_8.0.61_375.26_linux.run
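
The version-matching advice in this step can be made concrete with a small lookup table. The entries below are reproduced from memory of TensorFlow's published "tested build configurations" for a few 1.x GPU releases on Linux, so treat them as an assumption and verify them against the official table before relying on them. The names TESTED_CONFIGS and tensorflow_versions_for are made up for this sketch.

```python
# TensorFlow GPU release -> (CUDA version, cuDNN version).
# Assumption: recalled from TensorFlow's tested build configurations;
# check the official table before trusting any entry.
TESTED_CONFIGS = {
    "1.2": ("8.0", "5.1"),
    "1.3": ("8.0", "6"),
    "1.4": ("8.0", "6"),
    "1.5": ("9.0", "7"),
}


def tensorflow_versions_for(cuda, cudnn):
    """Return the listed TensorFlow releases tested with this CUDA/cuDNN pair."""
    return sorted(tf for tf, pair in TESTED_CONFIGS.items()
                  if pair == (cuda, cudnn))


if __name__ == "__main__":
    print(tensorflow_versions_for("8.0", "6"))
```

An empty list means the table has no TensorFlow release for that CUDA and cuDNN combination, which is the mismatch this step warns about.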

The installer then walks through the following prompts
# accept
------------------------------------------------------------- 
Do you accept the previously read EULA?accept/decline/quit: accept
# no
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?(y)es/(n)o/(q)uit: n
-------------------------------------------------------------
# answer yes or accept the default for everything after this
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: 
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver is used. 
The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom X configuration, 
such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y


Enter Toolkit Location [ default is /usr/local/cuda-8.0 ]:


Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y


Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y


Enter CUDA Samples Location
[ default is /root ]: 


Installing the NVIDIA display driver...


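Output like the following means the toolkit was installed successfully. The driver messages can be disregarded: the driver was installed separately in step 9 and declined in this installer, so the line to check in the summary is Toolkit.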
The driver installation has failed due to an unknown error. Please consult the driver 
installation log located at /var/log/nvidia-installer.log.


===========
= Summary =
===========


Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /root, but missing recommended libraries


Please make sure that
  - PATH includes /usr/local/cuda-8.0/bin 
  - LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, 
  add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root


To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin


Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed
information on setting up CUDA.


***WARNING: Incomplete installation! This installation did not install the CUDA Driver. 
A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, 
replacing <CudaInstaller> with the name of this run file:
     sudo <CudaInstaller>.run -silent -driver


Logfile is /tmp/cuda_install_192.log
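
If you save the installer output to a file, the summary block can also be checked programmatically. The following is a minimal sketch that assumes the summary lines have the "Name: status" shape shown above. The helper names parse_summary and toolkit_installed are made up for this example.

```python
def parse_summary(log_text):
    """Collect the Driver/Toolkit/Samples lines of the installer summary."""
    summary = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Driver", "Toolkit", "Samples"):
            summary[key.strip()] = value.strip()
    return summary


def toolkit_installed(log_text):
    """True if the summary reports the toolkit as installed."""
    return parse_summary(log_text).get("Toolkit", "").startswith("Installed")


if __name__ == "__main__":
    sample = ("Driver: Not Selected\n"
              "Toolkit: Installed in /usr/local/cuda-8.0\n")
    print(parse_summary(sample), toolkit_installed(sample))
```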


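11. Configure the CUDA environment variables
Edit the ~/.bashrc file (sudo is not needed, since the file belongs to your own user)
$ vim ~/.bashrc

Add the following lines, then run source ~/.bashrc or open a new shell so they take effect. The PATH line is the one the installer summary above asks for.
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-8.0/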
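
To confirm that the current shell actually picked up these settings, you can inspect LD_LIBRARY_PATH from Python. This is a minimal sketch. The directory list simply mirrors the export line in this step, and the name missing_lib_dirs is made up for the example.

```python
import os

# Directories the LD_LIBRARY_PATH export in this step is meant to add.
REQUIRED_LIB_DIRS = [
    "/usr/local/cuda-8.0/lib64",
    "/usr/local/cuda-8.0/extras/CUPTI/lib64",
]


def missing_lib_dirs(env=None):
    """Return the required directories absent from LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = [p.rstrip("/")
               for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    return [d for d in REQUIRED_LIB_DIRS if d not in entries]


if __name__ == "__main__":
    missing = missing_lib_dirs()
    print("all set" if not missing else "missing: " + ", ".join(missing))
```

If anything is reported missing, the usual cause is that ~/.bashrc has not been re-read in the current shell.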


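12. Install cuDNN
Download the cuDNN package from the NVIDIA website (mind the version match: it must be the build for your CUDA version).
Once the download finishes, run the following. The archive unpacks into a cuda/ directory, and copying into /usr/local/cuda requires root.
$ tar -xvzf cudnn-8.0-linux-x64-v6.0.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64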
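
One way to double-check which cuDNN version ended up in place is to read the version macros from the installed header. The sketch below assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 6 header does. The function name cudnn_version is made up for this example.

```python
import re


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None  # macro not found: not a header in the expected shape
        parts.append(int(match.group(1)))
    return tuple(parts)
```

To use it, read /usr/local/cuda/include/cudnn.h into a string and pass it to cudnn_version. A result of None means the macros were not found in the expected form.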


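13. Install the GPU build of TensorFlow
$ sudo pip install tensorflow-gpu

This installs directly with pip. If pip is not installed on your machine, see my other blog post, which covers installing pip.
Note that an unpinned install fetches the newest tensorflow-gpu release, which may expect a newer CUDA than 8.0. If the import later fails because a CUDA or cuDNN library cannot be found, pin a release built against CUDA 8.0 and cuDNN 6, for example tensorflow-gpu==1.4.0.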


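14. Test TensorFlow
After all the bumps along the way, we have finally reached the testing step. Feels good, doesn't it? Start the Python interpreter and run the following: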
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-06-28 16:42:53.518877: W tensorflow/core/platform/cpu_feature_guard.cc:45] 
The TensorFlow library wasn't compiled to use SSE4.1 instructions, 
but these are available on your machine and could speed up CPU computations.
2017-06-28 16:42:53.518906: W tensorflow/core/platform/cpu_feature_guard.cc:45] 
The TensorFlow library wasn't compiled to use SSE4.2 instructions, 
but these are available on your machine and could speed up CPU computations.
2017-06-28 16:42:53.518914: W tensorflow/core/platform/cpu_feature_guard.cc:45] 
The TensorFlow library wasn't compiled to use AVX instructions, 
but these are available on your machine and could speed up CPU computations.
2017-06-28 16:42:53.518921: W tensorflow/core/platform/cpu_feature_guard.cc:45] 
The TensorFlow library wasn't compiled to use AVX2 instructions, 
but these are available on your machine and could speed up CPU computations.
2017-06-28 16:42:53.518929: W tensorflow/core/platform/cpu_feature_guard.cc:45] 
The TensorFlow library wasn't compiled to use FMA instructions, 
but these are available on your machine and could speed up CPU computations.
2017-06-28 16:42:54.099744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] 
successful NUMA node read from SysFS had negative value (-1), 
but there must be at least one NUMA node, so returning NUMA node zero
2017-06-28 16:42:54.100218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] 
Found device 0 with properties:
name: Tesla M60
major: 5 minor: 2 memoryClockRate (GHz) 1.1775
pciBusID 0000:00:02.0
Total memory: 7.93GiB
Free memory: 7.86GiB
2017-06-28 16:42:54.100243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-28 16:42:54.100251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y
2017-06-28 16:42:54.100266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 
Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:02.0)
>>> print(sess.run(hello))
Hello, TensorFlow!

If this small example runs correctly, congratulations: the GPU build of TensorFlow is installed. What are you waiting for? Go build something!
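
Besides reading the log lines, you can ask TensorFlow directly which devices it sees. The sketch below relies on device_lib.list_local_devices() from the TensorFlow 1.x module tensorflow.python.client. That is an internal module, so its location is an assumption for the version installed here. The helper names gpu_names and local_gpus are made up for this example.

```python
def gpu_names(devices):
    """Names of the entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpus():
    """List GPUs visible to TensorFlow; returns [] if TensorFlow is absent."""
    try:
        # Assumption: TF 1.x layout; this internal module may move between releases.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return gpu_names(device_lib.list_local_devices())
```

Calling local_gpus() in the interpreter after the import test should list at least one GPU device. An empty list suggests that TensorFlow is running on the CPU only.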






