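First, a word about my motivation for installing TensorFlow this way: after trying several versions, and based on what my current project needs, I chose to install the following versions:
TensorFlow version: 1.2.1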
[beer@localhost ~]$ python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.2.1'
>>>
Linux distribution
CentOS 7.4.1708, on a Tesla P100 machine
[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
GPU and memory information
[beer@localhost ~]$ nvidia-smi
Fri Dec 29 01:13:55 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.66 Driver Version: 384.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 34C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:05:00.0 Off | 0 |
| N/A 30C P0 30W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:08:00.0 Off | 0 |
| N/A 31C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:09:00.0 Off | 0 |
| N/A 32C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla P100-PCIE... Off | 00000000:84:00.0 Off | 0 |
| N/A 33C P0 32W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla P100-PCIE... Off | 00000000:85:00.0 Off | 0 |
| N/A 33C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla P100-PCIE... Off | 00000000:88:00.0 Off | 0 |
| N/A 32C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla P100-PCIE... Off | 00000000:89:00.0 Off | 0 |
| N/A 33C P0 29W / 250W | 0MiB / 16276MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
CUDA version: 8.0
[beer@localhost ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
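To check the toolkit version from a script rather than by eye, a minimal sketch is shown below. It assumes the `release X.Y` wording seen in the output above; the function name `parse_nvcc_release` is made up for this illustration.

```python
from __future__ import print_function
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) of the CUDA toolkit from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The nvcc output shown in this post, pasted in as sample input.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
"""

print(parse_nvcc_release(SAMPLE))  # prints (8, 0)
```

On a live machine the same function could be fed the text returned by `subprocess.check_output(["nvcc", "--version"])`, provided nvcc is on PATH.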
cuDNN version: cuDNN v5.1
cuDNN download (Baidu Netdisk):
Link: https://pan.baidu.com/s/1i5KmPPr Password: 2qnt
Python version: 2.7.5
[beer@localhost ~]$ python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
1. Install the required packages first
yum install gcc gcc-c++
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
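2. Download the matching CUDA release from the official CUDA website; we use cuda-8.0 here
The package is cuda_8.0.61_375.26_linux.run
Baidu Netdisk:
Link: https://pan.baidu.com/s/1kV1lVeB Password: mxl8
3. Disable the nouveau driver that ships with the system
Switch to the root user with su: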
su root
Open /lib/modprobe.d/dist-blacklist.conf
Comment out the nvidiafb line:
# blacklist nvidiafb
Then add the following lines:
blacklist nouveau
options nouveau modeset=0
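Before rebuilding the initramfs it can help to confirm the edit took effect. The sketch below is one way to do that, assuming the two lines above are the only ones that matter; the helper names `active_lines` and `nouveau_blacklisted` are made up for this illustration.

```python
from __future__ import print_function


def active_lines(conf_text):
    """Yield non-empty, non-comment lines with runs of whitespace collapsed."""
    for raw in conf_text.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("#"):
            yield line


def nouveau_blacklisted(conf_text):
    """True if both nouveau lines from step 3 are present and not commented out."""
    lines = set(active_lines(conf_text))
    return "blacklist nouveau" in lines and "options nouveau modeset=0" in lines


# A sample of what the edited file should contain after step 3.
SAMPLE_CONF = """# blacklist nvidiafb
blacklist nouveau
options nouveau modeset=0
"""

print(nouveau_blacklisted(SAMPLE_CONF))  # prints True
```

To check the real file, pass in the contents of /lib/modprobe.d/dist-blacklist.conf read with `open(...).read()`.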
4. Back up and rebuild the initramfs image
Back up the original image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Build a new image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
5. Switch to text mode
systemctl set-default multi-user.target
6. Reboot and log in as root
reboot
7. Check whether nouveau has been disabled
lsmod | grep nouveau
If nothing is printed, nouveau is disabled
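The same check can be done in a script. The sketch below looks at the first column of lsmod-style text; the function name `module_loaded` and the sample listing (including its sizes) are made up for this illustration.

```python
from __future__ import print_function


def module_loaded(name, modules_text):
    """True if `name` is the first field of any line in lsmod or /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


# Hypothetical lsmod output from a machine where nouveau is still loaded.
SAMPLE_LSMOD = """Module                  Size  Used by
nouveau              1600000  1
video                  24520  1 nouveau
"""

print(module_loaded("nouveau", SAMPLE_LSMOD))  # prints True
```

On the real machine the input can come from reading /proc/modules, which lists loaded modules one per line with the module name first.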
8. Go to the directory containing the CUDA installer, then install CUDA and the driver
chmod +x cuda_8.0.61_375.26_linux.run
sh cuda_8.0.61_375.26_linux.run
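Note: do not install the OpenGL libraries when installing CUDA. If you do, the graphical desktop will not start after the installation.
The installer then prompts along the following lines (this sample shows the version numbers of an older installer, so expect the CUDA 8.0 equivalents):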
Do you accept the previously read EULA? (accept/decline/quit): accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 346.46? ((y)es/(n)o/(q)uit): y
Do you want to install the OpenGL libraries? ((y)es/(n)o/(q)uit) [ default is yes ]: n
Install the CUDA 7.0 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.0 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.0 Samples? ((y)es/(n)o/(q)uit): y
...
9. Install cuDNN
tar -zxf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/
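To confirm which cuDNN ended up under /usr/local/cuda, the version can be read from the copied header. This is a sketch under one assumption: that cudnn.h defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 5.x headers I have seen do. The function name and the header excerpt are made up for this illustration.

```python
from __future__ import print_function
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patchlevel) from cudnn.h text, or None if a macro is missing."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+CUDNN_%s\s+(\d+)" % key
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical excerpt in the style of a cuDNN 5.1 header (assumed, not copied from the archive).
SAMPLE_HEADER = """#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 10
"""

print(parse_cudnn_version(SAMPLE_HEADER))  # prints (5, 1, 10)
```

For the real check, pass in the contents of /usr/local/cuda/include/cudnn.h, the location the cp commands above copy the header to.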
10. Set the CUDA environment variables by appending the following lines to the end of the user's .bashrc file
# cuda
export CUDA_HOME=/usr/local/cuda-8.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/lib:$LD_LIBRARY_PATH
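After opening a new shell, a quick way to see whether these variables took effect is sketched below. The function name `cuda_env_problems` is made up for this illustration, and the default path simply mirrors the CUDA_HOME value used above.

```python
from __future__ import print_function


def cuda_env_problems(env, cuda_home="/usr/local/cuda-8.0"):
    """Return a list of messages describing missing CUDA entries in the environment."""
    problems = []
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    if cuda_home + "/bin" not in path_dirs:
        problems.append("PATH is missing " + cuda_home + "/bin")
    if cuda_home + "/lib64" not in lib_dirs:
        problems.append("LD_LIBRARY_PATH is missing " + cuda_home + "/lib64")
    return problems


# A sample environment that matches the .bashrc lines from step 10.
SAMPLE_ENV = {
    "PATH": "/usr/local/cuda-8.0/bin:/usr/bin:/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/lib",
}

print(cuda_env_problems(SAMPLE_ENV))  # prints []
```

Calling `cuda_env_problems(os.environ)` after `import os` runs the same check against the current shell's environment.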
11. Switch the run level back to graphical mode
systemctl set-default graphical.target
12. Reboot and test whether the installation succeeded
nvidia-smi
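The steps above install CUDA and cuDNN. The TensorFlow installation follows:
We install with pip here.
The official instructions are shown below for reference; we install with the commands that follow them: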
# Ubuntu/Linux 64-bit, CPU only, Python 2.7:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled, Python 2.7. Requires CUDA toolkit 7.5 and CuDNN v4.
# For other versions, see "Install from sources" below.
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
# Mac OS X, CPU only:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0rc0-py2-none-any.whl
CPU version
sudo pip install tensorflow==<version>
We use version 1.2.1 here
sudo pip install tensorflow==1.2.1
GPU version
sudo pip install tensorflow-gpu==<version>
We use version 1.2.1 here
sudo pip install tensorflow-gpu==1.2.1
Once pip reports success, verify that TensorFlow runs correctly with the following commands
[beer@localhost ~]$ python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.2.1'
>>>
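Importing the package only shows that TensorFlow is installed. To see whether the GPU build actually finds the cards, a sketch along the following lines may help. It relies on `device_lib.list_local_devices()` from `tensorflow.python.client`; the helper names `gpu_names` and `list_tensorflow_gpus` are made up for this illustration.

```python
from __future__ import print_function


def gpu_names(devices):
    """Return the names of the devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus():
    """Ask TensorFlow which GPUs it can see; needs TensorFlow installed."""
    # Imported lazily so that gpu_names stays usable without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())
```

Running `print(list_tensorflow_gpus())` on the machine described here should list one entry per P100 if the GPU build and the CUDA/cuDNN libraries line up. An empty list suggests TensorFlow fell back to the CPU.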
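Various errors may appear at this point. The main cause is that the GPU build of TensorFlow relies on the cuDNN library for acceleration, and a mismatched library version breaks it; cuDNN v5.1 paired with CUDA 8.0, as used here, works without problems. That completes the TensorFlow installation. If you run into problems, you can contact me through the blog. You can now run your own models on TensorFlow. Running models will also produce various errors, and they have to be resolved one at a time. For example, when a module is reported as missing, we usually fix it with
sudo pip install <package-name>
If you cannot work out the package name, a web search (for example on Baidu) turns up many similar questions.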
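Rather than discovering missing modules one crash at a time, the imports a model needs can be probed up front. This is a minimal sketch; the function name `missing_modules` and the module names in the example are made up for this illustration.

```python
from __future__ import print_function
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "os" and "json" ship with Python; the last name is deliberately bogus.
print(missing_modules(["os", "json", "no_such_module_for_demo"]))
# prints ['no_such_module_for_demo']
```

Keep in mind that the name passed to pip is not always the import name (for example, the `yaml` module is installed with `pip install PyYAML`), which is where the web search comes in.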