References
- https://tensorflow.google.cn/install/install_linux
- http://nvidia.com/cuda
- http://developer.nvidia.com/cudnn
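Notes
- The machine must have an Nvidia graphics card, and not too old a one (no point doing this on an antique anyway; it just wastes electricity). Supported cards are listed on the Nvidia site:
https://developer.nvidia.com/cuda-gpus
- All installation commands need root. Switch with su root or prefix each command with sudo. Compiling and running the test code as a regular user is fine.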
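Before anything else, it may be worth confirming that the system actually sees an Nvidia device. The sketch below is a hypothetical helper (the name has_nvidia_gpu is not from any tool) that scans lspci-style text for the vendor name; it assumes lspci from the pciutils package is available on the target machine.

```shell
# Reads `lspci`-style text on stdin; succeeds if any line mentions NVIDIA
# (case-insensitive). Hypothetical helper, not part of any package.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Typical use on the target machine:
#   lspci | has_nvidia_gpu && echo "Nvidia device found"
```

A match only shows that a card is present; whether it is supported still has to be checked against the list at the URL above.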
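Tips learned the hard way (my own fault for not reading carefully) [facepalm]
- Install exactly the versions the TensorFlow site specifies: TensorFlow 1.9 goes with CUDA 9.0, and CUDA 9.0 needs the cuDNN build made for it.
- If you like to tinker, use a disk with no important data on it.
- Download the installers on another computer and copy them over with scp. I reinstalled Ubuntu several times, and each download is about 2 GB; as a user of China Unicom's so-called unlimited 40 GB data SIM, that still hurts.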
Download the main installation files
- CUDA® Toolkit
#http://nvidia.com/cuda
#I picked the run file for Ubuntu 16.04; I didn't dare step into any other pitfalls
cuda_9.0.176_384.81_linux.run
- cuDNN, the deep neural network (DNN) development library; downloading requires registering on the site
#http://developer.nvidia.com/cudnn
libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
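After copying the files over, a quick check that nothing is missing can save a round trip. This is a minimal sketch with a hypothetical function name; the file names are the ones listed above, and the directory argument is whatever location you copied them to.

```shell
# Prints the name of every expected installer file that is absent from
# the directory given as $1. File names are the ones used in this post.
missing_installers() {
    dir=$1
    for f in \
        cuda_9.0.176_384.81_linux.run \
        libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb \
        libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb \
        libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
    do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example: missing_installers "$HOME/Downloads"
```

Empty output means all four files are in place.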
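Prepare the environment
Check the driver version bundled with the CUDA installer (384.81 here). If the installed driver is older than that, uninstall it first; if it is >= that version, skip this step.
#Recommended: uninstall with the run file, i.e. the Nvidia driver run file you downloaded earlier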
chmod +x *.run
./NVIDIA-Linux-x86_64-384.59.run --uninstall
#Not recommended (I don't know why, and I haven't tried it)
apt-get remove --purge 'nvidia*'
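The "older than 384.81 or not" decision can be made mechanically. The sketch below is a hypothetical helper that compares two dotted version strings; it assumes sort -V (version sort) from GNU coreutils is available, which should be the case on Ubuntu.

```shell
# Succeeds if dotted version $1 is greater than or equal to $2.
# Relies on `sort -V` (GNU coreutils version sort): the smaller of the
# two versions sorts first, so $1 >= $2 exactly when $2 comes out on top.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: keep the installed driver only if it is new enough.
#   version_ge 384.59 384.81 || echo "driver too old, uninstall it first"
```

The installed driver version itself has to come from somewhere else, for example from the name of the run file used to install it.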
Disable the stock nouveau driver. If you have already installed an Nvidia driver before, you can skip this step too.
vi /etc/modprobe.d/blacklist.conf
#Add these two lines
blacklist nouveau
options nouveau modeset=0
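#Apply the configuration
update-initramfs -u
#Reboot; the screen resolution drops afterwards, since there is no graphics driver any more
reboot
#Check that it took effect
lsmod | grep nouveau
#No output means nouveau was disabled successfully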
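If you would rather not edit the file by hand, the two blacklist lines can be appended from a script. This is a sketch under the assumption that appending to the same blacklist.conf is acceptable; ensure_line is a hypothetical helper that skips lines already present, so running it twice does no harm.

```shell
# Appends line $2 to file $1 unless an identical line is already there.
# -x matches whole lines only, -F treats the text as a fixed string.
ensure_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# As root on the target machine:
#   ensure_line /etc/modprobe.d/blacklist.conf 'blacklist nouveau'
#   ensure_line /etc/modprobe.d/blacklist.conf 'options nouveau modeset=0'
#   update-initramfs -u
```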
Install the build tools; otherwise the bundled graphics driver will not install
apt install gcc g++ make
For CUDA 9.0, GCC has to be downgraded to gcc 5; I also found this out while installing CUDA
apt install gcc-5 g++-5
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
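To confirm that the alternatives switch took effect, the active compiler's version can be checked before starting the CUDA installer. The helper below is hypothetical and simply follows this post's choice of gcc 5; it takes the version string as an argument so it can be fed from gcc -dumpversion.

```shell
# Succeeds if the version string $1 has major version 5,
# e.g. "5", "5.4.0" or "5.5.0".
is_gcc5() {
    case $1 in
        5|5.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the target machine:
#   is_gcc5 "$(gcc -dumpversion)" || echo "gcc is not version 5"
```

If the check fails, update-alternatives --config gcc lets you pick the active version interactively.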
Install the CUDA® Toolkit
Be sure to install the CUDA version that matches your TensorFlow version: 1.9 goes with 9.0. I got burned by not reading carefully.
chmod +x cuda_9.0.176_384.81_linux.run
sh ./cuda_9.0.176_384.81_linux.run
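#The license text is shown first; read it if you want to. If you give up after a few pages or can't follow the terms, press q
- If the installer reports a failure, check the log it points to and troubleshoot from there
- Log of a successful installation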
Do you accept the previously read EULA?
accept/decline/quit: accept
You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: y
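#384.81 here is the graphics driver version; if the driver already installed is newer, there is no need to install this one
#I chose no mainly because I had installed CUDA 9.2 during my earlier missteps, heh
#Normally the answer should be yes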
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /root ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /root, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_7657.log
/root/NVIDIA_CUDA-9.0_Samples
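The "Missing recommended library" warnings are easy to lose in a long log. As a rough sketch, the hypothetical filter below pulls just the library names out of saved installer output; it assumes the lines look exactly like the ones in the log above.

```shell
# Reads installer output on stdin and prints the library named on each
# "Missing recommended library:" line, one name per line.
missing_libs() {
    sed -n 's/^Missing recommended library: *//p'
}

# Example, assuming the installer output was saved to install.log:
#   missing_libs < install.log
```

In this install the warnings did not stop the command-line samples used later (deviceQuery and bandwidthTest) from building; the listed libraries appear to be OpenGL and X11 ones, which presumably matter only for the graphical samples.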
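Set environment variables
vi /etc/ld.so.conf.d/cuda.conf
#Write these two lines
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
#Refresh the dynamic linker cache, as the installer summary asks
ldconfig
vi /etc/profile
#Add these two lines
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
Reboot: reboot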
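After the reboot, it is worth checking that the CUDA bin directory really ended up on PATH before building anything. The helper below is a hypothetical sketch; it takes the PATH string as an argument so it can be tried on any value.

```shell
# Succeeds if directory $1 is one of the colon-separated entries in $2.
# Wrapping both sides in colons avoids matching a mere prefix such as
# /usr/local/cuda against /usr/local/cuda/bin.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the target machine:
#   path_contains /usr/local/cuda/bin "$PATH" && nvcc --version
```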
Test the installation
- No errors means the installation succeeded
cd /root/NVIDIA_CUDA-9.0_Samples/samples/1_Utilities/deviceQuery
make
./deviceQuery
# Result = PASS means success
cd ../bandwidthTest
make
./bandwidthTest
#Result = PASS means success
Install cuDNN
NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks.
#cuDNN v7.1.4 Runtime Library for Ubuntu16.04 (Deb)
dpkg -i libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
#cuDNN v7.1.4 Developer Library for Ubuntu16.04 (Deb)
dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
#cuDNN v7.1.4 Code Samples and User Guide for Ubuntu16.04 (Deb)
dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
#Hold the version so that automatic updates do not break the environment
apt-mark hold libcudnn7 libcudnn7-dev
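One way to double-check which cuDNN version actually got installed is to read the version macros from the header shipped with the dev package. This is a sketch with a hypothetical function name; it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, and the path /usr/include/cudnn.h in the usage line is an assumption that may differ on your system.

```shell
# Prints "major.minor.patch" from the version macros in the cuDNN header
# given as $1. Only exact "#define NAME value" lines are considered.
cudnn_version() {
    header=$1
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Example (header location is an assumption):
#   cudnn_version /usr/include/cudnn.h
```

For the packages used here the result would be expected to start with 7.1.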
Test
#Copy the cuDNN sample to a writable path.
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
#Go to the writable path.
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
#Compile the mnistCUDNN sample.
$ make clean && make
#Run the mnistCUDNN sample.
$ ./mnistCUDNN
#If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:
#Test passed!
Install tensorflow-gpu, using python3 as the example
sudo apt-get install python3-pip python3-dev
pip3 install tensorflow-gpu==1.9.0
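Since the whole setup hinges on TensorFlow 1.9 being paired with CUDA 9.0, it can help to confirm which version pip actually installed. The hypothetical filter below extracts the Version field from pip3 show output; it assumes that output contains a line of the form "Version: x.y.z".

```shell
# Reads `pip3 show <package>` output on stdin and prints the value of
# its "Version:" field.
pip_version() {
    sed -n 's/^Version: *//p'
}

# Example on the target machine:
#   pip3 show tensorflow-gpu | pip_version
```

Anything other than a 1.9.x version would be a hint that the CUDA and cuDNN versions above no longer match.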
Test the installation
#Test code; save it as, for example, test.py
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
#Run it with python3 test.py
#The first run is a bit slow
#No errors, GPU information in the output, and b'Hello, TensorFlow!' printed means success
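Spotting the GPU information by eye works, but it can also be checked from a script. The sketch below is a hypothetical filter over the captured output of test.py; the pattern device:GPU: is an assumption about how TensorFlow names the device in its startup log and may need adjusting for other versions.

```shell
# Reads the test script's combined output on stdin; succeeds if a GPU
# device name appears in it. The pattern is an assumption about
# TensorFlow's log format, not a documented interface.
mentions_gpu() {
    grep -q 'device:GPU:'
}

# Example (TensorFlow logs to stderr, hence 2>&1):
#   python3 test.py 2>&1 | mentions_gpu && echo "GPU in use"
```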