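This post is mainly based on two articles:
Deep learning workstation setup: Ubuntu16.04+Nvidia GTX 1080+CUDA8.0
Deep learning workstation setup: Ubuntu16.04+GeForce GTX 1080+TensorFlow
In practice, though, some things turned out differently. Over time some components have become easier to install and no longer have to be compiled from source, and some steps have changed. So I am writing down the process I followed to install CUDA, cuDNN and the GPU version of TensorFlow on Ubuntu 16.04, for anyone who finds it useful.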
First install the graphics driver. Begin by checking which graphics card you have:
lspci | grep -i vga
lspci | grep -i nvidia
Then check whether an NVIDIA driver module is loaded:
lsmod | grep -i nvidia
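Switching drivers in Ubuntu 16.04 is easy: go to System Settings -> Software & Updates -> Additional Drivers, switch to the latest NVIDIA driver, click Apply Changes, and reboot.
Then run nvidia-smi to check that the new driver is working.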
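The driver checks above can be bundled into a small script. This is a sketch under a few assumptions: the function names are hypothetical, the driver version is read with nvidia-smi's --query-gpu option, the comparison relies on GNU sort -V, and 361.00 is the minimum driver version quoted by the CUDA 8.0 installer in its summary further down.

```shell
#!/usr/bin/env bash
# Sketch: check that the NVIDIA driver is loaded and new enough for CUDA 8.0.
# 361.00 is the minimum driver version quoted by the CUDA 8.0 installer.
MIN_DRIVER="361.00"

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_version: print the driver version reported by nvidia-smi.
driver_version() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# check_driver: report whether the loaded driver satisfies MIN_DRIVER.
check_driver() {
    local v
    v=$(driver_version)
    if [ -z "$v" ]; then
        echo "nvidia-smi gave no driver version: is the NVIDIA driver installed and loaded?" >&2
        return 1
    fi
    if version_ge "$v" "$MIN_DRIVER"; then
        echo "driver $v is new enough for CUDA 8.0"
    else
        echo "driver $v is older than $MIN_DRIVER" >&2
        return 1
    fi
}
```

After loading the functions into your shell (for example with source), run check_driver: it prints the driver version, or an error if nvidia-smi is missing or the driver is too old.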
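Go to https://developer.nvidia.com/cuda-downloads and download the installer for your system. Pick the runfile, not the deb package. Once it has downloaded, run
sudo sh cuda_8.0.44_linux.run --tmpdir=/tmp
to start the installation. The --tmpdir option sets the directory the installer uses for its temporary files (useful when the default /tmp is full or mounted noexec); as the output below shows, the install log is written to /tmp as well. The installer first shows the license agreement: keep pressing Enter until you reach the end, then type accept. Answer the remaining prompts as follows (the bundled driver is declined because a newer driver was already installed above):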
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
The installer then prints the following:
Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
Installing the CUDA Samples in /home/textminer …
Copying samples to /home/textminer/NVIDIA_CUDA-8.0_Samples now…
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/textminer
Please make sure that
– PATH includes /usr/local/cuda-8.0/bin
– LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing with the name of this run file:
sudo .run -silent -driver
Logfile is /tmp/cuda_install_6583.log
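To confirm that the toolkit ended up where the summary says and that the /usr/local/cuda symlink (answered y above) points at it, a small check can help. A minimal sketch, assuming bash and readlink -f; check_cuda_link is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: confirm the /usr/local/cuda symlink chosen during the interactive install.

# check_cuda_link [LINK] [TARGET]: succeed if LINK is a symlink resolving to TARGET.
check_cuda_link() {
    local link="${1:-/usr/local/cuda}" target="${2:-/usr/local/cuda-8.0}"
    [ -L "$link" ] || { echo "$link is not a symlink" >&2; return 1; }
    [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ] || {
        echo "$link does not point to $target" >&2
        return 1
    }
}
```

Usage: check_cuda_link && /usr/local/cuda/bin/nvcc --version. The full path to nvcc is needed because PATH has not been updated yet at this point.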
After the installation, set the environment variables and append them to the end of ~/.bashrc:
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Save and exit, then run source ~/.bashrc so the current shell picks up the variables.
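The step above can be made repeatable so that running the setup twice does not append duplicate lines. A minimal sketch, assuming bash; the function names and the "## cuda" marker line are hypothetical conventions, not something CUDA requires.

```shell
#!/usr/bin/env bash
# Sketch: append the CUDA variables to a startup file only once, so that
# re-running the setup does not pile up duplicate export lines.
# The "## cuda" marker line is an arbitrary convention.

# append_cuda_env [FILE]: add the export block to FILE (default ~/.bashrc) if missing.
append_cuda_env() {
    local rc_file="${1:-$HOME/.bashrc}"
    if grep -qx '## cuda' "$rc_file" 2>/dev/null; then
        return 0   # block was already added earlier
    fi
    # Quoted 'EOF' keeps $PATH etc. literal; they expand when FILE is sourced.
    cat >> "$rc_file" <<'EOF'
## cuda
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
}

# path_has VALUE DIR: succeed if DIR is one entry of the colon-separated VALUE.
path_has() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: run append_cuda_env, then source ~/.bashrc, then path_has "$LD_LIBRARY_PATH" /usr/local/cuda-8.0/lib64 && nvcc --version. The function also accepts another file name, which fits the /etc/profile step at the end of this post (run it from a root shell there).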
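Next, check whether the installation works:
# If you are afraid of breaking the samples, make a copy first and build in the copy. The installer also copied them to your home directory (NVIDIA_CUDA-8.0_Samples), where they can be built without sudo.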
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
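If everything works, deviceQuery lists the properties of your GPU and ends with Result = PASS.
Some guides say gcc has to be downgraded to 4.9. I compiled with gcc 5.4 and it still worked, apart from a few warnings; I don't know whether those warnings have any real impact.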
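The backup idea from the comment in the test commands can be scripted: build deviceQuery in a throwaway copy of the samples and check its verdict. A sketch, assuming bash and the /usr/local/cuda/samples location used above; run_device_query and query_passed are hypothetical names, and the check relies on the Result = PASS line that deviceQuery prints on success.

```shell
#!/usr/bin/env bash
# Sketch: build and run deviceQuery in a scratch copy of the CUDA samples,
# so the installed samples under /usr/local/cuda are never modified.

# run_device_query [SAMPLES_DIR]: copy the samples, build deviceQuery, run it.
run_device_query() {
    local src="${1:-/usr/local/cuda/samples}" work
    work=$(mktemp -d) || return 1
    cp -r "$src" "$work/samples" || return 1
    # The sample Makefile includes ../../common/inc, so the whole tree is copied.
    ( cd "$work/samples/1_Utilities/deviceQuery" && make && ./deviceQuery )
}

# query_passed: read deviceQuery output on stdin; succeed if it reports PASS.
query_passed() {
    grep -q 'Result = PASS'
}
```

Usage: run_device_query | tee devicequery.log | query_passed && echo "CUDA works". No sudo is needed because the build happens in a temporary directory you own.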
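To accelerate TensorFlow with the GPU, you need cuDNN in addition to CUDA. As with CUDA, download it from NVIDIA's website. This time there is no direct download: you have to register first and then fill in a survey, which is a bit of a hassle. I downloaded the "cuDNN v5.1 Library for Linux" package. Do not download "cuDNN v5.1 Developer Library for Ubuntu16.04 Power8 (Deb)": it is built for POWER8 processors, not for amd64.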
The download is a .tgz archive; extract it with tar:
tar -xvf cudnn-8.0-linux-x64-v5.1.tgz
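Installing cuDNN is simple: after extracting, copy the header and the libraries into the matching CUDA directories and make them readable by all users (cp -P keeps the libcudnn.so symlinks as symlinks instead of turning them into full copies):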
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
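To confirm which cuDNN version actually landed in the CUDA tree, the version can be read from the copied header, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain #define lines. A minimal sketch, assuming that header layout (true for cuDNN 5.x); cudnn_version is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the cuDNN version recorded in cudnn.h as MAJOR.MINOR.PATCHLEVEL.

# cudnn_version [HEADER]: print the version, fail if the defines are not found.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        /^#define[ \t]+CUDNN_MAJOR[ \t]/      { major = $3 }
        /^#define[ \t]+CUDNN_MINOR[ \t]/      { minor = $3 }
        /^#define[ \t]+CUDNN_PATCHLEVEL[ \t]/ { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }' "$header"
}
```

For the package above, cudnn_version should print a 5.1.x version; ls -l /usr/local/cuda/lib64/libcudnn* shows whether the libraries (and their symlinks) are in place.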
I had previously installed the CPU-only version of TensorFlow, so I uninstalled it first:
sudo pip uninstall tensorflow
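Then download the GPU version of TensorFlow from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0rc1-cp35-cp35m-linux_x86_64.whl and install it. The wheel is built for Python 3.5 (cp35), so pip must be the one that belongs to your Python 3.5 installation (on a stock Ubuntu 16.04 that is pip3, while pip belongs to Python 2):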
sudo pip install tensorflow_gpu-0.12.0rc1-cp35-cp35m-linux_x86_64.whl
Test it:
$ ipython
import tensorflow as tf
The import finished without any errors, so the installation succeeded.
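A successful import only shows that the libraries load. To see whether TensorFlow actually creates a GPU device, one option is to start a session with device placement logging (tf.Session and tf.ConfigProto, the TensorFlow 0.12 API) and look for gpu:0 in the log. A sketch under assumptions: python3 is the interpreter the wheel went into (override with PYTHON), the function names are hypothetical, and the exact log wording varies between versions, which is why the check only matches gpu:0 case-insensitively.

```shell
#!/usr/bin/env bash
# Sketch: check whether TensorFlow (0.12-era API) sees the GPU.
PYTHON="${PYTHON:-python3}"   # assumption: the Python 3.5 the wheel was installed into

# tf_device_log: run a tiny graph and print TensorFlow's log (stderr merged into stdout).
tf_device_log() {
    "$PYTHON" - <<'EOF' 2>&1
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.constant(1) + tf.constant(2)))
EOF
}

# sees_gpu: read that log on stdin; succeed if a GPU device (gpu:0) shows up.
sees_gpu() {
    grep -qi 'gpu:0'
}
```

Usage: tf_device_log | tee tf.log | sees_gpu && echo "TensorFlow is using the GPU". If the check fails, tf.log shows which CUDA library could not be loaded.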
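TensorFlow had already imported fine from the command line, but to my surprise, importing it in a program run from IDEA (IntelliJ IDEA) failed with this error: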
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
Error importing tensorflow. Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.
But I was not running it from the TensorFlow source directory, and the same file ran fine from the command line. So why did it fail in IDEA?
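After some experimenting I found that the problem was, once again, the LD_LIBRARY_PATH variable. Putting the variables only in ~/.bashrc is not enough: Ubuntu's default ~/.bashrc stops right away when the shell is not interactive, and a program such as IDEA started from the desktop is not launched from an interactive shell, so it never sees variables appended to the end of ~/.bashrc and cannot find libcudart.so.8.0. The variables also have to go into /etc/profile, which is read at login. So add the following at the end of /etc/profile: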
## cuda
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Log out, log back in, and start IDEA again: now TensorFlow can be imported in IDEA too!
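The CUDA installer's summary mentions another option that does not depend on how a program is launched: add the library directory to the dynamic linker configuration and run ldconfig as root. Because the loader then finds libcudart.so.8.0 through its cache, desktop-launched programs such as IDEA should pick it up as well. A minimal sketch, assuming Ubuntu's default /etc/ld.so.conf, which includes /etc/ld.so.conf.d/*.conf; the file name cuda.conf and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: register the CUDA library directory with the dynamic linker instead
# of relying on LD_LIBRARY_PATH. Any *.conf file in /etc/ld.so.conf.d works on
# Ubuntu; the name cuda.conf is an arbitrary choice.

# write_ld_conf [LIBDIR] [CONFDIR]: write LIBDIR into CONFDIR/cuda.conf.
write_ld_conf() {
    local libdir="${1:-/usr/local/cuda-8.0/lib64}" confdir="${2:-/etc/ld.so.conf.d}"
    printf '%s\n' "$libdir" > "$confdir/cuda.conf"
}

# in_ld_cache NAME: succeed if the linker cache (ldconfig -p) lists NAME.
in_ld_cache() {
    ldconfig -p | grep -qF "$1"
}
```

Usage: from a root shell (for example sudo -i), load the functions and run write_ld_conf followed by ldconfig; afterwards in_ld_cache libcudart.so.8.0 should succeed for any user.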