Author: 冯拓
The machine configuration is as follows:
Specification | HP Z820 |
---|---|
CPU (model, clock, threads) | Intel Xeon E5-2620, 2.0 GHz, 24 threads |
Memory | 64 GB |
Hard disk | 2 TB |
GPU | NVIDIA TITAN X, 12 GB |
Packages used during the installation:
Package | File |
---|---|
NVIDIA driver | NVIDIA-Linux-x86_64-396.18.run |
CUDA | cuda_9.1.85_387.26_linux.run |
cuDNN | cudnn-9.1-linux-x64-v7.1.tgz |
Anaconda | Anaconda3-5.2.0-Linux-x86_64.sh |
Bazel | bazel-0.14.1-installer-linux-x86_64.sh |
TensorFlow source | tensorflow-r1.8 |
CUDA download link:
https://developer.nvidia.com/cuda-downloads
Anaconda download link:
https://www.continuum.io/downloads
Bazel download link:
https://github.com/bazelbuild/bazel/releases
TensorFlow source download link:
https://github.com/tensorflow/tensorflow
This article has two main parts: first the installation of CUDA and cuDNN, then installing TensorFlow from source.
Part 1: Installing CUDA and cuDNN
1. Installing the NVIDIA driver (run file)
After downloading the file named NVIDIA-Linux-x86_64-396.18.run, open the blacklist file with the following command:
sudo gedit /etc/modprobe.d/blacklist.conf
Add the following lines to the file to blacklist nouveau:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Then run this command to disable nouveau:
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
Regenerate the initramfs and reboot:
sudo update-initramfs -u
sudo reboot
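After the reboot you can confirm that nouveau is no longer loaded. This quick check is my addition, not part of the original steps; if the command prints nothing, the blacklist took effect.
# Should produce no output once nouveau is disabled
lsmod | grep nouveau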
Press Ctrl + Alt + F1 to switch to a text console, then stop the X server:
sudo service lightdm stop
sudo init 3
Change to the directory containing the NVIDIA installer, make it executable, and install:
chmod +x NVIDIA-Linux-x86_64-396.18.run
sudo sh NVIDIA-Linux-x86_64-396.18.run --no-opengl-files
Return to the graphical interface:
sudo service lightdm start
Check whether the driver was installed successfully; if nvidia-smi prints the GPU status table, the driver installation is complete:
nvidia-smi
2. Installing CUDA
CUDA can be downloaded from the official site; here I install cuda_9.1.85_387.26_linux.run. Run the following commands:
sudo chmod +x cuda_9.1.85_387.26_linux.run
sudo ./cuda_9.1.85_387.26_linux.run
Press q to skip the license text, then type accept. When prompted whether to install the NVIDIA driver, choose n (it was already installed above). Accept the defaults or answer y for the remaining prompts, as shown below:
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.1 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.1 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.1 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/xxxxx]:
Then open the system profile with:
sudo gedit /etc/profile
Append the following lines at the end of the file to set the environment variables:
export PATH=$PATH:/usr/local/cuda-9.1/bin
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:/lib
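To confirm that CUDA is usable, you can reload the profile and query the compiler version. This verification is my addition; the deviceQuery path assumes the samples were installed to the default location offered by the installer above.
source /etc/profile
nvcc --version            # should report release 9.1
# Optional: build and run the deviceQuery sample to exercise the GPU
cd ~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery
make
./deviceQuery             # should end with "Result = PASS"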
3. Installing cuDNN
cuDNN also has to be downloaded from the NVIDIA website as a deb or tgz file matching your CUDA version; here I use cudnn-9.1-linux-x64-v7.1.tgz as the example. In the directory containing the archive, run the following commands:
tar -xzvf cudnn-9.1-linux-x64-v7.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
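You can check that the headers were copied correctly by reading the version macros out of cudnn.h; this sanity check is my addition.
# Should show CUDNN_MAJOR 7 and CUDNN_MINOR 1
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h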
4. Installing Anaconda
Download the installer from the official site; I use the Python 3.6 build, Anaconda3-5.2.0-Linux-x86_64.sh. In a terminal, change to the directory containing the installer and run:
bash Anaconda3-5.2.0-Linux-x86_64.sh
During installation, answer yes when asked whether to add Anaconda to the environment variables.
After installation finishes, open a new terminal and create a virtual environment for the TensorFlow install that follows:
conda create -n tensorflow
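The command above creates an empty environment. As an optional alternative sketch, you could pin the Python version explicitly and confirm the environment exists; the python=3.6 pin is my addition (chosen to match the Anaconda build used here), not part of the original steps.
# Alternative: create the environment with an explicit Python version
conda create -n tensorflow python=3.6
# List environments to confirm "tensorflow" was created
conda env list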
Part 2: Installing TensorFlow
This part describes installing TensorFlow into the virtual environment by compiling the TensorFlow source with Bazel.
5. Installing Bazel
Download bazel-0.14.1-installer-linux-x86_64.sh from GitHub, then install the build prerequisites:
sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python
Change to the directory containing the installer and run:
chmod +x bazel-0.14.1-installer-linux-x86_64.sh
./bazel-0.14.1-installer-linux-x86_64.sh --user
Finally, add export PATH="$PATH:$HOME/bin" to ~/.bashrc so that Bazel ends up on your PATH, as shown below.
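A minimal sketch of that last step, with a version check added as my own verification (assuming ~/.bashrc is your shell configuration file):
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
source ~/.bashrc
bazel version             # should report Build label: 0.14.1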
6. Building and installing TensorFlow
Download the source from GitHub, selecting the r1.8 branch. Extract it, open a terminal in the extracted directory, and run:
./configure
Answer the prompts with y/n or the appropriate paths. Select CUDA 9.1 and cuDNN 7.1 during configuration. The full session looks like this:
xxx@xxx:~/Downloads/tensorflow-r1.8$ ./configure
You have bazel 0.14.1 installed.
Please specify the location of python. [Default is /home/yangyuting/anaconda3/bin/python]:
Found possible Python library paths:
  /home/yangyuting/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/yangyuting/anaconda3/lib/python3.6/site-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.2]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
After configuration finishes, run the following commands in order to build and install TensorFlow:
bazel build --config=opt --config=cuda --config=monolithic //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
source activate tensorflow
pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
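Once the wheel is installed, a quick import check inside the activated environment confirms that TensorFlow was built with GPU support. This verification step is my addition; run it from outside the source tree so Python does not pick up the local tensorflow directory.
# Print the installed version and whether a GPU is visible to TensorFlow
python -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"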
7. Running an example
After installing TensorFlow with the steps above, I ran an example program and hit the following error:
ImportError: libcublas.so.9.1: cannot open shared object file: No such file or directory
This problem is unrelated to the build itself. It happens because some paths were not set up correctly when installing CUDA 9.1, so a few shared libraries cannot be found. The libraries do exist; the loader is simply not searching the right directory. Run the following commands to create symbolic links:
sudo ln -s /usr/local/cuda-9.1/lib64/libcublas.so.9.1 /usr/lib/libcublas.so.9.1
sudo ln -s /usr/local/cuda-9.1/lib64/libcusolver.so.9.1 /usr/lib/libcusolver.so.9.1
sudo ln -s /usr/local/cuda-9.1/lib64/libcudart.so.9.1 /usr/lib/libcudart.so.9.1
sudo ln -s /usr/local/cuda-9.1/lib64/libcudnn.so.7 /usr/lib/libcudnn.so.7
sudo ln -s /usr/local/cuda-9.1/lib64/libcufft.so.9.1 /usr/lib/libcufft.so.9.1
sudo ln -s /usr/local/cuda-9.1/lib64/libcurand.so.9.1 /usr/lib/libcurand.so.9.1
These commands link the libraries from the default CUDA installation path; if your CUDA lives elsewhere, adjust the paths above accordingly. After the symlinks are in place, TensorFlow imports normally and CUDA/cuDNN work as expected.
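As an alternative to the per-library symlinks, a sketch of another common fix is to register the CUDA library directory with the dynamic loader once and let ldconfig resolve all of the libraries (again, adjust the path if CUDA is installed elsewhere):
# Tell the dynamic loader where the CUDA 9.1 libraries live
echo "/usr/local/cuda-9.1/lib64" | sudo tee /etc/ld.so.conf.d/cuda-9-1.conf
sudo ldconfig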