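I have read many installation guides and never managed a clean install on the first try, so this is my first write-up of my own installation. I did not prepare in advance, so the screenshots are pieced together from different places; I will try to describe each step in detail.
I am using a company server. Most of the guides I found target Ubuntu, and many of their commands do not work here. I still only half understand some of this. The server environment is roughly as follows:
GPU: GeForce GTX 1060
gcc: 4.8.5
OS: CentOS 7
Check the NVIDIA GPU model: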
nvidia-smi
Check the NVIDIA driver version:
cat /proc/driver/nvidia/version
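If only the version number is needed, for example to compare it with what a CUDA release requires, it can be pulled out of that file. The following is a minimal sketch; the function name nvidia_driver_version is my own, and the sample line is illustrative rather than captured from this server.

```shell
# Extract the driver version from text in the format of /proc/driver/nvidia/version.
# Reads stdin, so it can be tried on a sample line on a machine without a GPU.
nvidia_driver_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Sample line (illustrative, not taken from the server in this post).
echo "NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.43  Tue Feb 19 01:12:11 CST 2019" \
    | nvidia_driver_version

# On a machine with the driver loaded:
#   nvidia_driver_version < /proc/driver/nvidia/version
```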
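I originally wanted the GPU build of TensorFlow (tensorflow-gpu) to speed up my programs. It turned out that the CUDA 10.1 already on the machine was too new and not compatible, which started a long cycle of uninstalling and reinstalling.
In the name cuDNN, "cu" is short for CUDA and "DNN" is short for deep neural network.
First, confirm the current CUDA version: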
cat /usr/local/cuda/version.txt
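The same check can be wrapped so that it prints only the version number and reports a missing file instead of a raw cat error. This is a rough sketch; cuda_toolkit_version is a hypothetical helper name, and the demonstration uses a temporary stand-in file so it also runs where CUDA is not installed.

```shell
# Print the toolkit version recorded in a CUDA version.txt file.
# Defaults to the path used in this post; pass another path to override it.
cuda_toolkit_version() {
    file="${1:-/usr/local/cuda/version.txt}"
    if [ -r "$file" ]; then
        # The file holds one line such as "CUDA Version 9.0.176".
        sed -n 's/^CUDA Version \([0-9.]*\).*/\1/p' "$file"
    else
        echo "not found: $file" >&2
        return 1
    fi
}

# Demonstration on a temporary stand-in file.
demo_file="$(mktemp)"
echo "CUDA Version 10.1.105" > "$demo_file"
cuda_toolkit_version "$demo_file"
rm -f "$demo_file"
```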
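Go to the NVIDIA Developer website and open the CUDA 9.0 download page.
In the dark green selector shown in the screenshot, choose the options that match your Linux system; for CentOS, pick version 7 or 6 depending on your release. Once everything is selected, the link to the installer is generated automatically.
Clicking the Download button opens the download page. A normal browser download or a Xunlei (Thunder) download kept getting interrupted, so I downloaded the file directly on the server instead.
Download window
Rather than using the browser or Xunlei, copy the link address and use the following command:
wget "copied-link-address"
Press Enter to run it; downloaded this way, the transfer did not break off.
The downloaded file ends up with a long, unwieldy name like this: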
cuda_9.0.176_384.81_linux.run?Bgpc221fK3Z4SjoITU3M_aEWpid1xJ11hx0W33TqIuk1WkzFIkMSO3mDSwyVuLPVtXZQIg7B1iRqHcPPp7Vp-G4IBhfThG3RbSVdwCLs1cfJEcFwebxAyrCDYbOh1w8KKau4FJv5K5DB_1RuWnsUSiuvQo7M6L27A7BIQIQVaiC5FpN2tGEf2cQi
Rename the file with:
mv "original-file-name" cuda_9.0.176_384.81_linux.run
Then give the file execute permission:
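The long suffix is just the query string of the download URL, so everything from the first "?" onward can be cut off mechanically. Below is a small sketch of that idea; strip_query is a name I made up, and the token in the example is shortened. wget's -O option can also set the output file name at download time, which avoids the rename altogether.

```shell
# Strip everything from the first "?" onward, leaving the real file name.
strip_query() {
    printf '%s\n' "${1%%\?*}"
}

# Shortened example of a downloaded name with a token appended.
strip_query "cuda_9.0.176_384.81_linux.run?Bgpc221fK3Z4"

# Rename every such .run file in the current directory (does nothing if none match).
for f in ./*.run\?*; do
    [ -e "$f" ] || continue
    mv -- "$f" "$(strip_query "$f")"
done
```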
chmod +x cuda_9.0.176_384.81_linux.run
Run the installer to start the installation:
sh cuda_9.0.176_384.81_linux.run
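Once the installer starts, you have to go through the license text; page through it with the space bar or Enter. After that come the configuration prompts.
The first prompt expects accept; every other prompt is a y/n question.
I answered n only to the driver prompt below and y to all the others.
(Do not install the bundled driver here: a newer driver is already installed, and letting the installer put an older one over it can leave you stuck in a login loop.)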
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
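The same choices can, as far as I know, be passed on the command line so that no prompts appear. The sketch below only prints the command (a dry run). The flags --silent, --toolkit and --samples are my assumption of the runfile's options, so check them against the output of sh cuda_9.0.176_384.81_linux.run --help before relying on them.

```shell
# Dry run: print the unattended install command instead of executing it.
# --silent, --toolkit and --samples are assumed runfile options (verify with --help);
# leaving out --driver corresponds to answering "n" at the driver prompt.
installer="cuda_9.0.176_384.81_linux.run"
install_cmd="sh $installer --silent --toolkit --samples"
echo "$install_cmd"
```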
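After the installation finishes, set up the environment variables by appending the following to the end of ~/.bashrc (for example with vim ~/.bashrc):
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
Then run source ~/.bashrc so that the change takes effect: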
source ~/.bashrc
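Two things are easy to get wrong with these exports: assigning PATH without keeping its old value hides every other command, and sourcing ~/.bashrc repeatedly keeps adding the same directory. The following is one possible way to guard against both; prepend_dir is a hypothetical helper, not something the CUDA installer provides.

```shell
# Prepend a directory to a colon-separated list unless it is already present.
# $1 = current list (may be empty), $2 = directory to add.
prepend_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already there: leave unchanged
        "::")     printf '%s\n' "$2" ;;      # empty list: avoid a stray colon
        *)        printf '%s\n' "$2:$1" ;;
    esac
}

PATH="$(prepend_dir "$PATH" /usr/local/cuda-9.0/bin)"
LD_LIBRARY_PATH="$(prepend_dir "${LD_LIBRARY_PATH:-}" /usr/local/cuda-9.0/lib64)"
export PATH LD_LIBRARY_PATH
```

The empty-list case matters for LD_LIBRARY_PATH, because an empty entry in that list is treated as the current directory.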
Use nvcc -V to check the installed CUDA version:
[root@192-168-1-110 ~]# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
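For scripts that need just the release number, the last line of that output can be parsed. A minimal sketch, with nvcc_release as an illustrative name:

```shell
# Pull the release number (for example "9.0") out of `nvcc -V` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Fed with the last line of the nvcc output shown above.
echo "Cuda compilation tools, release 9.0, V9.0.176" | nvcc_release

# On a machine with the toolkit on PATH:
#   nvcc -V | nvcc_release
```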
Then build and run the deviceQuery sample to confirm that CUDA can see the GPU:
[root@192-168-1-110 ~]# cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
[root@192-168-1-110 deviceQuery]# make
make: Nothing to be done for `all'.
[root@192-168-1-110 deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 10.1 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6078 MBytes (6373179392 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1759 MHz (1.76 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 66 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
[root@192-168-1-110 deviceQuery]#
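In the output above, "CUDA Driver Version 10.1 / Runtime Version 9.0" is fine: the driver reports the newest CUDA version it supports, and it only needs to be at least as new as the installed runtime. The sketch below checks that condition on the summary line; the function names are my own, and sort -V assumes GNU coreutils.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read deviceQuery output on stdin and compare driver and runtime versions.
check_device_query() {
    line="$(grep 'CUDA Driver Version = ')" || return 2
    driver="$(printf '%s\n' "$line" | sed 's/.*CUDA Driver Version = \([0-9.]*\).*/\1/')"
    runtime="$(printf '%s\n' "$line" | sed 's/.*CUDA Runtime Version = \([0-9.]*\).*/\1/')"
    if version_ge "$driver" "$runtime"; then
        echo "OK: driver supports CUDA $driver, runtime is $runtime"
    else
        echo "MISMATCH: driver supports only CUDA $driver, runtime is $runtime"
        return 1
    fi
}

echo "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 9.0, NumDevs = 1" \
    | check_device_query
```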
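Next comes cuDNN. On the cuDNN page, click Download, choose the Linux build on the system selection page, and, as with CUDA, copy the link and download it with wget: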
[root@192-168-1-110 ~]# wget https://developer.download.nvidia.cn/compute/machine-learning/cudnn/secure/v7.5.0.56/prod/9.0_20190219/cudnn-9.0-linux-x64-v7.5.0.56.tgz?YTyT4U_7EZChyVe17Zqc9aRGOmu7BQu5RtfhhdE0kkun2WJAVobzl26sqNim_4RM8LGU9KjPdcCegk6rRGxfa-qsVV_N8i1hKqn1pamYUUod9qyOSDp90Wx_qwfAX1NdzPGUBzfkXghIin1HsFJ2cQVr3ZbUkT6yQVN6paRihNQJjtjOXH-yJ7gUT1hGvVlkrGvk_SNGeewRL96z_cPX7gVa
The download is a compressed archive. Shorten its complicated name to the following:
cudnn-9.0-linux-x64-v7.5.0.56.tgz
Then extract it:
tar -zxvf cudnn-9.0-linux-x64-v7.5.0.56.tgz
Extracting it produces the following files:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.5.0
cuda/lib64/libcudnn_static.a
Copy these files into the CUDA directory with the following two commands:
cp cuda/lib64/* /usr/local/cuda-9.0/lib64/
cp cuda/include/* /usr/local/cuda-9.0/include/
Check the cuDNN version with:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
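The grep above prints three separate #define lines. If a single "major.minor.patch" string is more convenient, something along these lines should work; cudnn_version is a hypothetical helper, and the demonstration runs on a stand-in header so it does not need cuDNN installed.

```shell
# Assemble "major.minor.patch" from the CUDNN_* defines in a cudnn.h file.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    [ -r "$header" ] || { echo "not found: $header" >&2; return 1; }
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}

# Demonstration on a stand-in header with only the three relevant lines.
demo_header="$(mktemp)"
printf '#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 5\n#define CUDNN_PATCHLEVEL 0\n' > "$demo_header"
cudnn_version "$demo_header"
rm -f "$demo_header"
```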
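That completes the installation of CUDA and cuDNN. The next step is to install the tensorflow-gpu version that matches them.
For this setup (CUDA 9.0 with cuDNN 7), I installed version 1.5.0.
Running pip install tensorflow-gpu without a version pulls the latest release (1.13.0 at the time), which turned out to be incompatible; import tensorflow then fails with an error like the following: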
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcudnn.so.6(5/7/8/9): cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
So it is safer to pin the version explicitly, using ==:
pip install tensorflow-gpu==1.5.0
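Which version to pin depends on the CUDA and cuDNN versions the official wheels were built against. The lookup below is a rough sketch based on my recollection of TensorFlow's tested build configurations for a few 1.x releases; tf_gpu_requirements is a made-up helper, and the table should be checked against the official list before use.

```shell
# CUDA / cuDNN versions the official tensorflow-gpu 1.x wheels were built against
# (from memory of TensorFlow's tested build configurations; verify before relying on it).
tf_gpu_requirements() {
    case "$1" in
        1.4.*)
            echo "CUDA 8.0, cuDNN 6" ;;
        1.5.*|1.6.*|1.7.*|1.8.*|1.9.*|1.10.*|1.11.*|1.12.*)
            echo "CUDA 9.0, cuDNN 7" ;;
        1.13.*)
            echo "CUDA 10.0, cuDNN 7" ;;
        *)
            echo "unknown: check the TensorFlow tested build configurations"
            return 1 ;;
    esac
}

tf_gpu_requirements 1.5.0
tf_gpu_requirements 1.13.1
```

With CUDA 9.0 installed, any release in the middle group should find its libraries, which is why pinning 1.5.0 works here while the unpinned install does not.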
With that, the installation works:
[root@192-168-1-110 lib64]# python3
Python 3.6.1 (default, Mar 24 2019, 11:53:13)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>>
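Even after all of this is installed, a string of other problems shows up: GPU out-of-memory errors, too little GPU memory, lots of code that will not run, and so on. Sorting those out takes better programming skills, so I am off to study.
This post draws on the content of several blog posts; it was thanks to them that I got tensorflow-gpu installed:
Installing and uninstalling CUDA and cuDNN on Ubuntu
Checking the CUDA and cuDNN versions, the NVIDIA GPU model, and the NVIDIA driver version
Installing NVIDIA GPU + TensorFlow-gpu 1.5.0 on CentOS 7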