Two version combinations: CUDA 9.0 + TensorFlow 1.8 + cuDNN 7.0.5
and CUDA 8.0 + TensorFlow 1.4 + cuDNN 6.0
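The two combinations above can be written down as a small lookup, as a rough sketch. The function name required_cuda is made up for illustration, and it only knows the two pairings listed in these notes.

```shell
#!/bin/sh
# Hypothetical helper: map a TensorFlow version to the CUDA/cuDNN pair
# listed in these notes. Only the two combinations above are covered.
required_cuda() {
    case "$1" in
        1.8|1.8.*) echo "cuda 9.0 cudnn 7.0.5" ;;
        1.4|1.4.*) echo "cuda 8.0 cudnn 6.0" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example: look up the pairing for TensorFlow 1.8.0.
required_cuda 1.8.0
```

Any other TensorFlow version returns a non-zero status, so the helper can be used in a script as a guard before installing.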
Check the current GPU model:
lspci | grep -i vga
Check the current OS, i.e. whether the machine runs Ubuntu or CentOS
lsb_release -a
Check the gcc version
gcc --version
Make sure the kernel headers are installed
sudo apt-get install linux-headers-$(uname -r)
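Disable nouveau
lsmod | grep nouveau
If this prints anything:
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
Add:
blacklist nouveau
options nouveau modeset=0
On Ubuntu, regenerate the initramfs so the blacklist takes effect, then reboot:
sudo update-initramfs -u
sudo reboot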
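After the reboot, the nouveau check can be repeated in a script. This is a minimal sketch; the function name nouveau_loaded is hypothetical, and it assumes lsmod prints the module name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds only
# if a module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1 && lsmod | nouveau_loaded; then
    echo "nouveau is still loaded; check the blacklist file and reboot"
else
    echo "nouveau is not loaded (or lsmod is unavailable)"
fi
```

Matching on the first column avoids false hits from other modules that merely list nouveau as a dependency.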
Reinstall using the .run file (note: when the installer asks whether to install OpenGL, answer no)
Build NVIDIA_CUDA-9.1_Samples
cd NVIDIA_CUDA-9.1_Samples
make
Install cuDNN 7
Download the cuDNN 7 build that matches your CUDA version
tar -xzvf cudnn-9.1-linux-x64-v7.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Check the cuDNN version
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
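The grep above prints three separate #define lines. As a sketch, they can be joined into a single version string. The function name cudnn_version is hypothetical, and the code assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain #define lines, which is the layout the grep above relies on.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h file.
# Assumes lines of the form "#define CUDNN_MAJOR 7".
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major == "") exit 1; print major "." minor "." patch }
    ' "$1"
}

# Only query the header if it exists on this machine.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header"
fi
```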
Check the CUDA version
cat /usr/local/cuda/version.txt
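To compare the installed CUDA against what TensorFlow needs, only the major.minor part matters. A small sketch, assuming version.txt contains a line such as "CUDA Version 9.0.176"; the function name cuda_major_minor is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the major.minor CUDA version from a
# version.txt file. Assumes a line like "CUDA Version 9.0.176".
cuda_major_minor() {
    sed -n 's/^CUDA Version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' "$1"
}

# Only query the file if it exists on this machine.
version_file=/usr/local/cuda/version.txt
if [ -r "$version_file" ]; then
    cuda_major_minor "$version_file"
fi
```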
Uninstall CUDA:
sudo /usr/local/cuda-9.1/bin/uninstall_cuda_9.1.pl
sudo rm -rf /usr/local/cuda-9.1/
After uninstalling, do not install the driver again when installing the new version; answer n at this prompt:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
Install tensorflow-gpu
pip install tensorflow-gpu==1.8.0
Show NVIDIA driver and GPU information
nvidia-smi
CUDA installation problems:
Error:
It appears that an X server is running. Please exit X before installation. If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp.
Cause: an X server is running or has left lock files in /tmp. Leave the graphical session and delete the lock files, as the message suggests.
Fix:
sudo init 3
sudo rm -rf /tmp/.X*
Error:
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.
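Fix: this is a version problem. The kernel and the CUDA version do not match, so switch to a different CUDA version. I switched to CUDA 9.0. Do not use 9.1 or 9.2, otherwise the TensorFlow version becomes the next pitfall.
Error: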
nvcc: command not found
Cause: the CUDA directories have not been added to the environment variables
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
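To confirm that the two exports took effect in the current shell, a check along these lines may help. It is only a sketch; the function name in_path_list is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
in_path_list() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

in_path_list "$PATH" /usr/local/cuda/bin \
    || echo "/usr/local/cuda/bin is missing from PATH"
in_path_list "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64 \
    || echo "/usr/local/cuda/lib64 is missing from LD_LIBRARY_PATH"
```

Wrapping the list in colons before matching means an entry is only found as a whole path component, not as a prefix of a longer directory name.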
Error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
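Cause: a version mismatch with the installed TensorFlow. If your CUDA is not version 9.0, uninstall it and reinstall.
It may also be the cause described above: the two path variables above need to be added to the environment.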
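The library name in the ImportError tells you which CUDA version TensorFlow was built against. A rough sketch that pulls it out of the message; the function name cublas_version_from_error is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the CUDA version encoded in a
# "libcublas.so.X.Y" name found in an error message. Prints nothing
# if the message does not contain such a name.
cublas_version_from_error() {
    printf '%s\n' "$1" |
        sed -n 's/.*libcublas\.so\.\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

msg='ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory'
echo "TensorFlow expects CUDA $(cublas_version_from_error "$msg")"
```

Compare the printed version with the output of cat /usr/local/cuda/version.txt; if they differ, reinstall CUDA, and if they match, check the two path variables.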