Welcome to my GitHub
https://github.com/zq2599/blog_demos
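Contents: a categorized index of all my original articles with accompanying source code, covering Java, Docker, Kubernetes, DevOps, and more.
Overview
- I have a 2015 Lenovo laptop with a GTX 950M graphics card and Ubuntu 16.04 LTS Desktop installed. To use its GPU for deeplearning4j training, I installed CUDA and cuDNN myself and recorded the whole process here for future reference. The installation has the following steps:
- Preparation
- Install the Nvidia driver
- Install CUDA
- Install cuDNN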
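A note on one particular problem
- Normally, once the Nvidia driver is installed it reports a matching CUDA version, and you then install that version. In my case it reports 11.2, so ordinarily I would install CUDA 11.2.
- However, I chose to install version 9.1, because in earlier development I found that when deeplearning4j used the 11.2 SDK, the application failed at startup with a ClassNotFound error, which I have not fixed to this day (embarrassingly, Xinchen's skills are that limited...). So even though the driver reports 11.2, I installed 9.1, and deeplearning4j applications have run normally in this environment since.
- If you do not have this kind of problem, feel free to install the CUDA version the driver reports; the detailed steps follow.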
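Preparation
- Apart from downloads done in a web browser, all operations below were performed over an ssh connection to the Ubuntu machine, logged in as an ordinary account, not root.
- If a driver is already installed, remove it first (the pattern is quoted so that the shell passes it to apt-get unexpanded):
sudo apt-get remove --purge 'nvidia*'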
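Before purging, it can be reassuring to see which Nvidia packages are actually installed. The following is a minimal sketch, not part of the original procedure: the function name installed_nvidia_packages is hypothetical, and it assumes the packages of interest have names beginning with "nvidia" and appear with status "ii" in dpkg -l output.

```shell
# Read `dpkg -l` output on stdin and print the names of installed
# packages (status "ii") whose name starts with "nvidia".
# The function name is an illustrative assumption.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}
```

Typical use would be dpkg -l | installed_nvidia_packages; an empty result suggests there is nothing for the purge command to remove.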
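- Disable the nouveau driver (very important): open /etc/modprobe.d/blacklist.conf with sudo vi, append the following at the end, then save and exit:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
- Turn off nouveau kernel mode setting:
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
- Update the initramfs (this needs root privileges, hence sudo):
sudo update-initramfs -u
- Run sudo reboot to restart the machine
- After the reboot, run the following command; it should print nothing, which shows that nouveau is disabled: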
lsmod|grep nouveau
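The same check can be wrapped so that it prints an explicit verdict instead of relying on empty output. This is a small sketch under stated assumptions: the function names are hypothetical, and the optional file argument exists only so the logic can be exercised against a sample file; by default it reads /proc/modules, which is where lsmod gets its data.

```shell
# Return 0 if module $1 is listed in the modules file $2
# (defaults to /proc/modules, the data source behind lsmod).
is_module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Print a one-line verdict for the nouveau module.
# $1 is an optional modules file, used here only for testing.
report_nouveau() {
    if is_module_loaded nouveau "${1:-/proc/modules}"; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is disabled"
    fi
}
```

Calling report_nouveau with no arguments after the reboot should print "nouveau is disabled" if the blacklist took effect.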
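- Get the kernel source:
sudo apt-get install linux-source
- The installation prints information like the screenshot below:
- From the red box in the screenshot above, the kernel version is 4.4.0-210, so run the following command: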
sudo apt-get install linux-headers-4.4.0-210-generic
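Instead of copying the version by hand, the headers package name can be derived from the running kernel. This is a sketch that assumes the kernel currently running (as reported by uname -r) is the one the driver will be built against; the function name headers_package is hypothetical.

```shell
# Print the linux-headers package name for a kernel release.
# $1 is an optional release string; it defaults to the running kernel.
headers_package() {
    printf 'linux-headers-%s\n' "${1:-$(uname -r)}"
}
```

With this in place, sudo apt-get install "$(headers_package)" would install the headers for whatever kernel is running, which on the machine in this article resolves to linux-headers-4.4.0-210-generic.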
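Download and install the Nvidia driver
- Visit the Nvidia website at https://www.nvidia.cn/Downloa..., then select your graphics card and operating system; my selection is shown below:
- Clicking the search button above leads to the page below; click download:
- The downloaded file is named NVIDIA-Linux-x86_64-460.84.run
- Stop the graphical interface:
sudo service lightdm stop
- Make the driver file executable:
sudo chmod a+x NVIDIA-Linux-x86_64-460.84.run
- Start the installation:
sudo ./NVIDIA-Linux-x86_64-460.84.run -no-x-check -no-nouveau-check -no-opengl-files
- When the screen below appears, choose the option in the red box:
- After the installation finishes, start the graphical interface again:
sudo service lightdm start
- Run nvidia-smi; if the driver installed successfully, it prints the following: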
will@lenovo:~/temp/202106/20$ nvidia-smi
Sun Jun 20 09:02:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84       Driver Version: 460.84       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P0    N/A /  N/A |      0MiB /  4046MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
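- In the output above, CUDA Version: 11.2 is the highest CUDA version this driver supports, so 11.2 would be the natural choice. As explained earlier, I ran into a problem with it, so I install 9.1 below, but you are free to install 11.2.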
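If you want to pick up these two version numbers in a script rather than read them off the table, the header line of nvidia-smi can be parsed with sed. This is a minimal sketch: the function names are hypothetical, and it assumes the header keeps the "Driver Version: x" and "CUDA Version: y" wording shown above.

```shell
# Both functions read nvidia-smi output on stdin.

# Print the driver version, e.g. the number after "Driver Version:".
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the CUDA version reported by the driver.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

For example, nvidia-smi | smi_cuda_version would print the CUDA version on its own line, ready to be compared against the toolkit you plan to install.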
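Install CUDA
- In a browser, visit https://developer.nvidia.com/... and click the link in the red box:
- As shown below, download the Linux version:
- Then select x86_64:
- Select the specific Linux distribution and its version:
- There is quite a bit to download: one installer and three patches:
- The download addresses of these four files are: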
https://developer.download.nvidia.cn/compute/cuda/9.1/secure/Prod/local_installers/cuda_9.1.85_387.26_linux.run?P0Ntu_6NLtuuEMm6fJRk1W5vl4KM7oaT1oFW870zKJ-zDw2ckKntFLOE6klRJfw2CmTa8z3Q390_6urlgc6LqjoqlIFW9gvfvDCusnINYplLaw1u8lRY8R4oVNtpNzaXU4BQcHjvdb6c6rjq20dktCcRd4640woXt1yHmD95v1Du7wdBBXq2eOY
https://developer.download.nvidia.cn/compute/cuda/9.1/secure/Prod/patches/1/cuda_9.1.85.1_linux.run?yeXf_7wIGlHAUw--E_YVLQZRgXv0x2i043woJVY-ydXU5Kyhc-eYQf5JmL-4mvYmlvPYCEc5RhT2sDWscX20CJbdOwpkt30kWb9vx8E4oIlajDQ3MVPvXdiKKsIOBUx-h0q0N0jSkNn80VMhW-nk8jwvRY_e6MuFzqWBaPk
https://developer.download.nvidia.cn/compute/cuda/9.1/secure/Prod/patches/2/cuda_9.1.85.2_linux.run?5jGZxNigaOJkaaPbMagjhSW7ebQvYGyYoqe2vBxZ1eV8qp2BzXJLxIPgAo11UgWhORirQkdJGq5b8eFh4aShBVUTmuPaasvRiMCKDZw5yjjIobGQrCEyU-LFO59AbrRER57Mxa0T1Sc97fC80IOZq8Ox2repjn7A3oYVgd8
https://developer.download.nvidia.cn/compute/cuda/9.1/secure/Prod/patches/3/cuda_9.1.85.3_linux.run?CxWimJTC-XROYihig-UZmH62odbJInf1fmxTZ_bsW1nQ0Zz5cL5r8qLmlMR_1j2rVhk3j8Z5lS6dpArt8frjGHH2MeVn5TefMoclam8udm-RSMMmqHXYE66hHN2D0drVEdtCwe8ZrEIYb2rpucaz9svCFE8Z319mge4Ju94
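- After the downloads finish, run chmod a+x *.run to make the four files executable
- Install CUDA:
sudo sh cuda_9.1.85_387.26_linux.run
- When the license appears, do as you would in vi: type ":" and then "q" and press Enter to skip reading the license and start the actual installation:
- Next comes a series of questions; answer them as shown below. Be very sure to answer n to the question in the red box (judging by the "Driver: Not Selected" line in the summary further down, this is the prompt asking whether to install the driver bundled with the installer):
- When the installation finishes, it prints the following: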
Installing the CUDA Toolkit in /usr/local/cuda-9.1 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so
Installing the CUDA Samples in /home/will ...
Copying samples to /home/will/NVIDIA_CUDA-9.1_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.1
Samples: Installed in /home/will, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-9.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_13425.log
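The warning in this output says that CUDA 9.1 needs a driver of at least version 384.00; the 460.84 driver installed earlier satisfies that. A sketch of how such a comparison could be scripted follows, assuming GNU sort with the -V (version sort) option is available, as it is on Ubuntu; the function name version_ge is hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2,
# using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For instance, version_ge 460.84 384.00 && echo "driver is new enough for CUDA 9.1" prints the message, while an older driver such as 340.10 would not pass the check.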
- Open ~/.bashrc and append the following two lines at the end (if LD_LIBRARY_PATH is already set, append to it in the same way as the PATH line does):
export PATH=/usr/local/cuda-9.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64
- Run source ~/.bashrc to apply the configuration
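Because ~/.bashrc is sourced every time a shell starts, and LD_LIBRARY_PATH may or may not already be set, one possible refinement is a helper that prepends a directory only when it is missing. This is a sketch, not what the original setup uses; the function name prepend_unique is hypothetical.

```shell
# Print the colon-separated list $1 with directory $2 prepended,
# unless $2 is already one of its entries. An empty $1 yields just $2,
# which avoids a stray trailing colon.
prepend_unique() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

export PATH="$(prepend_unique "$PATH" /usr/local/cuda-9.1/bin)"
export LD_LIBRARY_PATH="$(prepend_unique "${LD_LIBRARY_PATH:-}" /usr/local/cuda-9.1/lib64)"
```

Sourcing the file repeatedly then leaves both variables unchanged after the first time, and an existing LD_LIBRARY_PATH value is preserved rather than overwritten.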
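- Run su - to switch to the root account, then run the following command. Do this as root rather than with sudo: in sudo echo ... >> file, the redirection is performed by the unprivileged shell, so the write would be denied:
echo "/usr/local/cuda-9.1/lib64" >> /etc/ld.so.conf
- Still as root, run:
ldconfig
- Run exit to leave the root session; you are now back on the ordinary account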
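An alternative that avoids switching to root is the sudo tee -a pattern already used for the nouveau configuration earlier. The sketch below also skips the write when the line is already present, so rerunning it does not duplicate the entry. The function name append_line_once and the SUDO variable are hypothetical; SUDO exists only so the function can be tried on a scratch file without privileges.

```shell
# Append line $2 to file $1 unless an identical line already exists.
# The write goes through "sudo tee -a" by default; setting SUDO to an
# empty string runs tee directly (useful for unprivileged test files).
append_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null ||
        printf '%s\n' "$2" | ${SUDO-sudo} tee -a "$1" >/dev/null
}
```

Under these assumptions, append_line_once /etc/ld.so.conf /usr/local/cuda-9.1/lib64 followed by sudo ldconfig would achieve the same result as the root session above.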
- Run nvcc -V to check the CUDA version; note that the V is uppercase:
will@lenovo:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
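To confirm in a script that the toolkit on PATH is really 9.1, the release number can be pulled out of this output. A minimal sketch, assuming the last line keeps the "release X.Y," wording shown above; the function name nvcc_release is hypothetical.

```shell
# Read `nvcc -V` output on stdin and print the release number
# found between "release " and the following comma.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}
```

For example, [ "$(nvcc -V | nvcc_release)" = "9.1" ] succeeds only when the expected toolkit is the one being found.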
- Install the first patch:
sudo sh cuda_9.1.85.1_linux.run
- Install the second patch:
sudo sh cuda_9.1.85.2_linux.run
- Install the third patch:
sudo sh cuda_9.1.85.3_linux.run
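Since the three patch files differ only by their patch number (cuda_9.1.85.1_linux.run through cuda_9.1.85.3_linux.run, as in the download list), they can also be applied in order with a loop, which leaves less room for a copy-and-paste slip. This is a sketch; the function names are hypothetical, and it assumes the patch files sit in the current directory.

```shell
# Print the three patch file names in installation order.
patch_names() {
    for n in 1 2 3; do
        printf 'cuda_9.1.85.%s_linux.run\n' "$n"
    done
}

# Apply each patch in order, stopping at the first missing file
# or failed installer run.
install_patches() {
    for patch in $(patch_names); do
        [ -f "$patch" ] || { echo "missing $patch" >&2; return 1; }
        sudo sh "$patch" || return 1
    done
}
```

Running install_patches from the download directory would then apply patches 1, 2, and 3 one after another.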
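Install cuDNN
- Log in as prompted (register an account if you do not have one). After logging in you reach the download page, where you need to click the area in the red box below to see older versions:
- Select the version that matches your CUDA version:
- Extract the download to get a folder named cuda, then run the following commands: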
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
- To verify, run cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2; if the installation went well, it prints:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 3
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
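The three defines above can be combined into a single dotted version string, which is easier to compare or log. A sketch using awk follows; the function name cudnn_version is hypothetical, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL one per line as shown.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCHLEVEL, read from the
# header file given as $1 (defaults to the installed cudnn.h).
cudnn_version() {
    awk '
        /^#define CUDNN_MAJOR /      { maj = $3 }
        /^#define CUDNN_MINOR /      { min = $3 }
        /^#define CUDNN_PATCHLEVEL / { pat = $3 }
        END { print maj "." min "." pat }
    ' "${1:-/usr/local/cuda/include/cudnn.h}"
}
```

With the header shown above, cudnn_version prints 7.1.3.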
- This completes the installation of CUDA (9.1) and cuDNN on Ubuntu 16.04. I hope it serves as a useful reference.
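You are not alone: Xinchen's original articles are with you all the way
Follow the WeChat official account: 程序员欣宸
Search WeChat for 「程序员欣宸」. I am Xinchen, and I look forward to exploring the Java world with you...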