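Environment: Rocky Linux 8.5. Download the image from the official site, create a bootable drive, and install the system. (This continues the previous article on disk mounting and installing gcc 9.3. To confirm the release, run: cat /etc/redhat-release)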
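Contents
I. Installing the NVIDIA 460.84 driver
1. Disabling the nouveau driver
2. Installing the GPU driver
II. Installing CUDA
1. Installing cuda_11.2.0
2. Installing cudnn 8.1.1.33
III. References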
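Run the following command to check whether the nouveau driver is loaded. It normally prints some output; if it prints nothing, you can skip this step.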
lsmod | grep nouveau
Add nouveau to the blacklist in /etc/modprobe.d/blacklist.conf.
vim /etc/modprobe.d/blacklist.conf
Add the following lines:
blacklist nouveau
options nouveau modeset=0
Save and exit.
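If you would rather not edit the file by hand, the same two lines can be appended from a script. This is only a sketch: the function name add_nouveau_blacklist is made up for illustration, and it takes the file path as an argument so that it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling): append the two
# nouveau blacklist lines to the given file, skipping lines already present.
add_nouveau_blacklist() {
    conf="$1"
    touch "$conf" || return 1
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

As root, it would be called as add_nouveau_blacklist /etc/modprobe.d/blacklist.conf. Running it twice leaves the file unchanged the second time.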
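dracut --force //regenerate the initramfs so the blacklist takes effect at boot
Alternatively, back up the existing initramfs first and then rebuild it:
//Rebuild the initramfs image file (the new image will not load the nouveau driver at boot)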
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
mv /boot/initramfs-$(uname -r).img.bak /home/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
//Switch the default boot target to the command-line interface
systemctl set-default multi-user.target
//To switch back to the graphical interface (systemctl usage note: a service is enabled at boot with systemctl enable ***.service)
//systemctl set-default graphical.target
Reboot the system after making these changes. Then confirm that nouveau is no longer loaded; this command should now print nothing: lsmod | grep nouveau
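A plain grep also matches any other module whose line happens to contain the string "nouveau". As a slightly stricter check, the sketch below looks only at the module-name column of the lsmod output. The function name nouveau_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and report whether a
# module named exactly "nouveau" appears in the first column.
nouveau_status() {
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
}
```

Typical use would be: lsmod | nouveau_status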
Check the graphics card model:
lshw -c video
yum install epel-release #install the EPEL repository
yum -y install gcc kernel-devel dkms
//yum -y install gcc kernel-devel "kernel-devel-uname-r == $(uname -r)" dkms
yum install libglvnd-devel.x86_64
Installed:
dkms-3.0.3-1.el8.noarch
elfutils-libelf-devel-0.185-1.el8.x86_64
kernel-devel-4.18.0-348.20.1.el8_5.x86_64
Install the GPU driver:
./NVIDIA-Linux-x86_64-460.84.run
1. Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. 【No】
2. Install NVIDIA's 32-bit compatibility libraries? 【No】
3. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. 【Yes】
Verify that the installation succeeded:
nvidia-smi
Mon Apr 18 15:19:50 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:03:00.0 Off | N/A |
| 28% 34C P0 28W / 120W | 0MiB / 6078MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 106... Off | 00000000:82:00.0 Off | N/A |
| 39% 39C P0 30W / 120W | 0MiB / 6078MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
./cuda_11.2.0_460.27.04_linux.run
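If the system was installed in UEFI boot mode, disable the Secure Boot option in the BIOS.
The biggest pitfall: the installation failed. Check the reason for the failure: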
cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/local/bin/gcc
[INFO]: gcc version: gcc 版本 9.3.0 (GCC)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 460.27.04
[INFO]: Executing NVIDIA-Linux-x86_64-460.27.04.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 460.27.04 failed, quitting
cat /var/log/nvidia-installer.log
ERROR: The nvidia-drm kernel module was not created.
ERROR: The nvidia-drm kernel module failed to build. This kernel module is required for the proper operation of DRM-KMS. If you do not need to use DRM-KMS, you can try to install this driver package again with the '--no-drm' option.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
uname -r
ll /usr/src/kernels/
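If the two results do not match (the running kernel version differs from the kernel-devel version under /usr/src/kernels/), the fix is to update the kernel, then reboot so that the new kernel is the one running: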
yum -y update
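The comparison between uname -r and the contents of /usr/src/kernels/ can also be scripted. The following is a rough sketch, assuming (as on this Rocky Linux 8.5 machine) that kernel-devel installs its files in a directory named exactly after the kernel release. The function name kernel_devel_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel source directory exists for the
# given kernel release. Both arguments are optional and default to the
# running kernel and the directory used in this article.
kernel_devel_matches() {
    release="${1:-$(uname -r)}"
    src_root="${2:-/usr/src/kernels}"
    [ -d "$src_root/$release" ]
}
```

Called with no arguments, it checks the running kernel, for example: kernel_devel_matches && echo "kernel-devel matches" || echo "mismatch"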
Solution: deselect the Driver option when running the installer, because the driver was already installed separately above.
./cuda_11.2.0_460.27.04_linux.run
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.2/
Samples: Installed in /home/hhs-face/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Configure the environment variables:
vim /etc/profile
//Append the following at the end of the file:
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
//Save and exit, then apply the changes to the current shell
source /etc/profile
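The ${PATH:+:${PATH}} form in the export lines may look odd. It expands to a colon followed by the old value only when the variable is already non-empty, so an empty variable does not leave a trailing colon (an empty entry in PATH or LD_LIBRARY_PATH is treated as the current directory). The sketch below shows both cases with a throwaway variable; prepend_dir and DEMO_PATH are made-up names, so the real PATH is left alone.

```shell
#!/bin/sh
# Demonstrate the ${VAR:+:${VAR}} idiom with a throwaway variable.
# $1 is the directory to prepend, $2 is the pretend "old" value.
prepend_dir() {
    DEMO_PATH="$2"
    printf '%s\n' "$1${DEMO_PATH:+:${DEMO_PATH}}"
}

prepend_dir /usr/local/cuda-11.2/bin ""          # prints /usr/local/cuda-11.2/bin
prepend_dir /usr/local/cuda-11.2/bin /usr/bin    # prints /usr/local/cuda-11.2/bin:/usr/bin
```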
Reboot, then check the version to verify that the installation succeeded:
reboot
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
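If the check needs to go into a script, the release number can be pulled out of the nvcc -V output. This is a minimal sketch that assumes the "release X.Y," wording shown in the output above; the function name cuda_release is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc -V` output on stdin and print the toolkit
# release number taken from the "release X.Y," field.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

With the output shown above, nvcc -V | cuda_release would print 11.2.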
chmod +x cudnn-11.2-linux-x64-v8.1.1.33.tgz
tar -zxvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
//Copy the extracted files into the CUDA installation path, merging them into CUDA's include and lib64 directories.
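//cuDNN 8 splits its API across several headers that cudnn.h includes, so copy all of them
cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include
//-P keeps the libcudnn symbolic links as links instead of copying the library several times
cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64
chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*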
//Verify that cuDNN was installed by checking its version; here the version is 8.1.1.
cat /usr/local/cuda-11.2/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
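To get the version as a single string rather than reading three #define lines, something like the following could be used. It is a sketch that assumes the header keeps the "#define NAME value" layout shown above; the function name cudnn_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCHLEVEL by
# reading the three #define lines from the given cudnn_version.h.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}
```

For the installation in this article, cudnn_version /usr/local/cuda-11.2/include/cudnn_version.h would print 8.1.1.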
The relationship between CUDA and cuDNN