First, check whether the system has a GPU that supports CUDA programming. You can use the following command:
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
03:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
03:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
03:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
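The check above can also be scripted. The following is a minimal sketch, assuming lspci (from the pciutils package) is available; the helper name count_nvidia_gpus is made up for this example. It counts only display controllers, so the audio and USB functions that belong to the same card are ignored.

```shell
# Count NVIDIA display controllers in lspci-style text read from stdin.
# count_nvidia_gpus is a hypothetical helper name, not a standard tool.
count_nvidia_gpus() {
    # Keep NVIDIA lines, then count those describing a VGA or 3D controller.
    # "|| true" keeps the exit status zero when the count is 0.
    grep -i 'nvidia' | grep -Eic 'VGA compatible controller|3D controller' || true
}

if command -v lspci >/dev/null 2>&1; then
    echo "NVIDIA GPUs found: $(lspci | count_nvidia_gpus)"
else
    echo "lspci not found; install the pciutils package first"
fi
```

For the lspci output shown above, the count would be 2, one for each "VGA compatible controller" line.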
First command: nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-510-server # version 510.47.03-0ubuntu3, or
sudo apt install nvidia-utils-390 # version 390.157-0ubuntu0.22.04.1
sudo apt install nvidia-utils-418-server # version 418.226.00-0ubuntu5~0.22.04.1
sudo apt install nvidia-utils-450-server # version 450.236.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470 # version 470.182.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server # version 470.182.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510 # version 510.108.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515 # version 515.105.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515-server # version 515.105.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-525 # version 525.105.17-0ubuntu0.22.04.1
sudo apt install nvidia-utils-525-server # version 525.105.17-0ubuntu0.22.04.1
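Installing one of the packages suggested above is the first method. The second method is:
Open "Additional Drivers", select the driver version you want, click "Apply Changes", and wait for the installation to finish. Reboot once it is done.
Run nvidia-smi again and you will see: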
Tue Apr 11 11:04:02 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   43C    P8     6W / 250W |    351MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   37C    P8     1W / 250W |      6MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2096      G   /usr/lib/xorg/Xorg                 84MiB |
|    0   N/A  N/A      2311      G   /usr/bin/gnome-shell              119MiB |
|    0   N/A  N/A      2997      G   ...6/usr/lib/firefox/firefox      145MiB |
|    1   N/A  N/A      2096      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
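The two version numbers in the nvidia-smi header can be extracted for use in scripts. This is a minimal sketch; smi_versions is a name invented for this example, and it assumes the header keeps the "Driver Version: ... CUDA Version: ..." layout shown above. The --query-gpu and --format options are part of nvidia-smi itself.

```shell
# Print "driver=<version> cuda=<version>" from plain nvidia-smi output on stdin.
# smi_versions is a hypothetical helper; it assumes the header layout shown above.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | smi_versions || echo "nvidia-smi failed to run"
    # Per-GPU details in a machine-readable CSV form.
    nvidia-smi --query-gpu=index,name,driver_version,memory.total \
               --format=csv,noheader || echo "nvidia-smi query failed"
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```

The "CUDA Version" reported here is the highest CUDA version the driver supports, not the version of any installed toolkit.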
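Note that the steps above install only the GPU driver.
Next, install CUDA. Suppose we need CUDA 11.7.
Download the installer from the following address:
https://developer.nvidia.com/cuda-11-7-0-download-archive
After selecting deb (local), run the following commands.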
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
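If you get the following error:
The following packages have unmet dependencies:
nvidia-kernel-common-525 : Conflicts: nvidia-kernel-common
nvidia-kernel-common-530 : Conflicts: nvidia-kernel-common
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
use the following command instead: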
sudo aptitude -y install cuda
If aptitude is not installed, install it first:
sudo apt-get install aptitude
Running nvidia-smi at this point gives the following output:
Tue Apr 11 12:51:58 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P8     3W / 250W |    331MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   30C    P8     1W / 250W |      6MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2096      G   /usr/lib/xorg/Xorg                135MiB |
|    0   N/A  N/A      2311      G   /usr/bin/gnome-shell               42MiB |
|    0   N/A  N/A      2997      G   ...6/usr/lib/firefox/firefox      149MiB |
|    1   N/A  N/A      2096      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
Second command: nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
Then run nvcc --version again (note that the Ubuntu package provides release 11.5, not 11.7):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Running nvidia-smi again at this point still gives the same output as before: driver 525.105.17 and CUDA Version 12.0. The CUDA Version shown by nvidia-smi is the highest version the installed driver supports, not the version of the installed toolkit, so it does not change to match nvcc.
Third command: dpkg -l | grep cuda
ii cuda-repo-ubuntu2204-11-7-local 11.7.0-515.43.04-1 amd64 cuda repository configuration files
ii libcudart11.0:amd64 11.5.117~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Runtime Library
ii nvidia-cuda-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.1-1ubuntu1 all NVIDIA CUDA and OpenCL documentation
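Traditionally, the NVIDIA Driver and the CUDA Toolkit are installed in separate steps, but we can in fact install the CUDA Toolkit directly and let its installer install the matching NVIDIA Driver as well. The following describes how to install the CUDA Toolkit.
Before installing the CUDA Toolkit, make sure gcc and make are installed. If you want to write CUDA programs in C++, install g++ as well. If you want to run the CUDA sample programs, install the libraries they depend on.
sudo apt update # refresh the apt package index
sudo apt install gcc g++ make # install gcc, g++ and make
sudo apt install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev freeglut3-dev # install the libraries needed by the samples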
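Before starting the installer, it can help to confirm that the build tools are actually on PATH. A minimal sketch follows; missing_tools is a helper name made up for this example.

```shell
# Print the names of the given commands that cannot be found on PATH.
# missing_tools is a hypothetical helper name used only in this sketch.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools gcc g++ make)
if [ -z "$missing" ]; then
    echo "gcc, g++ and make are all installed"
else
    # Unquoted on purpose so the names are printed on one line.
    echo "missing tools:" $missing
    echo "install them with: sudo apt install gcc g++ make"
fi
```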
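On the CUDA Toolkit download page, select your system version and installation method, then download and run the runfile:
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
Enter these as two separate commands. If they are fused onto one line, as in the transcript below, wget requests a non-existent URL ending in "runsudo" and treats "sh" as a second URL. In this run the host name could not be resolved either:
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.runsudo sh
--2023-04-11 13:25:22-- https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.runsudo
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘developer.download.nvidia.com’
--2023-04-11 13:25:52-- http://sh/
Resolving sh (sh)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘sh’
If wget cannot reach the server, download the file manually in a browser from https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
Then run sudo sh cuda_11.7.0_515.43.04_linux.run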
After you enter your password, the following message appears:
│ Existing package manager installation of the driver found. It is strongly │
│ recommended that you remove this before continuing. │
│ Abort │
│ Continue
Select Continue and then type accept; the following menu appears:
│ CUDA Installer │
│ - [X] Driver │
│ [X] 515.43.04 │
│ + [X] CUDA Toolkit 11.7 │
│ [X] CUDA Demo Suite 11.7 │
│ [X] CUDA Documentation 11.7 │
│ Options │
│ Install
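Select Install to start the installation.
If you see "Installation failed. See log at /var/log/cuda-installer.log for details.", run the installer again, this time deselecting Driver (a driver was already installed through the package manager earlier):
│ CUDA Installer │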
│ - [ ] Driver │
│ [ ] 515.43.04 │
│ + [X] CUDA Toolkit 11.7 │
│ [X] CUDA Demo Suite 11.7 │
│ [X] CUDA Documentation 11.7 │
│ Options │
│ Install
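When the installation finishes, the installer prints a summary:
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.7/
Please make sure that
 - PATH includes /usr/local/cuda-11.7/bin
 - LD_LIBRARY_PATH includes /usr/local/cuda-11.7/lib64, or, add /usr/local/cuda-11.7/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.7/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 515.00 is required for CUDA 11.7 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log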
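The warning above only matters if the existing driver is older than 515.00. One way to check this is a version comparison like the sketch below. It assumes GNU sort with the -V (version sort) option; version_ge is a helper name made up for this example, and the installed value is taken from the nvidia-smi output earlier in this article.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on "sort -V" from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="515.00"       # minimum driver named in the installer warning
installed="525.105.17"  # replace with your own driver version

if version_ge "$installed" "$required"; then
    echo "driver $installed is new enough for CUDA 11.7"
else
    echo "driver $installed is too old; at least $required is needed"
fi
```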
A cuda-11.7 directory now exists under /usr/local/.
The second half of the installer output asks us to update the environment variables PATH and LD_LIBRARY_PATH. Add the paths to ~/.bashrc.
vim ~/.bashrc
Enter insert mode (press i) and add the following lines:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
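Press Esc and type :wq! to save and exit.
This completes the CUDA configuration.
(Note: both /etc/skel and your home directory may contain a .bashrc file. The one to edit is the one in your home directory.)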
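If you prefer not to edit ~/.bashrc by hand, the two export lines can be appended from a script. The sketch below adds a line only when an identical line is not already in the file, so running it twice does no harm. add_line_once is a helper name invented for this example, and the calls are left commented out so that nothing is changed until you enable them.

```shell
# Append a line to a file unless an identical line is already present.
# add_line_once is a hypothetical helper name used only in this sketch.
add_line_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $PATH unexpanded, so it is evaluated when .bashrc runs.
# Uncomment to apply to your own ~/.bashrc:
# add_line_once "$HOME/.bashrc" 'export PATH="/usr/local/cuda-11.7/bin:$PATH"'
# add_line_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"'
```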
Now open a new terminal and run nvcc -V to verify. Output like the following means CUDA is configured correctly.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
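Because an nvcc from the Ubuntu nvidia-cuda-toolkit package (release 11.5) may still be installed, it is worth confirming which release the nvcc on PATH reports. A minimal sketch, assuming the "release X.Y," wording shown in the output above; nvcc_release is a helper name made up for this example.

```shell
# Print the release number (for example 11.7) from "nvcc --version" output on stdin.
# nvcc_release is a hypothetical helper; it assumes the "release X.Y," wording.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

expected="11.7"
if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc --version | nvcc_release)
    if [ "$found" = "$expected" ]; then
        echo "nvcc reports CUDA $found as expected"
    else
        echo "nvcc reports CUDA $found, expected $expected; check the order of PATH entries"
    fi
else
    echo "nvcc not found on PATH"
fi
```

If the wrong release is reported, running "command -v nvcc" shows which nvcc binary is being picked up.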
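Installing cuDNN
cuDNN is simpler to install than CUDA: download the archive for the matching version, copy the files to the target directories, and set the permissions.
Strictly speaking, cuDNN is not really "installed". It is an add-on library for CUDA, which is why the process is so simple. Download the cuDNN archive for CUDA 11.7 from the NVIDIA website (this step may require an NVIDIA account).
Official download page: https://developer.nvidia.com/cudnn
Extract it with: tar -xf cudnn-linux-x86_64-8.8.1.3_cuda11-archive.tar.xz
The extracted cuDNN directory contains two subdirectories, include and lib. Copy all files from them into the corresponding directories (include and lib64) of the CUDA 11.7 installation.
Important: the destination must be the same installation path that was used in the environment variables above!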
sudo cp /home/uvtec/下载/cudnn-linux-x86_64-8.8.1.3_cuda11-archive/include/cudnn.h /usr/local/cuda-11.7/include
sudo cp /home/uvtec/下载/cudnn-linux-x86_64-8.8.1.3_cuda11-archive/lib/libcudnn* /usr/local/cuda-11.7/lib64
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn.h
sudo chmod a+r /usr/local/cuda-11.7/lib64/libcudnn*
sudo cp /home/uvtec/下载/cudnn-linux-x86_64-8.8.1.3_cuda11-archive/include/cudnn* /usr/local/cuda-11.7/include
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn*
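The copy steps above can be collected into one function. This is only a sketch: install_cudnn is a name made up here, and it assumes the archive layout used above (include/ and lib/ in the extracted directory, include/ and lib64/ in the CUDA directory). It uses cp -P so that symbolic links among the library files are copied as links.

```shell
# Copy cuDNN headers and libraries from an extracted archive into a CUDA tree.
# install_cudnn is a hypothetical helper name used only in this sketch.
#   $1: extracted cuDNN directory (contains include/ and lib/)
#   $2: CUDA installation directory (contains include/ and lib64/)
install_cudnn() {
    src=$1
    dst=$2
    # -P copies symbolic links as links instead of duplicating their targets.
    cp -P "$src"/include/cudnn*.h "$dst/include/" &&
    cp -P "$src"/lib/libcudnn* "$dst/lib64/" &&
    chmod a+r "$dst"/include/cudnn*.h "$dst"/lib64/libcudnn*
}

# Example call (needs write access to the CUDA directory, e.g. from a root shell):
# install_cudnn /home/uvtec/下载/cudnn-linux-x86_64-8.8.1.3_cuda11-archive /usr/local/cuda-11.7
```

Since /usr/local is normally writable only by root, the function has to run in a root shell, or cp and chmod inside it can be prefixed with sudo.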
To check the cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
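Note that this command may print nothing, because newer cuDNN versions moved the version macros to a different file. In that case use: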
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
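Again, pay attention to the path here: these commands read from /usr/local/cuda, whereas the files were copied to /usr/local/cuda-11.7.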
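The two cat commands can be combined into one check that tries both header locations and prints the version in a single line. A minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate "#define" lines; cudnn_version is a helper name made up for this example.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cuDNN header that defines the version macros.
# cudnn_version is a hypothetical helper name used only in this sketch.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Newer cuDNN releases keep the macros in cudnn_version.h, older ones in cudnn.h.
for header in /usr/local/cuda-11.7/include/cudnn_version.h \
              /usr/local/cuda-11.7/include/cudnn.h; do
    if [ -f "$header" ]; then
        version=$(cudnn_version "$header")
        if [ -n "$version" ]; then
            echo "cuDNN $version (from $header)"
            break
        fi
    fi
done
```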
This completes the installation of CUDA and cuDNN.