Use
lspci | grep -i nvidia
to list the NVIDIA GPUs present in the system.
Install the kernel development package:
yum install kernel-devel
Check the running kernel version to confirm that it matches the version of the kernel-devel package:
uname -r
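The comparison described above can be scripted. The following is a minimal sketch, assuming an RPM-based system where `rpm -q kernel-devel` prints names of the form kernel-devel-<release>; the function name kernel_devel_matches is made up for this example.

```shell
#!/bin/sh
# Return 0 if one of the installed kernel-devel packages matches the running kernel.
# $1: running kernel release, as printed by `uname -r`
# $2: newline-separated package names, as printed by `rpm -q kernel-devel`
kernel_devel_matches() {
    running=$1
    installed=$2
    # -x: whole-line match, -F: fixed string, -q: no output, exit status only
    printf '%s\n' "$installed" | grep -qxF "kernel-devel-$running"
}

# Usage on a real host (skipped where rpm is not available).
if command -v rpm >/dev/null 2>&1; then
    if kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)"; then
        echo "kernel-devel matches the running kernel"
    else
        echo "mismatch: consider 'yum install kernel-devel-$(uname -r)'"
    fi
fi
```

If the versions differ, installing the package that carries the exact release string of the running kernel (or rebooting into the newer kernel) is the usual way to bring them back in line.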
Check the installed NVIDIA driver version:
cat /proc/driver/nvidia/version
Output:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.68.02 Wed Apr 20 21:10:34 UTC 2022
GCC version: gcc version 9.3.1 20200408 (Red Hat 9.3.1-2) (GCC)
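When only the driver version number is needed (for example, to compare it with the minimum driver a CUDA release requires), it can be cut out of the NVRM line. A small sketch, assuming the line has the layout shown above; the function name nvrm_driver_version is hypothetical.

```shell
#!/bin/sh
# Print the driver version found on the "NVRM version:" line read from stdin.
# Assumes the layout "... Kernel Module <version> <build date>".
nvrm_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a host with the driver loaded (skipped elsewhere).
if [ -r /proc/driver/nvidia/version ]; then
    nvrm_driver_version < /proc/driver/nvidia/version
fi
```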
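Note, however, that the NVIDIA driver and CUDA are not the same thing:
CUDA is NVIDIA's parallel computing platform for its own GPUs. It therefore runs only on NVIDIA GPUs, and it pays off only when the computation at hand can be heavily parallelized. In practice, CUDA is delivered as a toolkit (the CUDA Toolkit).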
Installing the CUDA Toolkit:
Check the system architecture and distribution:
uname -m && cat /etc/*release
Check whether the nouveau driver is loaded; if it is, disable it:
lsmod | grep nouveau
To disable it:
touch /etc/modprobe.d/blacklist-nouveau.conf
Write the following into blacklist-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
Regenerate the kernel's initramfs boot image:
sudo dracut --force
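The nouveau steps above can be collected into one small script. This is only a sketch: the function name write_nouveau_blacklist is invented here, and the target path is passed as a parameter so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# Write the two nouveau blacklist lines to the file given as $1 (overwrites it).
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# On a real host, run as root:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force
#   reboot
# After the reboot, this should print nothing if nouveau is no longer loaded:
#   lsmod | grep nouveau
```

The blacklist takes effect only once the system boots with the regenerated initramfs, which is why the check with lsmod comes after the reboot.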
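Because the NVIDIA driver is already installed, locate nvidia-smi with the following command and run it to see which CUDA version the driver supports: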
find -name nvidia-smi
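The 11.6 here is the "CUDA Version" reported by nvidia-smi, that is, the highest CUDA version the installed driver supports. The CUDA Toolkit you download must not be newer than 11.6.
Find the CUDA 11.6 installer on the official site and download it: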
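The version rule can be checked mechanically. Below is a sketch, assuming that the nvidia-smi banner contains a field of the form "CUDA Version: 11.6" and that `sort -V` (version sort) is available; the helper names smi_cuda_version and version_le are made up for this example.

```shell
#!/bin/sh
# Print the value of the "CUDA Version:" field from nvidia-smi output read on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Return 0 if version $1 is less than or equal to version $2.
version_le() {
    # After a version sort, the smaller (or equal) version comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a host with the driver installed (skipped elsewhere).
wanted=11.6   # the toolkit version you plan to download
if command -v nvidia-smi >/dev/null 2>&1; then
    max=$(nvidia-smi | smi_cuda_version)
    if version_le "$wanted" "$max"; then
        echo "CUDA $wanted is supported by this driver (max $max)"
    else
        echo "CUDA $wanted is too new for this driver (max $max)"
    fi
fi
```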
https://developer.nvidia.com/cuda-11-6-0-download-archive
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
sudo sh cuda_11.6.0_510.39.01_linux.run
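Because the host already has the NVIDIA driver, do not install it again: leave the Driver component unselected and choose Install.
Pay special attention to the summary printed after installation: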
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.6/
Please make sure that
- PATH includes /usr/local/cuda-11.6/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.6/lib64, or, add /usr/local/cuda-11.6/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.6/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 510.00 is required for CUDA 11.6 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Add the following to the environment variables:
export PATH=/usr/local/cuda-11.6/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.6/lib64
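If these lines end up in a shell startup file that is sourced more than once, the same directory gets added repeatedly. A sketch of an idempotent variant follows; the helper path_prepend and the variable CUDA_HOME are names chosen for this example, and unlike the lines above it prepends the lib64 directory instead of appending it.

```shell
#!/bin/sh
# Print the colon-separated list $1 with directory $2 prepended,
# unless $2 is already one of its entries.
path_prepend() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$2:$1"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

CUDA_HOME=/usr/local/cuda-11.6   # hypothetical variable name used by this sketch
PATH=$(path_prepend "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(path_prepend "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH
```

Handling the empty case separately also avoids a stray leading or trailing colon in LD_LIBRARY_PATH, which the dynamic linker treats as the current directory.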
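Installing cuDNN only requires copying its files into the corresponding CUDA directories. This is a plug-in style design: cuDNN is an extension library for CUDA, and adding it does not otherwise affect CUDA. (In practice, installing cuDNN means copying the cuDNN header files into CUDA's include directory and the cuDNN libraries into CUDA's library directory.)
Download page: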
https://developer.nvidia.com/rdp/cudnn-archive#a-collapse811-111
tar -xvf /root/cudaToolKit/cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
Copy the extracted files into the CUDA directories:
# Copy the cuDNN header files
sudo cp /root/cudaToolKit/cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include/* /usr/local/cuda-11.6/include/
# Copy the cuDNN libraries (-P keeps the symbolic links as links instead of duplicating the files)
sudo cp -P /root/cudaToolKit/cudnn-linux-x86_64-8.6.0.163_cuda11-archive/lib/* /usr/local/cuda-11.6/lib64/
# Add execute permission
sudo chmod +x /usr/local/cuda-11.6/include/cudnn.h
sudo chmod +x /usr/local/cuda-11.6/lib64/libcudnn*
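The copy steps can also be wrapped in a function that takes the source and destination as parameters, which makes it possible to rehearse the copy against a scratch directory. This is a sketch under a few assumptions: the function name install_cudnn is invented, it copies only files matching cudnn*.h and libcudnn*, and it grants read permission (a+r) instead of +x on the assumption that headers and shared libraries only need to be readable.

```shell
#!/bin/sh
# Copy cuDNN headers and libraries from an extracted archive into a CUDA root.
# $1: extracted cuDNN archive directory (contains include/ and lib/)
# $2: CUDA root directory (contains include/ and lib64/)
install_cudnn() {
    src=$1
    cuda_root=$2
    cp "$src"/include/cudnn*.h "$cuda_root/include/" &&
    # -P copies symbolic links as links, so the .so version links stay links.
    cp -P "$src"/lib/libcudnn* "$cuda_root/lib64/" &&
    chmod a+r "$cuda_root"/include/cudnn*.h "$cuda_root"/lib64/libcudnn*
}

# On a real host, run as root:
#   install_cudnn /root/cudaToolKit/cudnn-linux-x86_64-8.6.0.163_cuda11-archive /usr/local/cuda-11.6
```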
Run this command to check that the installation succeeded:
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
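The grep above prints the raw #define lines. To get a single version string instead, the three macros can be joined. A sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers on separate #define lines; the function name cudnn_version is hypothetical.

```shell
#!/bin/sh
# Print MAJOR.MINOR.PATCHLEVEL from the cudnn_version.h file given as $1.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Usage on a host with cuDNN installed (skipped elsewhere).
hdr=/usr/local/cuda/include/cudnn_version.h
if [ -r "$hdr" ]; then
    cudnn_version "$hdr"
fi
```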
3. Install TensorRT 8
Download page:
https://developer.nvidia.com/nvidia-tensorrt-8x-download
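To download with wget:
Right-click to copy the download link, then add options to the wget command that indicate acceptance of the license agreement. The final command is: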
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" https://developer.download.nvidia.cn/compute/machine-learning/tensorrt/secure/8.5.1/tars/TensorRT-8.5.1.7.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz?_zCOrsELfKMlBKtgCVxzU4PbVXUOkaAE74UcV9Yzar-gQ0s8Tb4qAdKebPQSpE2xHxloxi4REGmH_0-s5kEsBF9DPzIl-a9BY0DhqxP2hMIiqonMLYN4oL0fR_EgomfznX8OvnNc5gV7YFgtvaA
Redirecting output to ‘wget-log.1’.
To follow the download progress:
tail -f wget-log.1
After the download finishes, extract the archive:
tar -xvf TensorRT-8.5.1.7.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz?_zCOrsELfKMlBKtgCVxzU4PbVXUOkaAE74UcV9Yzar-gQ0s8Tb4qAdKebPQSpE2xHxloxi4REGmH_0-s5kEsBF9DPzIl-a9BY0DhqxP2hMIiqonMLYN4oL0fR_EgomfznX8OvnNc5gV7YFgtvaAQKRSiztO4cHyf57QaOdTSckrG6rgH
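Because wget kept the URL's query string in the saved file name, the archive name is awkward to type. One option is to strip everything from the first "?" before extracting. A sketch; the function name strip_query and the variable downloaded are made up for this example.

```shell
#!/bin/sh
# Print file name $1 with everything from the first "?" removed.
strip_query() {
    printf '%s\n' "${1%%[?]*}"
}

# Usage (commented out; "downloaded" stands for the name wget saved the file under):
#   clean=$(strip_query "$downloaded")
#   mv -- "$downloaded" "$clean"
#   tar -xzvf "$clean"
```

Alternatively, passing `-O <name>` to wget picks the output file name at download time, so no rename is needed afterwards.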
Installation is complete. The installation directory is:
/usr/local/tensorRt8Target/TensorRT-8.5.1.7