Installing CUDA, cuDNN, and TensorRT on Linux

Contents

  • I. Script used:
  • II. Installing CUDA on Linux
  • III. Installing cuDNN
    • Official method:
    • Method from the reference link:
  • IV. Installing TensorRT
  • V. Common problems
    • 1. When verifying the TensorRT installation, import tensorrt fails with "ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory"
    • 2. source ~/.bashrc fails with "command not found: shopt"
    • 3. When building tensorrtx, cmake .. reports "OpenCV static library was compiled with CUDA 10.1 support. Please, use the same version or rebuild OpenCV with CUDA 10.0"
    • 4. When building tensorrtx, make fails with "NvInfer.h: No such file or directory"
    • 5. sudo ./yolov5 -s yolov5s.wts yolov5s.engine s fails with "./yolov5: error while loading shared libraries: libcublas.so.10.0: cannot open shared object file: No such file or directory"
    • 6. ./yolov5: error while loading shared libraries: libnvrtc.so.10.0: cannot open shared object file: No such file or directory
    • 7. sudo ./yolov5 -s yolov5s.wts yolov5s.engine s aborts with "abort sudo ./yolov5 -s yolov5s.wts yolov5s.engine s"

CUDA, all versions: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN, all versions: https://developer.nvidia.com/rdp/cudnn-archive

I. Script used:

# Online tutorial: https://blog.csdn.net/linxinloningg/article/details/122525742

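# Install CUDA 10.0 from the runfile
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
sudo sh cuda_10.0.130_410.48_linux
# Single quotes keep $PATH unexpanded, so it is evaluated each time ~/.zshrc is loaded
echo 'export PATH=$PATH:/usr/local/cuda-10.0/bin' >> ~/.zshrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64' >> ~/.zshrc
# CUDA_HOME is a single directory, not a colon-separated list
echo 'export CUDA_HOME=/usr/local/cuda-10.0' >> ~/.zshrc
source ~/.zshrc
# Remove the existing CUDA 10.1 installation so it is not picked up by mistake
sudo rm -rf /usr/local/cuda-10.1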

# Install cudnn7.6.5 by tar
wget https://minio.cvmart.net/user-file/24466/0151ae98cd7b430ebad6108f4501cd7f.tgz
tar -xvf 0151ae98cd7b430ebad6108f4501cd7f.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda-10.0/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn*.h /usr/local/cuda-10.0/lib64/libcudnn*

# Install opencv3.4.16
wget https://minio.cvmart.net/user-file/24466/31c66879e16d4ebb95585a4591cca760.zip
unzip 31c66879e16d4ebb95585a4591cca760.zip
cd opencv-3.4.16
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local  ..
make -j8
sudo make install

# Install tensorrt7.0.0.11
wget https://minio.cvmart.net/user-file/24466/97ef25d5664842aeb06a7f5226851348.gz
tar -xvf 97ef25d5664842aeb06a7f5226851348.gz
echo 'export LD_LIBRARY_PATH=/project/train/src_repo/TensorRT-7.0.0.11/lib:/project/train/src_repo/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib:$LD_LIBRARY_PATH' >> ~/.zshrc
source ~/.zshrc
cd TensorRT-7.0.0.11/python 
pip install tensorrt-7.0.0.11-cp36-none-linux_x86_64.whl 
cd ../uff
pip install uff-0.6.5-py2.py3-none-any.whl
cd ../graphsurgeon 
pip install graphsurgeon-0.4.1-py2.py3-none-any.whl

# Run tensorrtx

# Generate .wts from pytorch with .pt,
cd /project/train/src_repo
git clone -b v6.0 https://github.com/ultralytics/yolov5.git
git clone https://github.com/wang-xinyu/tensorrtx.git
cd /project/train/src_repo/yolov5
wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt
pip install -r requirements.txt
cp /project/train/src_repo/tensorrtx/yolov5/gen_wts.py /project/train/src_repo/yolov5
python gen_wts.py -w yolov5s.pt -o yolov5s.wts

#build tensorrtx/yolov5 and run
cd /project/train/src_repo/tensorrtx/yolov5
mkdir build
cd /project/train/src_repo/tensorrtx/yolov5/build
cp /project/train/src_repo/yolov5/yolov5s.wts /project/train/src_repo/tensorrtx/yolov5/build/
# If you use your own dataset, be sure to update CLASS_NUM in yololayer.h before running cmake ..
cmake ..
make
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s
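
Because the script appends to ~/.zshrc with echo ... >>, running it a second time leaves duplicate export lines in the file. A minimal sketch of an idempotent alternative follows; the helper name append_once is hypothetical and is not part of the original script.

```shell
# append_once FILE LINE: append LINE to FILE only if FILE does not already
# contain that exact line. (Hypothetical helper, not from the original script.)
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out so that pasting this block changes nothing):
# append_once ~/.zshrc 'export PATH=$PATH:/usr/local/cuda-10.0/bin'
# append_once ~/.zshrc 'export CUDA_HOME=/usr/local/cuda-10.0'
```

With this helper the environment setup can be re-run safely, since a line that is already present is not written again.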

II. Installing CUDA on Linux

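Reference links:
https://blog.csdn.net/qq_44961869/article/details/115954258
https://blog.csdn.net/kingfoulin/article/details/98872965
Traditionally, the NVIDIA Driver and the CUDA Toolkit are installed in separate steps. In practice you can run just the CUDA Toolkit installer: the runfile bundles a matching NVIDIA Driver and offers to install it as well.
Download CUDA 10.2 from the official site and choose the runfile installation method. The commands are:
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
The second command brings up the installation options. If an NVIDIA driver is already installed on the system, be sure to deselect the driver, otherwise the installation will fail. When the installation finishes, the installer prints a summary.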

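The installer output indicates that the NVIDIA Driver and the CUDA Toolkit have been installed. Its second half reminds us to update the PATH and LD_LIBRARY_PATH environment variables.
Add the following to ~/.bashrc. You can write cuda instead of cuda-10.2: /usr/local/cuda is a symbolic link, so you can later switch between CUDA versions simply by repointing it.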

export PATH=$PATH:/usr/local/cuda-10.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
export CUDA_HOME=/usr/local/cuda-10.2

This completes the CUDA environment variable configuration.
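
Switching versions through the symbolic link can be sketched as follows. The helper name switch_cuda and its optional PREFIX argument are hypothetical, and the sketch assumes that PREFIX/cuda is either missing or already a symbolic link (not a real directory).

```shell
# switch_cuda VERSION [PREFIX]: repoint PREFIX/cuda at PREFIX/cuda-VERSION.
# (Hypothetical helper; PREFIX defaults to /usr/local.)
switch_cuda() {
  version="$1"
  prefix="${2:-/usr/local}"
  target="$prefix/cuda-$version"
  if [ ! -d "$target" ]; then
    echo "CUDA $version is not installed under $prefix" >&2
    return 1
  fi
  # -s: symbolic link, -f: replace an existing link,
  # -n: do not follow the existing link into the old directory
  ln -sfn "$target" "$prefix/cuda"
}

# Example usage from a root shell (writing to /usr/local needs root):
# switch_cuda 10.2
# Equivalent one-liner:
# sudo ln -sfn /usr/local/cuda-10.2 /usr/local/cuda
```

If the exports in ~/.bashrc point at /usr/local/cuda rather than /usr/local/cuda-10.2, they do not need to change when the link is repointed.
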
After setting the environment variables, run nvcc -V to confirm that CUDA was installed successfully.
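
The check can also be scripted. The sketch below extracts the release number from the nvcc -V output; the helper name cuda_release is hypothetical, and it assumes the usual "Cuda compilation tools, release X.Y, ..." line in that output.

```shell
# Read `nvcc -V` output on stdin and print the toolkit release, e.g. 10.2.
# (Hypothetical helper name.)
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "CUDA toolkit release: $(nvcc -V | cuda_release)"
else
  echo "nvcc not found; check that the CUDA bin directory is on PATH"
fi
```
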
Summary: I first tried installing CUDA from the deb package, which unfortunately failed. The link is attached:
https://github.com/wang-xinyu/tensorrtx/blob/master/tutorials/install.md

III. Installing cuDNN

Official guide: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
Reference link: https://blog.csdn.net/kingfoulin/article/details/98872965

Official method:

[Screenshot: installation steps from the official guide]

Method from the reference link:

[Screenshot: installation steps from the reference link]
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Combining the two methods: download the tar package first, then extract it with tar, which produces a cuda folder.

tar -xvf 0151ae98cd7b430ebad6108f4501cd7f.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

Note that from cuDNN 8.x onward the version is no longer defined in cudnn.h but in cudnn_version.h. Check it with cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
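
Since the header that holds the version depends on the cuDNN release, a small script can try both. This is a sketch only: the helper name cudnn_version is hypothetical, and it assumes the headers contain the usual #define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL lines.

```shell
# cudnn_version [INCLUDE_DIR]: print the cuDNN version as MAJOR.MINOR.PATCH.
# Tries cudnn_version.h (cuDNN 8.x and later) first, then cudnn.h (7.x and earlier).
# (Hypothetical helper; INCLUDE_DIR defaults to /usr/local/cuda/include.)
cudnn_version() {
  include_dir="${1:-/usr/local/cuda/include}"
  for header in "$include_dir/cudnn_version.h" "$include_dir/cudnn.h"; do
    [ -f "$header" ] || continue
    version=$(awk '
      $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
      $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
      $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
      END { if (major != "") print major "." minor "." patch }
    ' "$header")
    if [ -n "$version" ]; then
      echo "$version"
      return 0
    fi
  done
  return 1
}

cudnn_version || echo "no cuDNN version found under /usr/local/cuda/include"
```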

IV. Installing TensorRT

Reference links:
https://blog.csdn.net/linxinloningg/article/details/122525742
https://blog.csdn.net/zong596568821xp/article/details/86077553
Before building, if TensorRT was installed from the tar archive, you must edit the CMakeLists.txt in the tensorrtx/yolov5 directory:

# Replace these two lines
include_directories(/usr/include/x86_64-linux-gnu/)
link_directories(/usr/lib/x86_64-linux-gnu/)
# with the include and lib directories of your extracted TensorRT
include_directories(/project/train/src_repo/TensorRT-8.4.0.6/include/)
link_directories(/project/train/src_repo/TensorRT-8.4.0.6/lib/)

Save and exit.
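
The same edit can be applied non-interactively with sed. The sketch below is one possible way to do it, not the author's method: the helper name patch_trt_paths is hypothetical, and it assumes the two default lines appear in CMakeLists.txt exactly as shown above and that the TensorRT path contains no | or & characters.

```shell
# patch_trt_paths CMAKE_FILE TRT_ROOT: point the TensorRT include/link
# directories in CMAKE_FILE at TRT_ROOT. A backup is kept as CMAKE_FILE.bak.
# (Hypothetical helper name.)
patch_trt_paths() {
  cmake_file="$1"
  trt_root="$2"
  sed -i.bak \
    -e "s|include_directories(/usr/include/x86_64-linux-gnu/)|include_directories(${trt_root}/include/)|" \
    -e "s|link_directories(/usr/lib/x86_64-linux-gnu/)|link_directories(${trt_root}/lib/)|" \
    "$cmake_file"
}

# Example usage, run from tensorrtx/yolov5:
# patch_trt_paths CMakeLists.txt /project/train/src_repo/TensorRT-7.0.0.11
```

Pass the directory of the TensorRT version you actually extracted; the script in section I uses TensorRT-7.0.0.11.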

V. Common problems

1. When verifying the TensorRT installation, import tensorrt fails with "ImportError: libnvinfer.so.7: cannot open shared object file: No such file or directory"

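Cause: the TensorRT libraries have not been added to the system environment variables.
Solution:
Add the environment variable to ~/.bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/TensorRT-7.2.1.6/lib
Then run source ~/.bashrc for it to take effect.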
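
To confirm that the variable really covers the library, the directories in LD_LIBRARY_PATH can be searched directly. This is a rough sketch; the helper name find_lib is hypothetical. It only inspects LD_LIBRARY_PATH, whereas libraries registered in the system cache can be listed with ldconfig -p | grep libnvinfer.

```shell
# find_lib NAME [SEARCH_PATH]: print the first entry of the colon-separated
# SEARCH_PATH (default: $LD_LIBRARY_PATH) that contains a file called NAME.
# Returns non-zero if no directory contains it. (Hypothetical helper name.)
find_lib() {
  lib_name="$1"
  search_path="${2:-${LD_LIBRARY_PATH:-}}"
  found=$(printf '%s\n' "$search_path" | tr ':' '\n' | while IFS= read -r dir; do
    if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
      printf '%s\n' "$dir/$lib_name"
    fi
  done | sed -n '1p')
  [ -n "$found" ] && printf '%s\n' "$found"
}

find_lib libnvinfer.so.7 || echo "libnvinfer.so.7 is not in any LD_LIBRARY_PATH directory"
```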

2. source ~/.bashrc fails with "command not found: shopt"

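Reference link: https://stackoverflow.com/questions/26616003/shopt-command-not-found-in-bashrc-after-shell-updation
Cause: the default shell on this system is zsh, and shopt is a bash builtin, so sourcing ~/.bashrc from zsh fails.
Solution: write the required configuration into ~/.zshrc instead and run source ~/.zshrc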
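
Which startup file to edit depends on the login shell, which can be read from $SHELL. A minimal sketch follows; the helper name rc_file_for_shell is hypothetical, and the ~/.profile fallback for other shells is an assumption rather than something the original covers.

```shell
# rc_file_for_shell SHELL_PATH: print the startup file to edit for that shell.
# (Hypothetical helper; the ~/.profile fallback is an assumption.)
rc_file_for_shell() {
  case "$(basename "$1")" in
    zsh)  echo "$HOME/.zshrc" ;;
    bash) echo "$HOME/.bashrc" ;;
    *)    echo "$HOME/.profile" ;;
  esac
}

echo "Login shell: ${SHELL:-unknown}"
echo "Put the export lines in: $(rc_file_for_shell "${SHELL:-/bin/sh}")"
```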

3. When building tensorrtx, cmake .. reports "OpenCV static library was compiled with CUDA 10.1 support. Please, use the same version or rebuild OpenCV with CUDA 10.0"


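Cause: the OpenCV being used was built against a different CUDA version from the one in use now.
Solution: Following the error message, I opened /usr/local/lib/cmake/opencv4/OpenCVConfig.cmake and found that it checks whether the CUDA version is 10.1. This means the OpenCV being picked up is not the one I had built against CUDA 10.0; the file also shows an OpenCV version different from the one I installed. Rebuilding OpenCV therefore makes no difference. My workaround was to edit this file and change the CUDA 10.1 check to CUDA 10.0, after which the error went away, with no side effects later on.
Updated solution:
Reference link: https://blog.csdn.net/u012816621/article/details/51732932
cmake looks for OpenCV in a default location. To make it use your own OpenCV build, set the OpenCV directory before the find_package call in CMakeLists.txt by adding set(OpenCV_DIR /project/train/src_repo/TensorRT-7.0.0.11/graphsurgeon/opencv-3.4.16/build). My OpenCV was installed with make install under /project/train/src_repo/TensorRT-7.0.0.11/graphsurgeon/opencv-3.4.16/build; adjust the path to wherever you installed yours.
This is an alternative solution and has not been verified yet.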
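
When it is unclear which OpenCV installations are present, listing every OpenCVConfig.cmake on the machine helps in choosing the value for OpenCV_DIR. A minimal sketch, with a hypothetical helper name find_opencv_configs:

```shell
# find_opencv_configs DIR...: print each directory under the given roots
# that contains an OpenCVConfig.cmake file. (Hypothetical helper name.)
find_opencv_configs() {
  find "$@" -name OpenCVConfig.cmake -exec dirname {} \; 2>/dev/null
}

# Example usage:
# find_opencv_configs /usr/local /project/train/src_repo
# Then pass the chosen directory to cmake instead of editing CMakeLists.txt:
# cmake -DOpenCV_DIR=/project/train/src_repo/TensorRT-7.0.0.11/graphsurgeon/opencv-3.4.16/build ..
```

Setting OpenCV_DIR on the cmake command line has the same effect as the set(OpenCV_DIR ...) line, without modifying the tensorrtx sources.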

4. When building tensorrtx, make fails with "NvInfer.h: No such file or directory"

Reference links:
https://blog.csdn.net/CCCrunner/article/details/122979419
https://blog.csdn.net/sinat_28442665/article/details/118797477

Cause: the C++ compiler cannot find this header file.
Solution: change the TensorRT paths in CMakeLists.txt, replacing the original lines

# tensorrt
include_directories(/usr/include/x86_64-linux-gnu/)
link_directories(/usr/lib/x86_64-linux-gnu/)

with

# tensorrt
include_directories(/project/train/src_repo/TensorRT-7.0.0.11/include)
link_directories(/project/train/src_repo/TensorRT-7.0.0.11/lib)

5. sudo ./yolov5 -s yolov5s.wts yolov5s.engine s fails with "./yolov5: error while loading shared libraries: libcublas.so.10.0: cannot open shared object file: No such file or directory"

The solution is the same as for problem 6 below.

6. ./yolov5: error while loading shared libraries: libnvrtc.so.10.0: cannot open shared object file: No such file or directory

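Cause: errors like this all occur because the library has not been added to the system environment variables.
Solution: add the directory containing the library to the system environment variables. If that still fails, cd back to the root directory, locate the library with find . -name "libmyelin.so.*", and then copy it manually, for example sudo cp /usr/local/cuda-10.0/lib64/libcudart.so.10.0 /usr/local/lib/libcudart.so.10.0 && sudo ldconfig
Reference links: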
https://blog.csdn.net/martinkeith/article/details/102997059

https://positive.blog.csdn.net/article/details/118804501

Fixes for all of the missing libraries:

sudo cp /usr/local/cuda-10.0/lib64/libcublas.so.10.0 /usr/local/lib/libcublas.so.10.0 && sudo ldconfig
sudo cp /usr/local/cuda-10.0/lib64/libcudart.so.10.0 /usr/local/lib/libcudart.so.10.0 && sudo ldconfig
sudo cp /project/train/src_repo/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/libnvinfer.so.7 /usr/local/lib/libnvinfer.so.7 && sudo ldconfig
sudo cp /project/train/src_repo/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/libmyelin.so.1 /usr/local/lib/libmyelin.so.1 && sudo ldconfig
sudo cp /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0 /usr/local/lib/libnvrtc.so.10.0 && sudo ldconfig
# Or: sudo cp -r /project/train/src_repo/TensorRT-7.0.0.11/targets/x86_64-linux-gnu/lib/* /usr/local/lib && sudo ldconfig
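
Rather than fixing one library per failed run, all unresolved libraries of the binary can be listed at once with ldd. A small sketch, assuming the usual "name => not found" lines in ldd output; the helper name missing_libs is hypothetical.

```shell
# Read `ldd` output on stdin and print the names of the libraries
# that the dynamic loader could not resolve. (Hypothetical helper name.)
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Example usage from the tensorrtx/yolov5/build directory:
# ldd ./yolov5 | missing_libs
```

Each printed name can then be handled as above. Another option, instead of copying files, is to list the library directories in a file under /etc/ld.so.conf.d/ and run sudo ldconfig.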

7. sudo ./yolov5 -s yolov5s.wts yolov5s.engine s aborts with "abort sudo ./yolov5 -s yolov5s.wts yolov5s.engine s"

Reference link: https://github.com/wang-xinyu/tensorrtx/issues/899
Cause: most likely a version mismatch between tensorrtx and the .pt file.
Solution: download matching versions again.
Final result:
[Screenshot: final result]
