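I recently installed all of the following on two RHEL 8.0 servers, so I'm writing it down here for reference.
1. Offline installation of the NVIDIA driver, CUDA 10.1 and cuDNN 7.5
Key factors: GPU model (Quadro P4000); OS (RHEL 8.0), check with 'cat /etc/redhat-release'; gcc (8.2.1), check with 'gcc -v'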
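If you want a script to record these facts rather than checking by hand, here is a minimal Python sketch. It assumes /etc/redhat-release contains a "release X.Y" string and that `gcc -v` prints a "gcc version X.Y.Z" line (to stderr). The helper names are my own, not part of any tool.

```python
import re
import subprocess


def rhel_release(text):
    """Extract (major, minor) from the contents of /etc/redhat-release."""
    m = re.search(r"release\s+(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError("no release number found in %r" % text)
    return int(m.group(1)), int(m.group(2))


def gcc_version(text):
    """Extract 'X.Y.Z' from the 'gcc version X.Y.Z ...' line of `gcc -v` output."""
    m = re.search(r"gcc version (\d+\.\d+\.\d+)", text)
    if m is None:
        raise ValueError("no gcc version found")
    return m.group(1)


def main():
    try:
        with open("/etc/redhat-release") as f:
            print("RHEL", rhel_release(f.read()))
        # gcc -v writes its version information to stderr, not stdout
        out = subprocess.run(["gcc", "-v"], stderr=subprocess.PIPE,
                             universal_newlines=True)
        print("gcc", gcc_version(out.stderr))
    except (OSError, ValueError) as e:
        print("could not detect:", e)


if __name__ == "__main__":
    main()
```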
########## divider ##########
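(1) Choosing/downloading CUDA:
Search Baidu (the most convenient way) for the CUDA version you want (9.0/9.2/10.0/10.1).
I chose CUDA 10.1 because CUDA 9.0/9.2/10.0 have no RHEL 8 packages. The RHEL 7 packages might also work (I later verified that they do, but with some problems, so I personally don't recommend them).
Then look up the table of CUDA versions and their minimum required driver versions, and note the lowest driver version for the CUDA release you picked.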
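(2) Downloading/installing the driver:
Preferred: https://www.nvidia.cn/Download/Find.aspx?lang=cn
Alternative: https://www.nvidia.com/Download/index.aspx?lang=en-us
The second site usually hands you the newest driver. Newer drivers are backward compatible with older CUDA and cuDNN versions, but I'd still try not to install the very latest one.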
Disable nouveau by adding these two lines to the blacklist file:
sudo vim /usr/lib/modprobe.d/dist-blacklist.conf
blacklist nouveau
options nouveau modeset=0
Rebuild the initramfs (the first command keeps the old image as a backup):
sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)
Reboot the machine! Then check with 'lsmod | grep nouveau'; if nothing is printed, nouveau is disabled.
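`lsmod` just formats /proc/modules, so the same check can be scripted. A minimal sketch, assuming the standard /proc/modules layout with the module name in the first column. Unlike `grep`, it matches the module name exactly, so a line for another module that merely lists nouveau as a user is not counted.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names (first column) listed in /proc/modules."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_loaded(path="/proc/modules"):
    """True if the nouveau kernel module is currently loaded."""
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())
```

After the reboot, `nouveau_loaded()` should return False.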
Run the installer script:
First shut down the X server (otherwise the installer fails with an error saying X has not exited):
sudo init 3
sudo rm /tmp/.X*
sudo systemctl stop gdm.service
chmod u+x NVIDIA-Linux-x86_64-418.88.run
sudo ./NVIDIA-Linux-x86_64-418.88.run --kernel-source-path=/usr/src/kernels/4.18.0-80.el8.x86_64   # check with 'ls /usr/src/kernels' and use your own kernel version
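Picking the right --kernel-source-path can be automated. A minimal sketch, assuming the RHEL layout where the kernel-devel package installs headers under /usr/src/kernels/<kernel release>, and that you want the running kernel (what `uname -r` prints, which `platform.release()` also returns). `kernel_source_path` is a hypothetical helper.

```python
import os
import platform


def kernel_source_path(release=None, root="/usr/src/kernels"):
    """Return root/<release>, defaulting to the running kernel; fail loudly if missing."""
    release = release or platform.release()
    path = os.path.join(root, release)
    if not os.path.isdir(path):
        available = sorted(os.listdir(root)) if os.path.isdir(root) else []
        raise FileNotFoundError(
            "%s not found; installed kernel sources: %s (install kernel-devel for %s)"
            % (path, available, release))
    return path
```

The result can then be passed as `--kernel-source-path=...` to the .run installer.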
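I ran into an error during installation:
Open /var/log/nvidia-installer.log (e.g. with vim); the last few lines contain the error message, and the key one is: Makefile:958: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
Fix (this step is best done with network access):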
Ubuntu:
apt install libelf-dev
apt install libssl-dev
CentOS/RHEL:
yum install elfutils-libelf-devel
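If you reinstall often, the log check can be scripted. A minimal sketch: a hypothetical lookup table from known error strings in nvidia-installer.log to the package that fixes them. Only the ORC/libelf case from this post is included, and the distro keys "rhel"/"ubuntu" are my own labels.

```python
# Hypothetical table: error marker in nvidia-installer.log -> fixing package per distro.
KNOWN_ERRORS = {
    "Cannot generate ORC metadata": {
        "rhel": "elfutils-libelf-devel",
        "ubuntu": "libelf-dev",
    },
}


def suggest_packages(log_text, distro):
    """Return the packages to install for every known error found in the log text."""
    return [pkgs[distro] for marker, pkgs in KNOWN_ERRORS.items() if marker in log_text]


def scan_log(path="/var/log/nvidia-installer.log", distro="rhel"):
    """Read the installer log and suggest fixes for known errors."""
    with open(path, errors="replace") as f:
        return suggest_packages(f.read(), distro)
```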
Installer prompts (rule of thumb: for any option you don't recognize, choose to install it):
The distribution-provided pre-install script failed! Are you sure you want to continue?
Choose: Continue
Install NVIDIA's 32-bit compatibility libraries?
Definitely choose YES; many programs need the 32-bit libraries.
Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
Choose YES
Verify the installation:
nvidia-smi
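For a scripted check, nvidia-smi can print just the fields you care about with `--query-gpu=name,driver_version --format=csv,noheader`. A minimal sketch that parses that output; the helper names are hypothetical, and GPU names are assumed not to contain commas.

```python
import subprocess


def parse_gpu_query(csv_text):
    """Parse 'name, driver_version' lines into a list of (name, driver) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, driver = [field.strip() for field in line.split(",")]
        gpus.append((name, driver))
    return gpus


def query_gpus():
    """Ask nvidia-smi for every GPU's name and driver version."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        universal_newlines=True)
    return parse_gpu_query(out)
```

On the machines described here, `query_gpus()` should list the Quadro P4000 together with the 418.88 driver.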
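(3) Installing CUDA:
Since CUDA 10.1 is the only release NVIDIA officially supports on RHEL 8, the first choice is the package cuda_cluster_pkgs_10.1.243_418.87.00_rhel8.tar.gz.
There is no mirror for it inside China, so I suggest downloading through a proxy/VPN; otherwise the download is very slow (in practice it barely completes). The URL I ended up using was: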
http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
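Because a download this large and this slow can easily end up truncated, it is worth comparing the file's MD5 with the checksum NVIDIA publishes for its installers (the value itself is not reproduced here). A minimal sketch that hashes the file in chunks, like `md5sum`, without loading it all into memory:

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage: `md5sum("cuda_10.1.243_418.87.00_linux.run")`, then compare the result by eye with the published value.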
Go to the directory containing the file, then run:
sudo sh cuda_10.1.243_418.87.00_linux.run
The installation flow of this version differs from older versions; here are the important options:
Do you accept the above EULA? (accept/decline/quit): accept
CUDA Installer
- [ ] Driver   (press Space to remove the X; the driver was already installed in step (2))
     [ ] 418.87.00
+ [X] CUDA Toolkit 10.1
  [X] CUDA Samples 10.1
  [X] CUDA Demo Suite 10.1
  [X] CUDA Documentation 10.1
  Options   (press Enter to open the menu below)
  Install
CUDA Driver
  [X] Do not install any of the OpenGL-related driver files
  [ ] Do not install the nvidia-drm kernel module
  [X] Update the system X config file to use the NVIDIA X driver
  Change directory containing the kernel source files
  Done
After making these choices, select 'Done' to go back, then select 'Install'.
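Finally, set the environment variables with 'sudo vim /etc/profile' (on a desktop you can use 'sudo gedit /etc/profile')
and append these lines at the end:
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
Save and exit, then run 'source /etc/profile' (or log in again) so the new variables take effect.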
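The ':' in those export lines matters: without it the new directory and the old value run together into one bogus path. A minimal sketch of correct prepending (`prepend_path` is a hypothetical helper). It also avoids a trailing ':' when the variable was empty, because for PATH (and, in glibc's loader, LD_LIBRARY_PATH) an empty element means the current directory.

```python
import os


def prepend_path(entry, current):
    """Prepend `entry` to a colon-separated list without creating empty elements."""
    return entry + ":" + current if current else entry


if __name__ == "__main__":
    cuda_lib = "/usr/local/cuda-10.1/lib64"
    print(prepend_path(cuda_lib, os.environ.get("LD_LIBRARY_PATH", "")))
```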
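(4) Downloading/installing cuDNN:
The cuDNN package must match your CUDA version; there is no per-distribution build to choose; prefer the newest cuDNN release available for that CUDA version.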
tar xvf cudnn-10.1-linux-x64-v7.5.0.56.tgz
sudo cp cuda/include/* /usr/local/cuda/include   # not strictly needed (/usr/local/cuda usually links to cuda-10.1); personal habit, just to be safe
sudo cp cuda/include/* /usr/local/cuda-10.1/include
sudo cp -P cuda/lib64/* /usr/local/cuda/lib64   # not strictly needed; personal habit. -P keeps the libcudnn symlinks as symlinks
sudo cp -P cuda/lib64/* /usr/local/cuda-10.1/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*   # not strictly needed; personal habit
sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h /usr/local/cuda-10.1/lib64/libcudnn*
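Why copying symlinks as symlinks matters: the cuDNN tarball ships libcudnn.so and libcudnn.so.7 as links to the real library, and turning them into regular files is what makes ldconfig complain that "libcudnn.so.7 is not a symbolic link". A minimal Python sketch of the `cp -P` behaviour (`copy_libs` is a hypothetical helper, and it assumes source and destination are different directories):

```python
import glob
import os
import shutil


def copy_libs(src_dir, dst_dir, pattern="libcudnn*"):
    """Copy files matching `pattern`, recreating symlinks as symlinks (like cp -P)."""
    if os.path.samefile(src_dir, dst_dir):
        raise ValueError("source and destination are the same directory")
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, pattern))):
        dst = os.path.join(dst_dir, os.path.basename(src))
        if os.path.lexists(dst):
            os.remove(dst)  # replace an existing file or link instead of writing through it
        if os.path.islink(src):
            os.symlink(os.readlink(src), dst)  # same (relative) link target as the source
        else:
            shutil.copy2(src, dst)  # regular file: copy data and metadata
        copied.append(dst)
    return copied
```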
(5) Verification:
Verify CUDA:
nvcc -V
cat /usr/local/cuda-10.1/version.txt
cd /usr/local/cuda-10.1/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
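The two CUDA checks can also be run from a script. A minimal sketch, assuming nvcc is on PATH and deviceQuery has already been built with make in the samples directory above; it relies on nvcc -V containing a "release X.Y" line and on deviceQuery printing "Result = PASS" on success. The helper names are hypothetical.

```python
import os
import re
import subprocess


def nvcc_release(nvcc_output):
    """Return e.g. '10.1' from the 'Cuda compilation tools, release 10.1, ...' line."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None


def device_query_passed(output):
    """deviceQuery ends with 'Result = PASS' when it finds a working GPU."""
    return "Result = PASS" in output


def verify(sample_dir="/usr/local/cuda-10.1/samples/1_Utilities/deviceQuery"):
    """Run nvcc -V and deviceQuery; return (nvcc release, deviceQuery passed)."""
    nvcc_out = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                              universal_newlines=True).stdout
    dq_out = subprocess.run([os.path.join(sample_dir, "deviceQuery")],
                            stdout=subprocess.PIPE, universal_newlines=True).stdout
    return nvcc_release(nvcc_out), device_query_passed(dq_out)
```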
Verify cuDNN:
cat /usr/local/cuda-10.1/include/cudnn.h | grep CUDNN_MAJOR -A 2
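The grep above prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL #defines. A minimal sketch that reads the same three macros and joins them into one version string, assuming they are defined as plain integers as in cuDNN 7's cudnn.h (`cudnn_version` is a hypothetical helper):

```python
import re


def cudnn_version(header_text):
    """Build 'MAJOR.MINOR.PATCHLEVEL' from the #defines in cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if m is None:
            raise ValueError("%s not found" % name)
        parts.append(m.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    try:
        with open("/usr/local/cuda-10.1/include/cudnn.h") as f:
            print("cuDNN", cudnn_version(f.read()))
    except (OSError, ValueError) as e:
        print("could not read cuDNN version:", e)
```

With cudnn-10.1-linux-x64-v7.5.0.56 installed, this should print a 7.5.x version.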
2. Installing Anaconda
I used the newest Anaconda release at the time: Anaconda3-2019.07-Linux-x86_64.sh
sh Anaconda3-2019.07-Linux-x86_64.sh
Verify:
source activate   # activates the base environment
anaconda-navigator
3. Installing PyCharm
Just unpack the archive and you're done. Below is how to set up a desktop launcher icon.
cd /usr/share/applications
vim /usr/share/applications/pycharm.desktop
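Open pycharm.desktop and enter the content below. Line 6 (Exec=) is the command that starts PyCharm: replace the path to pycharm.sh with your own install path, since you choose the install location yourself. Line 7 (Icon=) is the icon image: likewise replace the path to pycharm.png with your own install path.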
[Desktop Entry]
Type=Application
Name=Pycharm
GenericName=Pycharm
Comment=Python_IDE
Exec=sh /root/pycharm2019/pycharm-2019.1/bin/pycharm.sh
Icon=/root/pycharm2019/pycharm-2019.1/bin/pycharm.png
Terminal=false
Categories=Development;IDE;
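Since only the Exec= and Icon= paths depend on where PyCharm was unpacked, the file can be generated from that directory. A minimal sketch, assuming the unpacked layout has bin/pycharm.sh and bin/pycharm.png as above, and using the Desktop Entry spec's boolean form `Terminal=false`; `desktop_entry` and `write_entry` are hypothetical helpers.

```python
import os

TEMPLATE = """[Desktop Entry]
Type=Application
Name=Pycharm
GenericName=Pycharm
Comment=Python_IDE
Exec=sh {home}/bin/pycharm.sh
Icon={home}/bin/pycharm.png
Terminal=false
Categories=Development;IDE;
"""


def desktop_entry(pycharm_home):
    """Fill in Exec=/Icon= from the directory PyCharm was unpacked into."""
    return TEMPLATE.format(home=os.path.abspath(pycharm_home))


def write_entry(pycharm_home, path="/usr/share/applications/pycharm.desktop"):
    """Write the launcher file (needs root for /usr/share/applications)."""
    with open(path, "w") as f:
        f.write(desktop_entry(pycharm_home))
```

For the paths in this post: `write_entry("/root/pycharm2019/pycharm-2019.1")`.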
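Verify: press the 'Win' key (the one to the left of Alt), find 'pycharm' and click it.
4. Setting up mmdetection
For now, I consider mmdetection the best open-source object detection platform: it offers the most models and is updated the fastest.
In theory, the newest PyTorch release right now (1.2) supports CUDA up to 10.0.130, not 10.1.
However, I tested the official test program and it does run, though I hit the following problems along the way: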
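1. libcudart.so.9.0: cannot open shared object file: No such file or directory
Cause: torchvision>=0.2.5
Fix: https://blog.csdn.net/w55100/article/details/91048193
2. libcudnn.so.7 is not a symbolic link
Fix: https://blog.csdn.net/weixin_32820767/article/details/81382877 (this ldconfig warning typically appears when the cuDNN libraries are copied with a plain cp, which turns their symlinks into regular files; copying them with 'cp -P' avoids it)
3. ImportError: cannot import name 'deform_conv_cuda' from 'mmdet.ops.dcn'
Fix: reinstall (and thereby recompile) the package from the mmdetection root directory with 'python3 setup.py install'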
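Most of these problems come from mismatched CUDA pieces, so it helps to print what PyTorch was actually built with. The official PyTorch binaries bundle their own CUDA runtime; the locally installed toolkit (nvcc) is used when compiling extensions such as mmdetection's CUDA ops. A minimal sketch, assuming the standard `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.is_available()` attributes; the helper names are hypothetical.

```python
def major_minor(version):
    """'10.1.243' -> (10, 1); None stays None (CPU-only builds report no CUDA)."""
    if version is None:
        return None
    return tuple(int(part) for part in version.split(".")[:2])


def torch_report():
    """Versions PyTorch was built with; empty dict if torch cannot be imported."""
    try:
        import torch
    except (ImportError, OSError):  # e.g. a missing libcudart.so.* raises here
        return {}
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # CUDA runtime bundled with the binary
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


def toolkit_matches(torch_cuda, nvcc_release):
    """True if PyTorch's CUDA version and the local nvcc agree on major.minor."""
    return major_minor(torch_cuda) == major_minor(nvcc_release)
```

With PyTorch 1.2 (built for CUDA 10.0) and nvcc 10.1, `toolkit_matches` returns False. That mismatch is what to keep in mind when mmdetection's ops fail to build or import.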
My test program (put the script and test.jpg in the mmdetection root directory):
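from mmdet.apis import init_detector, inference_detector, show_result
import mmcv

# Config and pretrained checkpoint (paths relative to the mmdetection root)
config_file = 'configs/mask_rcnn_r101_fpn_1x.py'
checkpoint_file = 'checkpoints/mask_rcnn_r101_fpn_1x_20181129-34ad1961.pth'

# Build the model from the config and load the checkpoint onto the first GPU
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on a single image
img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)

# Show the detections in a window, then save the visualization to result.jpg
show_result(img, result, model.CLASSES)
show_result(img, result, model.CLASSES, out_file='result.jpg')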