Reproducing PV-RCNN: spconv environment setup and problems encountered

  • 1. Local environment
  • 2. Setup process

1. Local environment

Ubuntu 20.04
CUDA 10.0 (later switched to CUDA 10.2)
Python 3.7.10
PyTorch 1.6.0
gcc 7.5.0

2. Setup process

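Following the reference blog post is mostly sufficient.
Two points to note:
1) The reference post uses PyTorch 1.3.0, but when I then built spconv with

python setup.py bdist_wheel

the build failed with the following errors: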

[  8%] Built target spconv_nms
[ 12%] Building CXX object src/utils/CMakeFiles/spconv_utils.dir/all.cc.o
Consolidate compiler generated dependencies of target cuhash
make[2]: *** No rule to make target '/usr/local/cuda/lib64/libcudart.so', needed by 'src/cuhash/CMakeFiles/cuhash.dir/cmake_device_link.o'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:154:src/cuhash/CMakeFiles/cuhash.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
In file included from /home/cora/PointCloud/spconv/src/utils/all.cc:15:0:
/home/cora/PointCloud/spconv/include/spconv/box_iou.h:21:10: fatal error: boost/geometry.hpp: No such file or directory
 #include <boost/geometry.hpp>
          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [src/utils/CMakeFiles/spconv_utils.dir/build.make:76:src/utils/CMakeFiles/spconv_utils.dir/all.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:232:src/utils/CMakeFiles/spconv_utils.dir/all] Error 2
make: *** [Makefile:136:all] Error 2
Traceback (most recent call last):
  File "setup.py", line 108, in <module>
    zip_safe=False,
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/site-packages/setuptools/__init__.py", line 163, in setup
    return distutils.core.setup(**attrs)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 299, in run
    self.run_command('build')
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 48, in run
    self.build_extension(ext)
  File "setup.py", line 92, in build_extension
    subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)
  File "/home/cora/anaconda3/envs/pvrcnn/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--', '-j4']' returned non-zero exit status 2.

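The first problem is
make[2]: *** No rule to make target '/usr/local/cuda/lib64/libcudart.so', needed by 'src/cuhash/CMakeFiles/cuhash.dir/cmake_device_link.o'.  Stop.
Searching online, some people say that switching PyTorch to version 1.6.0 fixes this. So I first removed the existing packages: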

pip uninstall torch
pip uninstall torchvision

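However, the official site offers no PyTorch 1.6.0 build for CUDA 10.0. I later found that a build for a CUDA version higher than 10.0 also installs without complaint, so I chose the CUDA 10.1 build: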

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch

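2) Installing the libboost development files
The second problem is
/home/cora/PointCloud/spconv/include/spconv/box_iou.h:21:10: fatal error: boost/geometry.hpp: No such file or directory

On inspection, the libboost development files were either missing or had failed to install, so reinstalling them fixes it: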

sudo apt-get install libboost-filesystem-dev
sudo apt-get install libboost-dev
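
Both failures in the log above come down to a file the build expects but cannot find. As a minimal sketch, a check like the following can be run before rebuilding. The helper name missing_files is my own, the first path is the one named in the log, and /usr/include/boost is assumed to be where apt places the Boost headers on your system:

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # First path is taken from the error log; the second assumes the
    # default apt location for Boost headers on Ubuntu.
    expected = [
        "/usr/local/cuda/lib64/libcudart.so",
        "/usr/include/boost/geometry.hpp",
    ]
    for path in missing_files(expected):
        print("missing:", path)
```

If the script prints nothing, both files are present at those locations; it does not by itself prove the build will find them through its own search paths.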

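After that, you can happily train PV-RCNN!

……………… divider …………………………

After installing everything as described above, it all looked normal, so I happily went to run the demo and hit the following error: Cuda kernel failed:invalid device function, followed by a segmentation fault.
Despair...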

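I had hit the same problem earlier when running PointRCNN. Many blog posts say the compute capability is set incorrectly, but I could not find where to set it at all... In the end (I forget which post I was following) I switched the virtual environment from Python 3.6 to 3.7 and set everything up again, and it worked. Overall it comes down to some versions not matching, and my guess is that it is related to PyTorch.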

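Then I remembered that the PyTorch I had installed was the CUDA 10.1 build, while my local CUDA was 10.0. torch itself tested fine after being installed this way, but perhaps something did not match somewhere (sob).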
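
One way to make this kind of mismatch visible is to compare the CUDA version PyTorch was built against with the one the local toolkit reports. The sketch below only does the string handling. The function names are mine, and it assumes you supply the text printed by nvcc --version and the value of torch.version.cuda yourself:

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def same_cuda(torch_cuda, nvcc_output):
    """True when PyTorch's build CUDA matches the local toolkit (major.minor)."""
    torch_major_minor = ".".join(torch_cuda.split(".")[:2])
    return torch_major_minor == nvcc_release(nvcc_output)
```

For the setup described here (a 10.1 build of PyTorch on a 10.0 toolkit), same_cuda returns False. A match does not guarantee the compiled kernels will run, but a mismatch is a cheap first thing to rule out.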

So I installed CUDA 10.2 as well; for this you can refer to "How to install multiple CUDA versions on Ubuntu".

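Next I deleted the old virtual environment. To save myself trouble I also deleted the project files I had downloaded with git and started from scratch, but after that git clone kept failing (more tears)... So it should also be fine to keep the project files, as long as you remember to delete the build folder and the .whl file generated when installing spconv, and then rebuild when you reinstall.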
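
The cleanup step can be sketched as a small helper. It assumes the default layout produced by python setup.py bdist_wheel, that is a build folder and wheels under dist inside the spconv checkout; the function name is hypothetical and not part of spconv:

```python
import shutil
from pathlib import Path


def clean_spconv_build(repo_dir):
    """Delete build/ and any *.whl under dist/ in repo_dir; return what was removed."""
    repo = Path(repo_dir)
    removed = []
    build_dir = repo / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        removed.append(build_dir)
    # Assumes wheels were written to the default dist/ folder.
    for wheel in sorted(repo.glob("dist/*.whl")):
        wheel.unlink()
        removed.append(wheel)
    return removed
```

Calling clean_spconv_build with the path of the spconv checkout before re-running the wheel build keeps stale objects compiled against the old PyTorch and CUDA from being reused.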
This time I installed PyTorch 1.6.0 for CUDA 10.2, with the following command:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Then I went through the whole earlier procedure again.

Quick demo:

python demo.py --cfg_file cfgs/kitti_models/pv_rcnn.yaml --ckpt ../pv_rcnn_8369.pth --data_path ../data/kitti/training/velodyne/000002.bin

Problem encountered:
ImportError: Could not import backend for traitsui. Make sure you
have a suitable UI toolkit like PyQt/PySide or wxPython
installed.

My environment is Python 3.7, so I installed PySide2 with the following command:

pip install pyside2
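
To see which UI toolkits are actually importable in the active environment, a quick check along these lines may help. The candidate module names are my assumption about what the error message refers to, and the function name is made up for illustration:

```python
import importlib.util


def available_toolkits(candidates=("PySide2", "PyQt5", "wx")):
    """Return the module names from `candidates` that can be imported here."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    print(available_toolkits())
```

An empty list before the install and a list containing PySide2 afterwards would suggest the package landed in the same environment that runs demo.py.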

Another problem:
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
I do not know why this happens, but the fix is:

sudo apt-get install libxcb-xinerama0
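
If the xcb error persists, or you want to see why the plugin failed to load, Qt prints plugin-loading diagnostics when the QT_DEBUG_PLUGINS environment variable is set to 1. A minimal wrapper, with names of my own choosing, might look like this:

```python
import os
import subprocess
import sys


def qt_debug_env(base_env=None):
    """Copy an environment and enable Qt's plugin-loading diagnostics."""
    env = dict(os.environ if base_env is None else base_env)
    env["QT_DEBUG_PLUGINS"] = "1"
    return env


if __name__ == "__main__":
    # Re-run the command given after this script's name (for example the
    # demo command above) with the extra diagnostics switched on.
    if len(sys.argv) > 1:
        result = subprocess.run(sys.argv[1:], env=qt_debug_env())
        sys.exit(result.returncode)
```

The extra output may name the shared library the plugin could not load, which would explain why installing a system package is what fixes it.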

This time PV-RCNN really does run happily!!!
