TransFusion Environment Setup and Fixes for the Errors Encountered


  • TransFusion Environment Setup
  • Errors
    • Error 1
    • Error 2
    • Error 3
    • Error 4
    • Error 5
    • Error 6
    • Error 7
    • Error 8

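TransFusion Environment Setup

The base environment is:

  • NVIDIA GeForce RTX 3090
  • Linux (Ubuntu 20.04)
  • NVIDIA driver version: 11.0 (most likely the CUDA version that nvidia-smi reports for the driver)
  • CUDA version: 10.2

Setting up and installing TransFusion: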

# Create the conda environment
conda create -n transfusion python=3.7 -y
conda activate transfusion
# Install PyTorch
pip install torch==1.10.0+cu102 torchvision==0.11.1+cu102 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install mmcv
pip install mmcv-full==1.3.11 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10/index.html
# Install mmdetection
pip install mmdet==2.11.0
# Clone and install TransFusion (which is built on mmdetection3d)
git clone https://github.com/XuyangBai/TransFusion.git
cd TransFusion
pip install -v -e .
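
To confirm that the pinned versions above actually ended up in the environment, the output of pip freeze can be compared against them. The following is a minimal sketch: the helper names parse_freeze and find_mismatches are made up for illustration, and the sample freeze text is invented rather than taken from a real run.

```python
# Pins copied from the install commands above.
PINS = {
    "torch": "1.10.0+cu102",
    "torchvision": "0.11.1+cu102",
    "torchaudio": "0.10.0",
    "mmcv-full": "1.3.11",
    "mmdet": "2.11.0",
}


def parse_freeze(text):
    """Parse `pip freeze` output into {package_name: version}.

    Only plain `name==version` lines are handled; comments, editable
    installs and lines without `==` are skipped.
    """
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e ")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        installed[name.strip().lower()] = version.strip()
    return installed


def find_mismatches(installed, pins=PINS):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = installed.get(name.lower())
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Invented sample output, for illustration only.
SAMPLE = """\
torch==1.10.0+cu102
torchvision==0.11.1+cu102
torchaudio==0.10.0
mmcv-full==1.3.11
mmdet==2.12.0
"""

print(find_mismatches(parse_freeze(SAMPLE)))  # {'mmdet': ('2.11.0', '2.12.0')}
```

In practice the text would come from running pip freeze inside the transfusion environment instead of the SAMPLE string.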

Errors

Error 1

error: subprocess-exited-with-error
× Running setup.py install for mmpycocotools did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
running install
/opt/conda/envs/pycoco/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_py
creating build/lib.linux-x86_64-cpython-37
creating build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/__init__.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-37/pycocotools
running build_ext
building 'pycocotools._mask' extension
creating build/temp.linux-x86_64-cpython-37
creating build/temp.linux-x86_64-cpython-37/common
creating build/temp.linux-x86_64-cpython-37/pycocotools
gcc -pthread -B /opt/conda/envs/pycoco/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/envs/pycoco/lib/python3.7/site-packages/numpy/core/include -Icommon -I/opt/conda/envs/pycoco/include/python3.7m -c …/common/maskApi.c -o build/temp.linux-x86_64-cpython-37/…/common/maskApi.o
gcc: error: …/common/maskApi.c: No such file or directory
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> mmpycocotools

Fix: install Cython, then rerun the installation:

pip install Cython==0.29.36

Error 2

Running pip install -v -e . produced the following error:

/XXX/XXX/TransFusion/mmdet3d/ops/voxel/src/scatter_points_cuda.cu(272): error: no instance of overloaded function "at::Tensor::index_put_" matches the argument list
argument types are: (at::Tensor, at::Tensor)
object type is: at::Tensor
1 error detected in the compilation of "/XXX/XXX/TransFusion/mmdet3d/ops/voxel/src/scatter_points_cuda.cu".
ninja: build stopped: subcommand failed.

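Open scatter_points_cuda.cu:

vim mmdet3d/ops/voxel/src/scatter_points_cuda.cu

Change line 272 to coors_map.index_put_({coors_id_argsort}, coors_map_sorted); and rerun the installation. The braces pass the index tensor as a one-element index list, which is the form that the index_put_ overloads in this PyTorch version accept.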
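
If the edit has to be repeated on several machines, it can be scripted. The sketch below assumes that the original line 272 reads coors_map.index_put_(coors_id_argsort, coors_map_sorted); (the post only shows the corrected form), and the helper name wrap_index_argument is made up for illustration.

```python
import re

# Matches an index_put_ call whose first argument is a bare identifier,
# i.e. one that is not already wrapped in braces.
_CALL = re.compile(r"\.index_put_\(\s*([A-Za-z_]\w*)\s*,")


def wrap_index_argument(source):
    """Wrap the first argument of matching index_put_ calls in braces."""
    return _CALL.sub(lambda m: ".index_put_({%s}," % m.group(1), source)


# Assumed original form of line 272 (not shown in the post).
before = "coors_map.index_put_(coors_id_argsort, coors_map_sorted);"
after = wrap_index_argument(before)
print(after)  # coors_map.index_put_({coors_id_argsort}, coors_map_sorted);
```

The pattern rewrites every matching call in the text it is given, so it is safer to apply it to line 272 only instead of the whole file. Running it on an already corrected line leaves that line unchanged.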

Error 3

While preparing the nuScenes data, I ran into the following error:

AttributeError: module 'pycocotools' has no attribute '__version__'

First, uninstall pycocotools:

pip uninstall pycocotools

Then install mmpycocotools:

pip install mmpycocotools

After that, the following error appeared:

ModuleNotFoundError: No module named 'pycocotools'

Reinstall mmpycocotools:

pip uninstall mmpycocotools
pip install mmpycocotools
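
The two messages in this section correspond to two different states of the pycocotools module: importable but without a __version__ attribute, or not importable at all. A small diagnostic sketch can tell them apart before and after reinstalling. The function name module_version_status is made up for illustration.

```python
import importlib


def module_version_status(name):
    """Return 'missing', 'no __version__', or the module's version string."""
    try:
        module = importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return "missing"
    version = getattr(module, "__version__", None)
    return "no __version__" if version is None else str(version)


# The result depends on what is installed in the active environment.
print("pycocotools:", module_version_status("pycocotools"))
```

A result of "no __version__" matches the AttributeError above, and "missing" matches the ModuleNotFoundError.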

Error 4

AttributeError: module 'distutils' has no attribute 'version'

Fix: install setuptools 59.5.0:

pip install setuptools==59.5.0

Error 5

This error occurred during model training:

RuntimeError: /XXX/XXX/TransFusion/mmdet3d/ops/spconv/src/indice_cuda.cu 118
cuda execution failed with error 700
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at …/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2292e25d62 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1c4d3 (0x7f22930884d3 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f2293088ee2 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f2292e0f314 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x299ee9 (0x7f2181b7aee9 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0xae8069 (0x7f21823c9069 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f21823c9389 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: python() [0x497017]
frame #8: python() [0x4a0a87]
frame #9: python() [0x4b5cfb]
frame #10: python() [0x4b5cfb]
frame #11: python() [0x4b0858]
frame #12: python() [0x4c5b50]
frame #13: python() [0x4c5b66]
frame #14: python() [0x4c5b66]
frame #15: python() [0x4c5b66]
frame #16: python() [0x4c5b66]
frame #17: python() [0x4c5b66]
frame #18: python() [0x4c5b66]
frame #19: python() [0x4946f7]

frame #23: python() [0x53fc79]
frame #25: __libc_start_main + 0xe7 (0x7f22a7bc9c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #26: python() [0x53f9ee]

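Fix:
For single-GPU training, train with --gpu-id 0; using GPU 1, 2, or 3 triggers this error.
For multi-GPU training, the GPU IDs must start from 0.

Note:
Some users on GitHub resolved this error by changing every 4096 in mmdet3d/ops/spconv/src/indice_cuda.cu to 256. I have not tried this myself, but it is worth trying if the fix above does not help.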
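
One way to meet the requirement that GPU IDs start from 0 while still training on a different physical card is the CUDA_VISIBLE_DEVICES environment variable. This is not part of the fix described above and I have not verified it against TransFusion, so treat it as an assumption worth testing: the variable restricts a process to the listed cards, and CUDA then renumbers them from 0. A minimal sketch, with a made-up helper name:

```python
import os


def env_with_visible_gpus(physical_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs.

    A process started with this environment sees the listed physical
    cards renumbered from 0, in the order given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    return env


# Expose only physical GPU 1; the training process then sees it as GPU 0.
env = env_with_visible_gpus([1], base_env={})
print(env["CUDA_VISIBLE_DEVICES"])  # 1
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching the training script with GPU ID 0.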

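Error 6

This error occurred during model training:

RuntimeError: CUDA error: out of memory

Fix:
(1) Reduce the batch size.
(2) For single-GPU training, check whether the memory of GPU 0 is already full, because training on any other card still uses part of GPU 0's memory.
(3) Switch to a GPU with more memory (at least 16 GB is recommended).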
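
Step (2) can be checked from a script by parsing the CSV output of nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits. The sketch below only does the parsing. The function name is made up, and the sample text is invented for illustration, not measured on a real machine.

```python
def parse_gpu_memory(csv_text):
    """Parse `index, memory.used, memory.total` rows (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "free_mib": total - used})
    return rows


# Invented sample: GPU 0 is nearly full, GPU 1 is almost idle.
SAMPLE = """\
0, 23010, 24268
1, 12, 24268
"""

for row in parse_gpu_memory(SAMPLE):
    print(row)
```

With numbers like these, a job started on GPU 1 could still run out of memory if it also needs a share of GPU 0.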

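Error 7

This error occurred during model training:

RuntimeError: shape '[-1, 4, 16]' is invalid for input of size 2160

Check the dimensions (N, C) of the point cloud and make sure that C matches what the model configuration expects. Printing the shape of the relevant variables with print() helps narrow down the problem.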
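
The message itself already contains the arithmetic: a view of shape [-1, 4, 16] needs the element count to be a multiple of 4 × 16 = 64, and 2160 is not. A small sketch of that check in plain Python (the function name is made up, and PyTorch is not needed):

```python
def reshape_check(total_elements, shape):
    """Check whether `total_elements` fits `shape`; one entry may be -1.

    Returns (fits, remainder), where remainder is the element count
    modulo the product of the known dimensions.
    """
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    remainder = total_elements % known
    if -1 in shape:
        fits = remainder == 0            # the -1 dimension absorbs the rest
    else:
        fits = total_elements == known   # every dimension is fixed
    return fits, remainder


print(reshape_check(2160, [-1, 4, 16]))  # (False, 48)
print(reshape_check(2176, [-1, 4, 16]))  # (True, 0)
```

A nonzero remainder like the 48 here suggests that the tensor was built with a different per-point feature count than the model expects.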

Error 8

File "/XXX/XXX/mmdet3d/ops/spconv/ops.py", line 92, in get_indice_pairs
return get_indice_pairs_func(
RuntimeError: mmdet3d/ops/spconv/src/indice_cuda.cu 124
cuda execution failed with error 2

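I debugged this repeatedly without finding the cause. I later found the explanation online: the error is caused by insufficient GPU memory (CUDA error code 2 is cudaErrorMemoryAllocation). Switching to a GPU with more memory resolved it.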
