mmcv NCCL errors: mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol, RuntimeError: NCCL error i

Errors:

mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:38, unhandled cuda error, NCCL version
RuntimeError: NCCL error in: …/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:45

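Cause: torch and torchvision need to be updated. mmcv depends strongly on the versions of both, so it has to be updated as well, and the three versions must match each other.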
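
To see which versions are actually installed before changing anything, a small check along these lines may help. It is only a sketch: `installed_versions` is a hypothetical helper name, not part of torch or mmcv, and it reads package metadata through the standard library (Python 3.8+; on the Python 3.7 environment shown in the error above it assumes the `importlib_metadata` backport is installed).

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["torch", "torchvision", "mmcv-full"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

If the torch version printed here does not correspond to the torch release the installed mmcv-full wheel was built for, that mismatch is a likely source of the undefined symbol error.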

Fix:

# Download the wheels from the official PyTorch site first
wget https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
wget https://download.pytorch.org/whl/cu111/torchvision-0.9.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
pip install torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl
pip install torchvision-0.9.1+cu111-cp37-cp37m-linux_x86_64.whl
pip uninstall -y mmcv-full
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
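
The index URL in the last command encodes both the CUDA build (cu111) and the torch release (torch1.8.0, which is the index used here for torch 1.8.1). A rough sketch of deriving that URL from a torch version string follows. `mmcv_index_url` is a hypothetical helper, and the assumption that the index is keyed on the x.y.0 release is inferred from the command above, not taken from any official API.

```python
import re

# URL pattern taken from the pip command above
MMCV_INDEX = "https://download.openmmlab.com/mmcv/dist/{cuda}/torch{torch}/index.html"


def mmcv_index_url(torch_version):
    """Build the mmcv-full wheel index URL for a string like '1.8.1+cu111'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.\d+\+(cu\d+|cpu)", torch_version)
    if match is None:
        raise ValueError(f"expected something like '1.8.1+cu111', got {torch_version!r}")
    major, minor, cuda = match.groups()
    # Assumption: the patch number is replaced by 0, as in torch1.8.0 for torch 1.8.1
    return MMCV_INDEX.format(cuda=cuda, torch=f"{major}.{minor}.0")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
        try:
            print("mmcv-full index:", mmcv_index_url(torch.__version__))
        except ValueError as err:
            # Raised when the version string carries no +cuXXX / +cpu suffix
            print(err)
```

The resulting URL is what would be passed to `pip install mmcv-full -f`. Whether an index actually exists for a given torch and CUDA combination still has to be checked against the mmcv download site.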

References:
https://github.com/ultralytics/yolov5/issues/4530
https://github.com/open-mmlab/mmdetection/issues/4291
