Installing the GPU build of mxnet and fixing "OSError: libnccl.so.2" on import

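  • Environment setup
  • Configuration steps

Environment setup

Today I wanted to start working through Mu Li's course, so I set up the deep learning framework it requires; the official site has the detailed steps. The only thing worth noting is that I installed the GPU build and my CUDA version is 10.0, so I simply ran: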

pip install mxnet-cu100 -i https://pypi.douban.com/simple
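
The suffix in mxnet-cu100 follows the CUDA version with the dot dropped. A minimal sketch of that mapping, assuming the usual mxnet-cuXYZ naming; the helper name mxnet_gpu_package is my own, and not every CUDA version necessarily has a published wheel, so check the index before installing:

```python
def mxnet_gpu_package(cuda_version: str) -> str:
    """Build the pip package name from a CUDA version string.

    Assumes the mxnet-cu<major><minor> naming convention; this only
    builds the name and does not check that the package exists.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"mxnet-cu{int(major)}{int(minor)}"


print(mxnet_gpu_package("10.0"))  # mxnet-cu100
```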

After the install finished, importing failed with "OSError: libnccl.so.2: cannot open shared object file: No such file or directory".

wnj@wnj:~/Projects/d2l-zh$ conda activate gluon
(gluon) wnj@wnj:~/Projects/d2l-zh$ python
Python 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/context.py", line 23, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 351, in <module>
    _LIB = _load_lib()
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 342, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
>>> import mxnet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/context.py", line 23, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 351, in <module>
    _LIB = _load_lib()
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 342, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
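
The traceback shows the failure comes from ctypes.CDLL, so the same check can be run without mxnet at all. A rough sketch, assuming a Linux machine; the helper name can_load is my own, and libcudart.so.10.0 is only there as an assumed comparison point for a CUDA 10.0 install:

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Same error class that mxnet's _load_lib() ran into above.
        return False
    return True


# libnccl.so.2 is the library named in the error message.
for name in ("libnccl.so.2", "libcudart.so.10.0"):
    print(name, "->", "found" if can_load(name) else "NOT found")
```

If libnccl.so.2 is reported as not found here, the problem is in the loader's search path or a missing file rather than in mxnet itself.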

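libnccl (NCCL) is NVIDIA's library for parallel training across multiple GPUs, and mxnet evidently depends on it. However, I am on CUDA 10.0, and the oldest CUDA version the official download page still supports is 10.2, so there was no matching libnccl left to download. I tried the official guide's approach anyway and installed the CUDA 10.2 build; neither the local-installer route nor the network-installer route worked.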

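Following another user's setup steps, I tracked down an nccl 2.4.8 build for CUDA 10.0. I have put it on a netdisk (link at the end of this post) for anyone who needs it.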

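Configuration steps

After downloading and unpacking the archive, cd nccl_2.4.8-1+cuda10.0_x86_64/
You can delete the lib/pkgconfig folder first (plain cp without -r skips directories and complains about them), then run: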

sudo cp include/* /usr/local/cuda-10.0/include
sudo cp lib/* /usr/local/cuda-10.0/lib64
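
To double-check that the copy landed where expected, something like the following can help. This is only a sketch: the helper name find_nccl is my own, and the LD_LIBRARY_PATH test is just one way the loader may be configured (an entry under /etc/ld.so.conf.d works too and would not show up here).

```python
import glob
import os


def find_nccl(lib_dir: str) -> list:
    """List libnccl shared-object files (or symlinks) present in lib_dir."""
    pattern = os.path.join(lib_dir, "libnccl.so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


lib_dir = "/usr/local/cuda-10.0/lib64"  # copy target used in the commands above
found = find_nccl(lib_dir)
print(found if found else f"no libnccl files in {lib_dir}")

# Assumption: the loader finds lib_dir through LD_LIBRARY_PATH.
search_path = os.environ.get("LD_LIBRARY_PATH", "").split(":")
print("on LD_LIBRARY_PATH:", lib_dir in search_path)
```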

That solved the problem, and import now works cleanly.
If you need nccl_2.4.8-1+cuda10.0_x86_64.txz, grab it from the netdisk:
Link: https://pan.baidu.com/s/19AGikazGi2Lz6yfG0tSIfw
Extraction code: 3stf