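Today I wanted to start working through Mu Li's course, so I set up the deep learning framework it needs along the way; the detailed procedure is on the official site. The only thing worth noting is that I installed the GPU build and my CUDA version is 10.0, so I simply ran: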
pip install mxnet-cu100 -i https://pypi.douban.com/simple
After the installation finished, importing the package failed with "OSError: libnccl.so.2: cannot open shared object file: No such file or directory".
wnj@wnj:~/Projects/d2l-zh$ conda activate gluon
(gluon) wnj@wnj:~/Projects/d2l-zh$ python
Python 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/context.py", line 23, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 351, in <module>
    _LIB = _load_lib()
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 342, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
>>> import mxnet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/context.py", line 23, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 351, in <module>
    _LIB = _load_lib()
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/site-packages/mxnet/base.py", line 342, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/home/wnj/anaconda3/envs/gluon/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnccl.so.2: cannot open shared object file: No such file or directory
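libnccl is NVIDIA's library (NCCL) for multi-GPU parallel training, and evidently mxnet needs it. The catch is that the official site now only goes back as far as CUDA 10.2, so there is no longer a matching libnccl build to download for my CUDA 10.0. I grudgingly tried the official tutorial's approach anyway and installed the CUDA 10.2 build; neither the local installer nor the network installer did the trick.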
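To confirm that the missing library is really the problem before hunting for packages, a quick check along these lines can help. This is only a minimal sketch: the helper name `can_load_shared_library` is my own, and it simply asks the dynamic loader to open a library by name through `ctypes.CDLL`, the same call that appears in the traceback above.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Same exception type that the mxnet import surfaced in the traceback.
        return False
    return True


if __name__ == "__main__":
    # "libnccl.so.2" is the file name taken from the error message.
    print("libnccl.so.2 loadable:", can_load_shared_library("libnccl.so.2"))
```

If this prints False inside the same conda environment, the import error comes from the system not finding NCCL rather than from mxnet itself.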
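Following another user's setup steps, I tracked down an nccl2.4.8-cuda10.0 package. I have uploaded it to a network drive (link at the end of this post) for anyone who needs it.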
After downloading and extracting the .txz archive, cd nccl_2.4.8-1+cuda10.0_x86_64/. You can optionally delete the lib/pkgconfig folder, then run:
sudo cp include/* /usr/local/cuda-10.0/include
sudo cp lib/* /usr/local/cuda-10.0/lib64
With that the problem is solved, and the import now works without errors.
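If the import still fails after the copy, it may be worth checking that the NCCL files actually landed in the CUDA library directory. The sketch below is a hypothetical helper of my own (`list_nccl_files`), not part of any tool mentioned here; it just lists the file names starting with `libnccl` in a given directory.

```python
from pathlib import Path


def list_nccl_files(lib_dir: str) -> list:
    """Return sorted file names in `lib_dir` that start with 'libnccl'."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        # A missing directory simply yields an empty result.
        return []
    return sorted(p.name for p in directory.glob("libnccl*"))


if __name__ == "__main__":
    # Target directory taken from the cp command above.
    for name in list_nccl_files("/usr/local/cuda-10.0/lib64"):
        print(name)
```

I am assuming here that /usr/local/cuda-10.0/lib64 is already on the loader's search path (for example through LD_LIBRARY_PATH), which is presumably why copying the files there was enough on my machine. If libnccl.so.2 shows up in the list but the import still fails, that search path is the next thing to look at.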
If you need nccl_2.4.8-1+cuda10.0_x86_64.txz, grab it from the network drive:
Link: https://pan.baidu.com/s/19AGikazGi2Lz6yfG0tSIfw
Extraction code: 3stf