/usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

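Background: I copied a Docker image from machine A to machine B. The image ran fine on machine A, but on machine B the following error appeared when I executed run.sh inside the container: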
#run.sh
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
Found Detectron ops lib: /usr/local/lib/libcaffe2_detectron_ops_gpu.so
Traceback (most recent call last):
  File "tools/train_net.py", line 33, in <module>
    import test_net
  File "/packages/detectron/tools/test_net.py", line 43, in <module>
    utils.c2.import_detectron_ops()
  File "/packages/detectron/lib/utils/c2.py", line 42, in import_detectron_ops
    dyndep.InitOpsLibrary(detectron_ops_lib)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/dyndep.py", line 35, in InitOpsLibrary
    _init_impl(name)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/dyndep.py", line 48, in _init_impl
    ctypes.CDLL(path)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

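1 After some searching I found a hint: check the file sizes to confirm that the symlink points to a valid file. Listing the files inside the container: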

# ls -lh /usr/lib/x86_64-linux-gnu/libcuda.so* 
lrwxrwxrwx 1 root root 17 Jun 19 09:48 /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.387.26
lrwxrwxrwx 1 root root 17 Jun 19 09:48 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.387.26
-rwxr-xr-x 1 root root  0 Jun 19 09:48 /usr/lib/x86_64-linux-gnu/libcuda.so.387.26
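The same check can be scripted. The sketch below is only an illustration: `check_lib` is a hypothetical helper name, and it assumes a `readlink` that supports `-f` (GNU coreutils). It resolves the symlink chain and reports whether the final target exists and is larger than 0 bytes.

```shell
# check_lib PATH: hypothetical helper, not part of any NVIDIA or Docker tooling.
# Succeeds only if PATH resolves to an existing, non-empty file.
check_lib() {
    # readlink -f follows every symlink in the chain (assumes GNU coreutils)
    target=$(readlink -f "$1") || return 1
    # -s is true only when the file exists and its size is greater than zero
    if [ -s "$target" ]; then
        echo "ok: $1 -> $target"
    else
        echo "empty or missing: $1 -> $target" >&2
        return 1
    fi
}

# Example call inside the container:
#   check_lib /usr/lib/x86_64-linux-gnu/libcuda.so.1
```

A non-zero exit status here corresponds to the "file too short" situation: the link resolves, but the file it resolves to has no content.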

2 It turned out that libcuda.so.387.26 had a size of 0! So I copied this file from machine B (the host) into the corresponding directory in the container:

docker cp /usr/lib/x86_64-linux-gnu/libcuda.so.387.26 0000000000:/usr/lib/x86_64-linux-gnu/libcuda.so.387.26

3 Check the files again to confirm they are now correct:

ls -lh /usr/lib/x86_64-linux-gnu/libcuda.so*
lrwxrwxrwx 1 root root  17 Jun 19 09:48 /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.387.26
lrwxrwxrwx 1 root root  17 Jun 19 09:48 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.387.26
-rwxr-xr-x 1 root root 11M Jul 23 09:13 /usr/lib/x86_64-linux-gnu/libcuda.so.387.26
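Instead of checking one library at a time, the size check can be swept across the whole directory. This is a minimal sketch: `list_empty_libs` is a hypothetical name, the default directory is taken from the listings above, and `-maxdepth` assumes GNU or BSD `find`.

```shell
# list_empty_libs [DIR]: hypothetical helper that prints every regular file
# in DIR whose name contains ".so" and whose size is 0 bytes.
list_empty_libs() {
    dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # -type f skips the symlinks themselves; -size 0 matches only empty files
    find "$dir" -maxdepth 1 -type f -name '*.so*' -size 0
}

# Example call inside the container:
#   list_empty_libs /usr/lib/x86_64-linux-gnu
```

Any path it prints is a candidate for the same "file too short" failure when something tries to load it.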

4 Running run.sh again showed that several other files were also empty, so I simply copied every empty file from the host into the container.

ls -lh /usr/lib/x86_64-linux-gnu/*387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libcuda.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.387.26
-rwxr-xr-x 1 root root 0 Jun 19 09:52 /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.387.26
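Copying the files one by one gets tedious, so the step can be looped. The sketch below is a hedged illustration, not a definitive fix: `fix_empty_libs` is a hypothetical helper, the container ID is whatever you pass in, and it assumes the host has the same driver version installed at the same paths (as was the case here with 387.26). It uses only `docker exec` and `docker cp`.

```shell
# fix_empty_libs CONTAINER [DIR]: hypothetical helper.
# For each zero-byte file in DIR inside the container, copy the host's file
# from the same path, but only if the host copy exists and is non-empty.
fix_empty_libs() {
    container="$1"
    dir="${2:-/usr/lib/x86_64-linux-gnu}"
    # List zero-byte regular files inside the container
    docker exec "$container" find "$dir" -maxdepth 1 -type f -size 0 |
    while IFS= read -r path; do
        if [ -s "$path" ]; then
            # Redirect stdin so the copy cannot consume the list being read
            docker cp "$path" "$container:$path" < /dev/null
        else
            echo "skip: host has no usable $path" >&2
        fi
    done
}

# Example call on the host (replace the ID with your own container):
#   fix_empty_libs 0000000000
```

Files that the host does not have are skipped with a message rather than overwritten, so the loop can be rerun safely after installing a missing driver component.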
