Problems compiling DCNv2 when deploying yolact in Docker

When deploying yolact in Docker, DCNv2 has to be compiled separately:

cd external/DCNv2
python setup.py build develop

However, the DCNv2 build depends on a CUDA-capable GPU, and it kept failing.
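Judging from the tracebacks in the failures below (get_extensions() in setup.py raises NotImplementedError), DCNv2's setup.py appears to require both a visible CUDA device and a located CUDA toolkit. The sketch below reproduces that check so it can be run inside an image before attempting the build. The exact condition (torch.cuda.is_available() and CUDA_HOME is not None) is an assumption based on the common DCNv2 setup.py, so compare it with the copy in external/DCNv2; the script name check_dcn_env.py is hypothetical.

```python
"""Hypothetical pre-build check (check_dcn_env.py) mirroring the condition
that DCNv2's setup.py is assumed to test before building its CUDA extension."""
from typing import Optional


def would_build(cuda_available: bool, cuda_home: Optional[str]) -> bool:
    # Assumed to match setup.py: both a visible GPU and a CUDA toolkit are
    # required, otherwise get_extensions() raises NotImplementedError.
    return cuda_available and cuda_home is not None


def main() -> None:
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        print("PyTorch is not installed; DCNv2 cannot be built here.")
        return
    available = torch.cuda.is_available()
    print(f"torch.cuda.is_available(): {available}")
    print(f"CUDA_HOME: {CUDA_HOME}")
    if would_build(available, CUDA_HOME):
        print("OK: setup.py should build the CUDA extension.")
    else:
        print("FAIL: setup.py would raise NotImplementedError.")


if __name__ == "__main__":
    main()
```

Running it both as a RUN step in the Dockerfile and inside a container started with docker run --gpus all should show which of the two conditions is missing in each context.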


Failure 1: using the python:3.6 image

FROM python:3.6
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

The build failed, and running it again inside the container (entered via docker run) failed the same way:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

Cause: the python:3.6 image contains no CUDA installation at all.


Failure 2: switching to the pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime image

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

Whether compiling through the Dockerfile or inside the container via docker run, the error was the same:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

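Cause: torch.cuda.is_available() returns True, but CUDA_HOME (from torch.utils.cpp_extension import CUDA_HOME) is None, and indeed there is no CUDA directory under /usr/local. The runtime image ships only the CUDA runtime libraries; the CUDA toolkit (nvcc and headers) needed to compile extensions is only included in the devel variant.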
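CUDA_HOME comes from PyTorch's own lookup, which, as far as I know, checks the CUDA_HOME/CUDA_PATH environment variables, then the location of nvcc on PATH, and finally falls back to /usr/local/cuda if that directory exists. The sketch below approximates that order (the real logic may differ between PyTorch versions) to show why an image with neither nvcc nor /usr/local/cuda ends up with None. The lookups are passed in as parameters so the logic can be exercised without CUDA.

```python
"""Approximation of how torch.utils.cpp_extension locates CUDA_HOME."""
import os
import shutil
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    # 1. An explicit environment variable wins.
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise derive it from nvcc: <home>/bin/nvcc -> <home>.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional install location, if it exists.
    if exists("/usr/local/cuda"):
        return "/usr/local/cuda"
    # Runtime-only images end up here: no nvcc and no /usr/local/cuda.
    return None


if __name__ == "__main__":
    print(f"CUDA_HOME would resolve to: {find_cuda_home()}")
```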


Failure 3: switching to the pytorch/pytorch:1.2-cuda10.0-cudnn7-devel image

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-devel
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

Along the way, apt-get update also failed once: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages.gz  Hash Sum mismatch

Fix:

...
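# Switch the Ubuntu archive to the Tsinghua (TUNA) mirror
# (archive.ubuntu.com/ubuntu/ -> mirrors.tuna.tsinghua.edu.cn/ubuntu/)
RUN sed -i 's@/archive.ubuntu.com/@/mirrors.tuna.tsinghua.edu.cn/@g' /etc/apt/sources.list
# Print the sources list to verify the replacement
RUN cat /etc/apt/sources.list
# Clear apt's local cache, then retry the update
RUN apt-get clean
RUN apt-get -y update --fix-missing --allow-unauthenticated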
...

With docker build running, the compilation still failed (maddening):

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

Yet after entering the container with docker run --gpus all -it ... /bin/bash, the build actually succeeded:

running build
running build_ext
running develop
running egg_info
writing DCNv2.egg-info/PKG-INFO
writing dependency_links to DCNv2.egg-info/dependency_links.txt
writing top-level names to DCNv2.egg-info/top_level.txt
reading manifest file 'DCNv2.egg-info/SOURCES.txt'
writing manifest file 'DCNv2.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so -> 
Creating /opt/conda/lib/python3.6/site-packages/DCNv2.egg-link (link to .)
Adding DCNv2 0.1 to easy-install.pth file

Installed ***/external/DCNv2
Processing dependencies for DCNv2==0.1
Finished processing dependencies for DCNv2==0.1

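Cause: when compiling inside a container started with docker run, the --gpus option gave the container access to the GPU, so the build could use it and succeeded. docker build, however, runs the Dockerfile without any GPU attached, so the GPU is still unavailable at build time.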
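One way to confirm this from inside either context is to look for the GPU device nodes. With --gpus, the NVIDIA container runtime is expected to expose /dev/nvidia0, /dev/nvidia1, ... in the container, while during a plain docker build they are normally absent. The sketch below lists them; running it as a RUN step and again in a docker run --gpus all session should show the difference. The script is an illustrative addition, not part of the original setup.

```python
"""List NVIDIA GPU device nodes visible to the current container."""
import os
import re
from typing import List

# Per-GPU nodes are named nvidia0, nvidia1, ...; control nodes such as
# nvidiactl and nvidia-uvm do not count as GPUs.
_GPU_NODE = re.compile(r"nvidia\d+")


def visible_gpu_nodes(dev_dir: str = "/dev") -> List[str]:
    try:
        names = os.listdir(dev_dir)
    except FileNotFoundError:
        return []
    return sorted(n for n in names if _GPU_NODE.fullmatch(n))


if __name__ == "__main__":
    nodes = visible_gpu_nodes()
    if nodes:
        print("GPU device nodes:", ", ".join(nodes))
    else:
        print("No GPU device nodes: CUDA code cannot run here.")
```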


Final solution: don't compile during docker build in the Dockerfile; compile in the ENTRYPOINT instead:

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-devel
...
ENTRYPOINT ["sh", "run.sh"]

Compile DCNv2 in run.sh:

#!/bin/sh
set -e  # stop if the DCNv2 build fails instead of starting the app without it
cd external/DCNv2
python setup.py build develop
cd ../..
python ***.py
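A side effect of compiling at startup is that the build runs again whenever the entrypoint runs. A possible refinement, sketched below in Python, skips the build when the compiled extension is already present, which helps when the same container is restarted or the source directory is mounted from a volume. Going by the build log above, setup.py build develop copies _ext.cpython-*.so into the DCNv2 directory, so the check assumes that file marks a finished build. The file name entrypoint.py and the command-line interface are hypothetical; the main script is passed as an argument rather than hard-coded.

```python
"""Hypothetical Python entrypoint: build DCNv2 if needed, then start the app.

Usage (hypothetical): python entrypoint.py <main_script.py> [args...]
"""
import glob
import os
import subprocess
import sys
from typing import List

# Relative to the image's WORKDIR, like the cd in run.sh.
DCN_DIR = os.path.join("external", "DCNv2")


def needs_build(dcn_dir: str) -> bool:
    # Assumption: "setup.py build develop" leaves _ext*.so in the DCNv2
    # directory (see the "copying build/lib..." line in the build log).
    return not glob.glob(os.path.join(dcn_dir, "_ext*.so"))


def main(app_args: List[str]) -> None:
    if needs_build(DCN_DIR):
        # Runs at container start, where --gpus makes the GPU visible;
        # check=True aborts startup if the build fails.
        subprocess.run([sys.executable, "setup.py", "build", "develop"],
                       cwd=DCN_DIR, check=True)
    # Replace this process with the application so it receives signals directly.
    os.execv(sys.executable, [sys.executable] + app_args)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: entrypoint.py <main_script.py> [args...]", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

It would be wired in with something like ENTRYPOINT ["python", "entrypoint.py", "your_app.py"], where your_app.py is a placeholder for the actual script.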

