PyTorch CUDAExtension (C++/CUDA extension): notes on import errors


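    • Environment and description
    • Error 1
        • Solution
    • Error 2
        • The error means a symbol cannot be resolved at load time
        • After building the .so, use ldd -r to list the undefined symbols (nm also works)
        • How do we recover the original function name?
        • The symbol turns out to be OpenCV's imread
        • Solution: add opencv_imgcodecs to libraries in setup.py (full setup.py attached)
    • Importing retinanet._C again succeeds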

Environment and description

Project source:
NVIDIA's official retinanet git repository.
Environment:

Item      Physical host    Container
os        ubuntu-16.04     16.04
driver    430.14           430.14
cuda      10.0             10.1
cudnn     7.6.5            7.6.0
pytorch   1.1.0            1.1.0
opencv    3.4.1            3.4.0

Migration task:
Deploy retinanet on the physical host,
and use CUDAExtension to compile engine.cpp, decode.cu, nms.cu and extensions.cpp so that they can be called from Python.

Error 1

from retinanet._C import Engine
ImportError: dynamic module does not define module export function (PyInit__C)

Solution

Delete all build files and the _C.so file, then reinstall via setup.py:

pip install --no-cache-dir -e  .
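The cleanup before reinstalling can be scripted. The following is a minimal sketch, assuming the usual setuptools layout (a build/ directory and *.egg-info at the project root, and the compiled _C*.so inside the retinanet package directory). The helper name clean_build_artifacts is made up for illustration and is not part of the project.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(project_root, package="retinanet"):
    """Remove stale build outputs so the next install recompiles everything.

    Assumed layout (not confirmed by the original post): build/ and
    *.egg-info at the project root, and _C*.so inside the package directory.
    Returns the list of removed paths.
    """
    root = Path(project_root)
    removed = []

    # Directories created by setuptools during build / develop installs.
    for directory in [root / "build", *root.glob("*.egg-info")]:
        if directory.is_dir():
            shutil.rmtree(directory)
            removed.append(directory)

    # Compiled extension modules left over from a previous build.
    for so_file in (root / package).glob("_C*.so"):
        so_file.unlink()
        removed.append(so_file)

    return removed
```

Calling clean_build_artifacts(".") from the project root and then rerunning the pip command above should force a full rebuild. Python source files in the package are left untouched.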

Error 2

bixian@bixian-ubuntu:~/work_space/iluvatar/zjrq/code/G_AI_perimeter_intrusion/modules/night_retinanet$ retinanet
Traceback (most recent call last):
  File "/home/bixian/anaconda3/bin/retinanet", line 11, in <module>
    load_entry_point('retinanet', 'console_scripts', 'retinanet')()
  File "/home/bixian/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/bixian/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2843, in load_entry_point
    return ep.load()
  File "/home/bixian/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2434, in load
    return self.resolve()
  File "/home/bixian/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2440, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/bixian/work_space/iluvatar/zjrq/code/G_AI_perimeter_intrusion/modules/night_retinanet/retinanet/main.py", line 11, in <module>
    from retinanet._C import Engine
ImportError: /home/bixian/work_space/iluvatar/zjrq/code/G_AI_perimeter_intrusion/modules/night_retinanet/retinanet/_C.so: undefined symbol: _ZN2cv6imreadERKNS_6StringEi

The error means this symbol cannot be resolved when the library is loaded:

_ZN2cv6imreadERKNS_6StringEi

After building the .so, ldd -r lists the names of the undefined functions (nm works too):

ldd -r _C.so
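To collect the unresolved names programmatically, the output of ldd -r can be filtered. This is a minimal sketch, assuming ldd prints lines of the form "undefined symbol: NAME (./_C.so)"; the function names are illustrative only.

```python
import re
import subprocess


def parse_undefined_symbols(ldd_output):
    """Extract symbol names from 'undefined symbol: NAME' lines."""
    pattern = re.compile(r"undefined symbol:\s*(\S+)")
    return [match.group(1) for match in pattern.finditer(ldd_output)]


def undefined_symbols(so_path):
    """Run `ldd -r` on a shared library and return its unresolved symbols."""
    result = subprocess.run(
        ["ldd", "-r", so_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # ldd may report unresolved symbols on stderr
        universal_newlines=True,
    )
    return parse_undefined_symbols(result.stdout)
```

For a Python extension module the list usually also contains Python C-API symbols, because the module is not linked against libpython and those symbols are provided by the interpreter at import time. The entries worth investigating are the ones from third-party libraries, such as the cv:: symbol here.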

How do we recover the original function name?

c++filt _ZN2cv6imreadERKNS_6StringEi
cv::imread(cv::String const&, int)
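The same demangling can be done from Python by calling c++filt as a subprocess. A minimal sketch, assuming c++filt (from binutils) is on the PATH; the wrapper name demangle is hypothetical, and it returns the input unchanged when the tool is missing.

```python
import shutil
import subprocess


def demangle(symbol):
    """Demangle a C++ symbol with c++filt; return it unchanged if c++filt is missing."""
    if shutil.which("c++filt") is None:
        return symbol
    result = subprocess.run(
        ["c++filt", symbol],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

This is convenient when many undefined symbols have to be translated at once, for example by mapping demangle over a list of names.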

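The symbol turns out to be OpenCV's imread: the C++ sources call it, but the library that defines it (opencv_imgcodecs in OpenCV 3.x) is not linked.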

Solution: add opencv_imgcodecs to libraries in setup.py. The full setup.py follows:

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='retinanet',
    version='0.1',
    description='Fast and accurate single shot object detector',
    author = 'NVIDIA Corporation',
    author_email='[email protected]',
    packages=['retinanet', 'retinanet.backbones'],
    ext_modules=[
        # Name of the extension module, as used by `import` in Python.
        # The four source files below are compiled into the retinanet._C shared library.
        CUDAExtension('retinanet._C', ['csrc/extensions.cpp', 'csrc/engine.cpp', 'csrc/cuda/decode.cu', 'csrc/cuda/nms.cu'],
        # Extra compiler options used when building the extension
        extra_compile_args={
            'cxx': ['-std=c++11', '-O2', '-Wall'],
            'nvcc': [
                '-std=c++11', '--expt-extended-lambda', '--use_fast_math', '-Xcompiler', '-Wall',
                '-gencode=arch=compute_60,code=sm_60', '-gencode=arch=compute_61,code=sm_61',
                '-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_72,code=sm_72',
                '-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_75,code=compute_75'
            ],
        },
        # Make sure CUDA >= 10.0 and TensorRT 5.0.2 are installed correctly.
        # If the CUDA and TensorRT paths are already set in the environment,
        # include_dirs is unnecessary and the next line can be commented out.
        include_dirs = ['/usr/local/cuda-10.0/include', '/home/bixian/work_space/nvidia-cuda/TensorRT-5.0.2.6/include'],
        # libraries: a list of library names (not file names or paths)
        # libraries=['nvinfer', 'nvinfer_plugin', 'nvonnxparser'],
        # library_dirs = ['/usr/local/lib64/'],
        # Add the OpenCV libraries to libraries
        libraries = ['nvinfer', 'nvinfer_plugin', 'nvonnxparser', 'opencv_core', 'opencv_imgproc', 'opencv_highgui','opencv_imgcodecs'])
    ],
    cmdclass={'build_ext': BuildExtension.with_options(no_python_abi_suffix=True)},
    install_requires=[
        'torch>=1.0.0a0',
        'torchvision',
        'apex @ git+https://github.com/NVIDIA/apex',
        'pycocotools @ git+https://github.com/nvidia/cocoapi.git#subdirectory=PythonAPI',
        'pillow',
        'requests',
    ],
    # console_scripts names the command-line tool. In "retinanet=retinanet.main:main", the part before
    # the equals sign is the command name and the part after it is the program's entry point.
    entry_points = {'console_scripts': ['retinanet=retinanet.main:main']}
)
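To find out which library name belongs in libraries for a given undefined symbol, the installed OpenCV shared libraries can be scanned with nm. This is a minimal sketch, assuming GNU nm is available and the libraries follow the libopencv_*.so naming; all function names and the directory passed in are illustrative assumptions.

```python
import glob
import os
import subprocess


def link_name(lib_path):
    """Turn a file name such as libopencv_imgcodecs.so.3.4 into opencv_imgcodecs."""
    base = os.path.basename(lib_path)
    stem = base.split(".so")[0]  # drop ".so" and any version suffix
    return stem[3:] if stem.startswith("lib") else stem


def defines_symbol(lib_path, symbol):
    """Return True if `nm -D --defined-only` lists the symbol for this library."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    for line in result.stdout.splitlines():
        fields = line.split()
        # The symbol name is the last field; strip an optional @VERSION suffix.
        if fields and fields[-1].split("@")[0] == symbol:
            return True
    return False


def libraries_defining(symbol, lib_dir, pattern="libopencv_*.so"):
    """List link names of libraries in lib_dir that export the given symbol."""
    paths = sorted(glob.glob(os.path.join(lib_dir, pattern)))
    return [link_name(path) for path in paths if defines_symbol(path, symbol)]
```

A call such as libraries_defining("_ZN2cv6imreadERKNS_6StringEi", "/usr/local/lib") returns names in the form expected by the libraries argument (the directory here is only an example). link_name also illustrates the rule noted in the setup.py comment: libraries takes opencv_imgcodecs, not libopencv_imgcodecs.so.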

Importing retinanet._C again succeeds

python -c "import torch; import retinanet._C; print(dir(retinanet._C))"
['Engine', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'decode', 'nms']

python -c "import torch; from retinanet._C import Engine; print(dir(Engine))"
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__',
 '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'infer', 
 'input_size', 'load', 'save', 'stride']
