Ubuntu + OpenCV + libtorch: Exporting a Network Model for C++ and Compiling Them Together

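Because of a business requirement, I needed to convert a network model into a format that C++ can call. I ran into quite a few pitfalls along the way and am recording them here. The model is an ordinary end-to-end network with no custom operations; nearly everything in it is a standard PyTorch operation.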

1. Exporting the network

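I tried the mainstream export options: (1) torch.jit.trace, (2) torch.jit.ScriptModule (scripting), and (3) PyTorch to ONNX.
The model is fairly complex, so method 2 needed too many changes and I did not get the conversion to work. Method 3 requires a lot of environment setup and may involve converting to a Caffe version, so I have not tried it yet. I settled on method 1. Its precondition is that nothing in the network (branches, loop counts, and so on) may change with the input, so that the network yields a single fixed graph. Note that "input" here means the input to the network, not the hyperparameters used to construct it.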
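To make the precondition concrete, here is a toy tracer in plain Python. It is only an illustration and is not torch.jit: the names `Traced`, `trace`, `static_net`, and `dynamic_net` are made up for this sketch. It records the arithmetic executed for one example input and replays it, which is enough to show why a branch that depends on the input gets frozen into whichever side the example happened to take.

```python
# Toy illustration only: a tiny "tracer" in plain Python, NOT torch.jit.
# It records the arithmetic executed for one example input and replays it.

class Traced:
    """Wraps a number and logs every operation applied to it."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, k):
        self.tape.append(("mul", k))
        return Traced(self.value * k, self.tape)

    def __add__(self, k):
        self.tape.append(("add", k))
        return Traced(self.value + k, self.tape)


def trace(fn, example):
    """Run fn once on the example input and return a replay function."""
    tape = []
    fn(Traced(example, tape))

    def replay(x):
        for op, k in tape:
            x = x * k if op == "mul" else x + k
        return x

    return replay


def static_net(x):
    # Same operations for every input: safe to trace.
    return x * 2 + 1


def dynamic_net(x):
    # The branch depends on the input value, so only one side is recorded.
    if x.value > 0:
        return x * 2
    return x + 100


traced_static = trace(static_net, 3)
traced_dynamic = trace(dynamic_net, 3)  # the example input takes the "x * 2" branch

print(traced_static(-5))   # same result as computing -5 * 2 + 1 directly
print(traced_dynamic(-5))  # replays "x * 2"; the "x + 100" branch was never recorded
```

The second call shows the failure mode: the replayed function silently computes the wrong branch for negative inputs, which is why a model with input-dependent control flow is not a good fit for tracing.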
For details on usage, see:
https://pytorch.org/tutorials/advanced/cpp_export.html
Example code:

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

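# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

# Serialize the traced module so that C++ can load it with torch::jit::load().
traced_script_module.save("traced_resnet_model.pt")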
Problem encountered:

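When I ran this call directly on my own model, it failed with
torch.jit.trace assert(isinstance(orig, torch.nn.Module)). At first I assumed the model could not be found, yet the model's own output was perfectly fine. So I located the failing code in the jit.trace source and printed the model's sub-modules step by step. It turned out that the source converts every network module into a ScriptModule, following the order of data flow through the network, but near the end it received a None value. I then checked the model code and found the following: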

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(
                    self.inplanes, planes * block.expansion,
                    kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

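Here downsample defaults to None, but under certain parameters it becomes an nn.Module. I suspected that the None value was what broke the export, so the fix is to replace every None in the network with torch.nn.Identity(), which performs no operation. With that change, plus updating any logic in the network that checks for None, the network can be exported in a form that C++ can read.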

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = torch.nn.Identity()
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(
                    self.inplanes, planes * block.expansion,
                    kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)
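
Updating the logic around None mostly means that the block's forward pass no longer has to test whether a shortcut module exists. The sketch below uses plain-Python stand-ins so that it runs without PyTorch: `Identity` mimics torch.nn.Identity, `Halve` stands in for a real downsample branch, and `Block` is a hypothetical residual block, not the actual block class from the model above.

```python
# Sketch with plain-Python stand-ins so it runs without PyTorch.
# `Identity` mimics torch.nn.Identity; `Block` is a hypothetical residual block.

class Identity:
    """Returns its input unchanged, like torch.nn.Identity."""

    def __call__(self, x):
        return x


class Halve:
    """Stand-in for a real downsample branch (conv + batch norm in the model)."""

    def __call__(self, x):
        return x / 2


class Block:
    def __init__(self, downsample=None):
        # Replace None with a no-op so the shortcut slot always holds a callable.
        self.downsample = downsample if downsample is not None else Identity()

    def forward(self, x):
        out = x * 3  # stand-in for the conv/bn/relu main path
        # Before: `if self.downsample is not None: residual = self.downsample(x)`
        # After: the call is unconditional because the slot is never None.
        residual = self.downsample(x)
        return out + residual


print(Block().forward(2.0))         # identity shortcut: 6.0 + 2.0
print(Block(Halve()).forward(2.0))  # downsampled shortcut: 6.0 + 1.0
```

In the real network the same idea applies to the block's forward() method: the residual is always computed through self.downsample, and the Identity module simply passes the input through.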

2. Build problems

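Next, the C++ side has to read an image with OpenCV, load the model with C++ code, and produce the final result. I first set up a release build of OpenCV 4, following https://www.jianshu.com/p/f54b0fc13811. I will not go into the configuration details here. During the build, however, the ippicv file has to be downloaded; without a proxy the download always fails, and setting up a proxy on a server is a hassle. It can in fact be downloaded manually, with no need to spend coins for it on CSDN, as follows.
Looking at the CMake file for ippicv, we find: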

  set(THE_ROOT "${OpenCV_BINARY_DIR}/3rdparty/ippicv")
  ocv_download(FILENAME ${OPENCV_ICV_NAME}
               HASH ${OPENCV_ICV_HASH}
               URL
                 "${OPENCV_IPPICV_URL}"
                 "$ENV{OPENCV_IPPICV_URL}"
                 "https://raw.githubusercontent.com/opencv/opencv_3rdparty/${IPPICV_COMMIT}/ippicv/"
               DESTINATION_DIR "${THE_ROOT}"
               ID IPPICV
               STATUS res
               UNPACK RELATIVE_URL)

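This is what determines the download path. Combined with the earlier part of the CMake file, it gives the download URL: "https://raw.githubusercontent.com/opencv/opencv_3rdparty/32e315a5b106a7b89dbed51c28f8120a48b368b4/ippicv/ippicv_2019_lnx_intel64_general_20180723.tgz". Download that file through a proxy (I used Xunlei), then change the path in the CMake file so that it points to the downloaded file.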
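
The manual download can also be scripted. The following is a minimal sketch, assuming that OPENCV_ICV_HASH is an MD5 digest (as far as I know that is what OpenCV's download helper checks) and that you copy the expected hash from the same CMake file yourself. The helper names `ippicv_url`, `md5_of`, and `download_and_check` are made up for this example.

```python
import hashlib
import urllib.request

# Values copied from the URL quoted above; other OpenCV versions use different ones.
IPPICV_COMMIT = "32e315a5b106a7b89dbed51c28f8120a48b368b4"
IPPICV_NAME = "ippicv_2019_lnx_intel64_general_20180723.tgz"


def ippicv_url(commit, name):
    """Build the download URL the same way the CMake snippet does."""
    base = "https://raw.githubusercontent.com/opencv/opencv_3rdparty"
    return f"{base}/{commit}/ippicv/{name}"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_check(url, dest, expected_md5):
    """Download url to dest and raise if the MD5 does not match (assumed hash type)."""
    urllib.request.urlretrieve(url, dest)
    actual = md5_of(dest)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    return dest


print(ippicv_url(IPPICV_COMMIT, IPPICV_NAME))
```

Calling download_and_check(ippicv_url(IPPICV_COMMIT, IPPICV_NAME), IPPICV_NAME, expected_md5) on a machine with network access fetches the archive and fails loudly if the file is corrupted, with expected_md5 taken from OPENCV_ICV_HASH in your OpenCV source tree.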
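Loading the PyTorch model from C++ also uses the code from https://pytorch.org/tutorials/advanced/cpp_export.html. After confirming that the OpenCV 4 sample code and the libtorch sample code each ran correctly on their own, I tried combining the two. Combining them directly produces the following error: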

undefined reference to `cv::imread(std::string const&, int)'

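I tried all sorts of fixes, including rebuilding OpenCV 4 and changing the CMakeLists file in various ways, but none of them got the build through. The problem has already been reported in the OpenCV repository at https://github.com/opencv/opencv/issues/13000, but there is still no good solution there. Following https://www.jianshu.com/p/6fe9214431c6, I suspected an issue with the OpenCV 4 version itself. In my tests, OpenCV 3.4 works together with libtorch while OpenCV 4 currently does not, so I downgraded OpenCV. Run the following from the OpenCV 4 build directory to remove it: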

sudo make uninstall
cd ..
sudo rm -r build
sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv /usr/include/opencv /usr/include/opencv2 /usr/local/share/opencv /usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV /usr/local/bin/opencv* /usr/local/lib/libopencv*
sudo apt-get --purge remove opencv-doc opencv-data python-opencv
cd /etc/ld.so.conf.d/
sudo rm opencv.conf
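
Before installing another version, it can help to confirm that nothing from the old install is left behind. This is a small sketch that checks a subset of the paths from the rm command above; the function name `leftovers` and the choice of patterns are mine, so adjust them to your system.

```python
import glob
import os

# Patterns taken from the rm command above, written relative to a root directory.
PATTERNS = [
    "usr/local/include/opencv2",
    "usr/local/include/opencv",
    "usr/local/lib/libopencv*",
    "usr/local/bin/opencv*",
    "etc/ld.so.conf.d/opencv.conf",
]


def leftovers(root="/"):
    """Return paths under root that still match an OpenCV install pattern."""
    found = []
    for pattern in PATTERNS:
        found.extend(glob.glob(os.path.join(root, pattern)))
    return sorted(found)


if __name__ == "__main__":
    remaining = leftovers()
    print("clean" if not remaining else "\n".join(remaining))
```

The root parameter exists only so the check can be tried against a scratch directory; on a real machine the default "/" is what you want.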

Then download the OpenCV 3.4 source and the matching ippicv file in the same way and build it, following https://www.jianshu.com/p/f646448da265. For the cmake step, use:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
make
sudo make install

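Then add the library path in the same way as before. OpenCV 3.4 is now set up.
I have shared OpenCV 4.3 and OpenCV 3.4, together with their corresponding ippicv files (ippv_2019_intel64_general_20190723.tgz and ippv_2020_intel64_20191018_general.tgz), on Baidu Netdisk for easy download. (Note that these are the Linux versions.)
Link: https://pan.baidu.com/s/1N035W8eBBITi1TWCtljBxw Extraction code: ym69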
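
A possible explanation for why OpenCV 3.4 links where OpenCV 4 did not (this is my assumption and was not verified in the steps above): the linker error asks for cv::imread taking a plain std::string, which is what code compiled with the pre-C++11 libstdc++ ABI requests, and the libtorch shipped inside the Python torch package may force that ABI. OpenCV 4 built with default settings exports the std::__cxx11 variant instead, while OpenCV 3.x uses its own cv::String type in that signature and so is not affected. The sketch below classifies demangled symbol names to check this; the function names are made up for the example, and it relies on the `nm` tool from binutils.

```python
import subprocess


def imread_flavor(demangled):
    """Classify a demangled cv::imread symbol by its string parameter type."""
    if "cv::imread(" not in demangled:
        return None
    if "std::__cxx11::basic_string" in demangled:
        return "std::string, C++11 ABI"
    if "cv::String" in demangled:
        return "cv::String (OpenCV 3.x style)"
    if "std::string" in demangled or "std::basic_string" in demangled:
        return "std::string, pre-C++11 ABI"
    return "unknown"


def exported_imread_flavors(library_path):
    """List the imread flavors exported by a shared library, using `nm -DC`."""
    out = subprocess.run(
        ["nm", "-DC", library_path], capture_output=True, text=True, check=True
    ).stdout
    flavors = {imread_flavor(line) for line in out.splitlines()}
    flavors.discard(None)
    return sorted(flavors)


# The linker message quoted earlier asks for this flavor:
print(imread_flavor("cv::imread(std::string const&, int)"))
```

Running exported_imread_flavors on the installed image codecs library (for example /usr/local/lib/libopencv_imgcodecs.so, a path I am assuming from the install prefix used here) shows which variant the library actually provides. If it differs from what the linker asks for, the mismatch is in the ABI rather than in the CMake paths.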

3. Combined build

First, set up the CMakeLists.txt file:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(Human_Pose)

# Enable C++11
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)

# Set library search paths: the libtorch bundled with the Python torch package,
# and /usr/local, where OpenCV 3.4 was installed.
set(CMAKE_PREFIX_PATH /usr/local/lib/python3.6/dist-packages/torch /usr/local)

find_package(Torch REQUIRED NO_CMAKE_FIND_ROOT_PATH)
find_package(OpenCV REQUIRED NO_CMAKE_FIND_ROOT_PATH)

# If the package has been found, several variables will
# be set, you can find the full list with descriptions
# in the OpenCVConfig.cmake file.
# Print some message showing some of them
message(STATUS "OpenCV library status:")
message(STATUS "    version: ${OpenCV_VERSION}")
message(STATUS "    libraries: ${OpenCV_LIBS}")
message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")

message(STATUS "Torch library status:")
message(STATUS "    version: ${Torch_VERSION}")
message(STATUS "    libraries: ${TORCH_LIBRARIES}")
message(STATUS "    include path: ${TORCH_INCLUDE_DIRS}")

include_directories(  ${OpenCV_INCLUDE_DIRS}  ${TORCH_INCLUDE_DIRS})

add_executable(Human_Pose Human_Pose.cpp)

target_link_libraries(Human_Pose ${OpenCV_LIBS})
target_link_libraries(Human_Pose ${TORCH_LIBRARIES})

The project is named Human_Pose and contains a single file, Human_Pose.cpp. The test code is:

#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <torch/script.h> // One-stop header.
#include <vector>


using namespace std;
using namespace cv;
int main(int argc, char** argv )
{
    
    if ( argc != 3 )
    {
        printf("usage: Human_Pose <image_path> <model_path>\n");
        return -1;
    }
    
    cv::Mat image;
    image = cv::imread( argv[1], 1 );
    if ( !image.data )
    {
        printf("No image data \n");
        return -1;
    }
    cv::imwrite("test.png", image);

    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[2]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "module input ok\n";
    // Create a vector of inputs.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 3, 256, 192}));
    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
    return 0;
}

Create a build folder, then run:

cd build
cmake -DCMAKE_BUILD_TYPE=Release  ..   
cmake --build . --config Release    

At this point the whole test pipeline runs end to end.
