libtorch Model Inference Tutorial

libtorch lets us run model inference from C++.

Installing the C++ Distribution of PyTorch

First, download libtorch from the official website.


As of this writing (2020/2/15), the latest PyTorch on the official site is 1.4; just download the Pre-cxx11 ABI build. Regarding the C++ ABI: the cxx11 ABI build requires at least GCC 5 and GLIBC 2.23. I am on CentOS 7.3, so I use the non-cxx11-ABI build. Keep this in mind.

example-app

Here we build a minimal libtorch demo.

  • CMakeLists.txt
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)

project(example-app)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 11)
  • example-app.cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    torch::Tensor tensor = torch::rand({2, 3});
    std::cout << tensor << std::endl;
}
  • The directory layout is now:
example-app/
 CMakeLists.txt
 example-app.cpp
  • Build
mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ..
make
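
If everything links correctly, running the binary prints a random 2x3 tensor (the values differ from run to run):

./example-app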

Model Inference with the C++ Frontend

TorchScript is a way to create serializable and optimizable models from PyTorch code. Any code written in TorchScript can be saved from a Python process and loaded in a process that has no Python dependency.

PyTorch's official TorchScript documentation: TorchScript_tutorial

  • Converting a model to Torch Script via tracing

In short, TorchScript provides tools to capture the definition of your model, even in light of the flexible and dynamic nature of PyTorch.

To convert a PyTorch model to Torch Script via tracing, pass an instance of the model together with a sample input to the torch.jit.trace function.

This produces a torch.jit.ScriptModule object with the traced model evaluation embedded in the module's forward method:

TorchScript records the model definition in an intermediate representation (IR), commonly called a graph in deep learning.

Useful members of torch.jit.ScriptModule include: code, graph, save

Running model inference with libtorch involves three main steps:

  1. Train a model and save it as a .pth file
  2. Convert (trace) the model to obtain a .pt file
  3. Run inference from C++

Here the official pretrained resnet34 is used as the conversion example.

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet34(pretrained=True)

# Must call eval() before tracing, otherwise the output can be wrong (e.g. an incorrect top-1 index)
model.eval()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("traced_resnet_model.pt")

# with torch.no_grad():
#  output = traced_script_module(torch.ones(1, 3, 224, 224))
#
# print(output[0, :5])

This produces traced_resnet_model.pt, the model file the C++ side will load.

Following the official API and examples, I wrote the calling code below. OpenCV 4.1 (build steps omitted) is used to read images; the transforms interface could be used instead.

  • cxx_simple.cpp: the input is a fixed constant tensor of shape 1xCxHxW
#include "torch/script.h"
#include "torch/torch.h"

#include <iostream>
#include <memory>
#include <string>

using namespace std;

int main(int argc, const char* argv[])
{
   if (argc != 2) {
       std::cerr << "usage: torch_ext-app \n";
       return -1;
   }

    string model_file = argv[1];
    // load the serialized model; torch::jit::load throws c10::Error on failure
    torch::jit::script::Module module = torch::jit::load(model_file);
    //module.to(at::kCUDA);

    std::cout << "ok\n";

    // Build an input of shape (1,3,224,224); optionally move it to CUDA
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 3, 224, 224}) * 50);  //.to(at::kCUDA));

    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();

    // std::cout<< output << std::endl;
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
}
  • cxx_inference.cpp: read a custom image -> preprocess -> run inference -> get the top-k probabilities
#include "torch/script.h"
#include "torch/torch.h"

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using namespace std;

// resize while keeping the aspect ratio, padding the rest with gray (128,128,128)
cv::Mat resize_with_ratio(cv::Mat& img)
{
    cv::Mat temImage;
    int w = img.cols;
    int h = img.rows;

    float t = 1.;
    float len = t * std::max(w, h);
    int dst_w = 224, dst_h = 224;
    cv::Mat image = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(128,128,128));
    cv::Mat imageROI;
    if(len==w)
    {
        float ratio = (float)h/(float)w;
        cv::resize(img,temImage,cv::Size(224,224*ratio),0,0,cv::INTER_LINEAR);
        imageROI = image(cv::Rect(0, ((dst_h-224*ratio)/2), temImage.cols, temImage.rows));
        temImage.copyTo(imageROI);
    }
    else
    {
        float ratio = (float)w/(float)h;
        cv::resize(img,temImage,cv::Size(224*ratio,224),0,0,cv::INTER_LINEAR);
        imageROI = image(cv::Rect(((dst_w-224*ratio)/2), 0, temImage.cols, temImage.rows));
        temImage.copyTo(imageROI);
    }

    return image;
}

int main(int argc, const char* argv[])
{
   if (argc < 3) {
       std::cerr << "usage: torch_ext-app  \n";
       return -1;
   }

    torch::jit::script::Module module = torch::jit::load(argv[1]);
    //module.to(at::kCUDA);

    cv::Mat frame;
    cv::Mat image;
    cv::Mat input;

    frame = cv::imread(argv[2]);
    cv::resize(frame, image, cv::Size(224, 224));
    imshow("resized image", image);    //显示图像
    cv::cvtColor(image, input, cv::COLOR_BGR2RGB);

    // Convert the image to a tensor, then run it through the model
    torch::Tensor tensor_image = torch::from_blob(input.data, {1, input.rows, input.cols, 3}, torch::kByte);
    tensor_image = tensor_image.permute({0,3,1,2});
    tensor_image = tensor_image.toType(torch::kFloat);
    tensor_image = tensor_image.div(255);
    //tensor_image = tensor_image.to(torch::kCUDA);
    // shape of tensor_image is N,C,H,W
    tensor_image[0][0].sub_(0.485).div_(0.229);  // subtract the per-channel mean, divide by the std
    tensor_image[0][1].sub_(0.456).div_(0.224);
    tensor_image[0][2].sub_(0.406).div_(0.225);
    torch::Tensor result = module.forward({tensor_image}).toTensor();

    // auto max_result = result.max(1, true);
    // auto max_index = std::get<1>(max_result).item();
    // cout << max_index << endl;

    auto prob = result.softmax(1);
    // auto idx = prob.argmax();
    // cout << "The index is " << idx.item() << endl;
    // cout << "The prob is " << prob[0][idx].item() << endl;

    cout << "The top3 probs are: " << endl;
    auto top3 = prob.topk(3);
    cout << std::get<0>(top3) << endl;
    cout << std::get<1>(top3) << endl;

    cv::waitKey(0);
    return 0;
}
  • cxx_resnet.cpp: the input is a directory of images; classify each image and print the most probable class
#include "torch/script.h"
#include "torch/torch.h"

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <dirent.h>

using namespace cv;
using namespace std;
using namespace chrono;

vector<string> list_dir(const char* path) {
	vector<string> files;
	struct dirent *entry;
	DIR *dir = opendir(path);

	if (dir == nullptr) {
		return files;
	}

	while ((entry = readdir(dir)) != nullptr) {
		//cout << entry->d_name << endl;
		files.push_back(entry->d_name);
	}
	closedir(dir);

	return files;
}

/*
 * Case-sensitive check of whether the file name 'mainStr'
 * ends with one of the known image extensions
 */
bool is_image(const std::string &mainStr)
{
    vector<string> exts{"png", "jpg", "jpeg"};
    bool ret = false;

    for(auto ext : exts)
    {
        if(mainStr.size() >= ext.size() &&
                mainStr.compare(mainStr.size() - ext.size(), ext.size(), ext) == 0)
                return true;
            else
                ret = false;
    }

    return ret;
}

bool LoadImage(std::string file_name, cv::Mat &image) {
  image = cv::imread(file_name);  // CV_8UC3
  if (image.empty() || !image.data) {
    return false;
  }
  cv::cvtColor(image, image, COLOR_BGR2RGB);
  std::cout << "== image size: " << image.size() << " ==" << std::endl;

  // scale image to fit
  cv::Size scale(224, 224);
  cv::resize(image, image, scale);
  std::cout << "== simply resize: " << image.size() << " ==" << std::endl;

  // convert [unsigned int] to [float]
  image.convertTo(image, CV_32FC3, 1.0f / 255.0f);

  return true;
}

int main(int argc, char *argv[]) 
{
   if(argc < 3)
   {
       cerr << "usage: torch_ext-app  "
                 << "\n";
       return -1;
   }
    std::string model_path = argv[1];
    std::string test_path = argv[2];

    auto time_start = system_clock::now();
    // torch::jit::load throws c10::Error if loading fails
    torch::jit::script::Module module = torch::jit::load(model_path);
    
    // Don't calculate gradients, same as `with torch.no_grad()` in Python
    torch::NoGradGuard no_grad;

    auto time_load = system_clock::now();
    auto duration1 = duration_cast<microseconds>(time_load - time_start);
    cout << "加载模型花费了"
         << double(duration1.count()) * microseconds::period::num / microseconds::period::den
         << "秒" << endl;

    
    // Image preprocessing (single-image version, kept commented out for reference)
    // int img_size = 224;
    // std::vector<torch::jit::IValue> inputs;
    // Mat src, image;
    // src = imread(test_path);  // H, W, C
    // cv::imshow("lol1", src);

    // resize(src, image, Size(img_size, img_size)); // resize img
    // cvtColor(image, image, COLOR_BGR2RGB);  // bgr -> rgb   
    // at::TensorOptions options(at::ScalarType::Byte);
    // at::Tensor img_tensor = torch::from_blob(image.data, {1, img_size, img_size, 3}, options);
    // img_tensor = img_tensor.permute({0, 3, 1, 2});  // reorder to the torch input layout 1,3,224,224
    // img_tensor = img_tensor.toType(torch::kFloat);
    // img_tensor = img_tensor.div(255);
    // img_tensor[0][0].sub_(0.485).div_(0.229);  // subtract mean, divide by std
    // img_tensor[0][1].sub_(0.456).div_(0.224);
    // img_tensor[0][2].sub_(0.406).div_(0.225);

    // inputs.emplace_back(img_tensor);
    // at::Tensor result = module.forward(inputs).toTensor();
    // auto prob = result.softmax(1);
    // auto idx = prob.argmax();
    // cout << "The index is " << idx.item() << endl;
    // cout << "The prob is " << prob[0][idx].item() << endl;

    for (const auto &p : list_dir(test_path.c_str())){
        // std::vector<torch::jit::IValue> inputs;
        //std::cout << p << '\n';
        if(!is_image(p))
            continue;
        std::string s = test_path + p;  // note: assumes test_path ends with '/'
        //std::cout << "path: " << s << endl;
        Mat image = imread(s);
        if (image.empty())
            continue;
        resize(image, image, Size(224, 224)); // resize to the network input size
        cvtColor(image, image, COLOR_BGR2RGB);  // bgr -> rgb
        image.convertTo(image, CV_32FC3, 1.0 / 255);  // uint8 -> float in [0,1]
        at::Tensor img_tensor = torch::from_blob(image.data, {1, 224, 224, 3});
        img_tensor = img_tensor.permute({0, 3, 1, 2});  // reorder to the torch input layout 1,3,224,224
        
        // img_tensor = img_tensor.div(255);
        img_tensor[0][0].sub_(0.485).div_(0.229);  // subtract mean, divide by std
        img_tensor[0][1].sub_(0.456).div_(0.224);
        img_tensor[0][2].sub_(0.406).div_(0.225);
        img_tensor = img_tensor.toType(torch::kFloat);

        // // std::ifstream is(model_path, std::ifstream::binary);
        // inputs.emplace_back(img_tensor);
        at::Tensor result = module.forward({img_tensor}).toTensor();
        //auto max_result = result.max(0, true);
        //std ::cout << std::get<1>(max_result);
        //auto max_index = std::get<1>(max_result).item();
        //std::cout << max_index << std::endl;
        auto pred = result.argmax(1);
        std::cout << pred.item<int64_t>() << std::endl;
    }

    auto time_end = system_clock::now();
    auto duration2 = duration_cast<microseconds>(time_end - time_start);
    cout << "推理模型花费了"
         << double(duration2.count()) * microseconds::period::num / microseconds::period::den
         << "秒" << endl;

    //cv::waitKey(0);
    return 0;
}
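
A usage note: because the loop concatenates test_path + p without adding a separator, the image-directory argument must end with a trailing slash. With the test3 target from the CMakeLists.txt below, an invocation looks like (the file names here are examples):

./test3 traced_resnet_model.pt ./images/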
  • CMakeLists.txt
cmake_minimum_required(VERSION 3.2.0)

project(example)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)

set(CMAKE_PREFIX_PATH ../libtorch)
find_package(Torch REQUIRED)
message(STATUS "Pytorch status:")
message(STATUS "    libraries: ${TORCH_LIBRARIES}")

# set(OpenCV_DIR "/Users/jingxiaofei/Downloads/opencv-4.1.0/build")
# find_package(OpenCV REQUIRED PATHS OpenCV_DIR)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
message(STATUS "OpenCV library status:")
message(STATUS "    version: ${OpenCV_VERSION}")
message(STATUS "    libraries: ${OpenCV_LIBS}")
message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")

# add_executable(torch_ext tensor_test.cpp)
# target_link_libraries(torch_ext "${TORCH_LIBRARIES}")
# set_property(TARGET torch_ext PROPERTY CXX_STANDARD 11)

# add_executable(test2 cxx_inference.cpp)
# target_link_libraries(test2  ${TORCH_LIBRARIES} ${OpenCV_LIBS})
# set_property(TARGET test2 PROPERTY CXX_STANDARD 11)

add_executable(test3 cxx_resnet.cpp)
target_link_libraries(test3 ${TORCH_LIBRARIES} ${OpenCV_LIBS})
set_property(TARGET test3 PROPERTY CXX_STANDARD 11)

# add_executable(test4 cxx_simple.cpp)
# target_link_libraries(test4 ${TORCH_LIBRARIES})
# set_property(TARGET test4 PROPERTY CXX_STANDARD 11)

# add_executable(test5 cxx_video.cpp)
# target_link_libraries(test5 ${TORCH_LIBRARIES} ${OpenCV_LIBS})
# set_property(TARGET test5 PROPERTY CXX_STANDARD 11)
  • Output
  1. The C++ output differs slightly from the Python output, by around 1%; possibly caused by the resize step in the image preprocessing?
  2. Comparing resnet34 inference between C++ and Python over 400 images: C++ took 72 s, Python 34 s. Watching CPU utilization, the Python version kept 4 cores at about 95%, while the C++ version kept them at only about 40% (see the thread-count sketch below).
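
One knob worth experimenting with for point 2 (a sketch, not a verified fix for this case): libtorch exposes the intra-op thread count through at::set_num_threads from ATen/Parallel.h; the OMP_NUM_THREADS environment variable can also influence it.

#include <ATen/Parallel.h>

int main() {
    at::set_num_threads(4);  // threads used for intra-op parallelism
    // ... load the module and run inference as in cxx_resnet.cpp ...
    return 0;
}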


Troubleshooting

  • dyld: Library not loaded: @rpath/libmklml.dylib
    Normally the build succeeds, but running the executable fails with:

    dyld: Library not loaded: @rpath/libmklml.dylib
    Referenced from:/Users/jingxiaofei/Documents/Learning/PyTorch/libtorch_cpp/libtorch/lib/libtorch.1.dylib
    Reason: image not found
    

    The fix comes from a GitHub issue: download the prebuilt libiomp5.dylib and libmklml.dylib from MKL-DNN and copy them into libtorch's lib directory; a sketch of the copy step follows below.
    MKL-DNN releases up to v0.20 ship prebuilt binaries for each platform; for later versions you can build the library files yourself.
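
    A sketch of the copy step (the archive name is illustrative; substitute whatever mklml release you actually downloaded):

    # after unpacking an mklml release from the mkl-dnn GitHub releases page
    cp mklml_mac_*/lib/libmklml.dylib mklml_mac_*/lib/libiomp5.dylib /path/to/libtorch/lib/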

  • Compile warning

    'IntList' has been explicitly marked deprecated here
    using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;
    

    The fix follows https://www.twblogs.net/a/5c80fbe6bd9eee35fc136d36:
    patch the sources.
    To avoid IntelliSense errors in VS, two places under Caffe2\include were modified.
    Changed line 46 of c10\macros\Macros.h:

    //#elif __cplusplus && defined(__has_cpp_attribute)
    #elif __cplusplus && defined(__clang__) && defined(__has_cpp_attribute)
    

    Commented out line 273 of c10\util\ArrayRef.h:

    //using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;
    

    Then replace IntList in your own code with ArrayRef.
    Even better, write the shape out explicitly as a braced list, e.g. {N,C,H,W}.

  • Wrong output
    If model.eval() is not called when doing the jit conversion, the model output will be wrong. The problem I ran into: with the pretrained resnet34, running inference without torch.no_grad(), the output was identical regardless of the input, and the class with the highest softmax probability was always 463.

  • Disabling gradient computation
    The C++ equivalent of with torch.no_grad() is torch::NoGradGuard no_grad. Compare the examples below.
    Python:

    with torch.no_grad():
        module.weight += 1
    

    C++:

    {
        torch::NoGradGuard no_grad;
        module->weight += 1;
    } // Note that anything out of this scope will still record gradients
    
  • at::Tensor vs torch::Tensor
    at::Tensor is not differentiable while torch::Tensor is. This is similar to the difference between Variables and plain tensors in Python before 0.4.0.
    As far as I know, torch::Tensor carries no overhead even when you don't need to differentiate it, so that might be the reason to prefer the torch namespace for creating tensors.
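
    A minimal sketch of the difference in action: a tensor created through a torch:: factory function takes part in autograd once requires_grad is set.

    #include <torch/torch.h>
    #include <iostream>

    int main() {
        // participates in autograd because requires_grad is set
        torch::Tensor a = torch::rand({2, 2}, torch::requires_grad());
        torch::Tensor b = (a * a).sum();
        b.backward();                     // populates a.grad()
        std::cout << a.grad() << std::endl;
        return 0;
    }
    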

  • libtorch on Linux

    undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::find_last_not_of(char, unsigned long) const@GLIBCXX_3.4.21'
    /home/work/JXF/CPP/libtorch/lib/libtorch.so: undefined reference to `lgamma@GLIBC_2.23'
    /home/work/JXF/CPP/libtorch/lib/libtorch.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::npos@GLIBCXX_3.4.21'
    

    Testing on CentOS ran into many errors, mostly caused by the C++11 ABI. The results also differ across compiler and GLIBC versions.

    GCC 5 reimplemented std::string and std::list. Moving from C++03 to C++11, std::list became std::__cxx11::list. To stay compatible with older binaries, GCC 5 ships both implementations, and the _GLIBCXX_USE_CXX11_ABI macro chosen at compile time selects which one you link against (see the CMake sketch at the end of this item).

    # define the macro on the GCC command line
    -D_GLIBCXX_USE_CXX11_ABI=0 // link against the old std::list
    -D_GLIBCXX_USE_CXX11_ABI=1 // link against the new std::__cxx11::list
    # or define the macro in code
    #define _GLIBCXX_USE_CXX11_ABI 0
    #define _GLIBCXX_USE_CXX11_ABI 1
    

    My system is CentOS 7.3 with the default gcc 4.8.5 and GLIBC 2.17.
    So for libtorch 1.4, be sure to download the Pre-cxx11 ABI build.
    gcc must also be upgraded to GCC 5 or later; the official builds use GCC 5.4.0.

    With libtorch 1.2, GCC 4.9.2 also works.
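
    If you manage the macro from CMake, a sketch (using the test3 target from the CMakeLists.txt above):

    # force the old (pre-cxx11) ABI for our own translation units
    target_compile_definitions(test3 PRIVATE _GLIBCXX_USE_CXX11_ABI=0)
    # recent libtorch packages also export the matching flag via TorchConfig.cmake:
    # set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")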

  • Basic tensor operations

    For example, create a 10x10 random tensor:

    torch::Tensor tensor = torch::rand({10,10});
    

    Since this is a 2-D tensor, reading an arbitrary element, say row i column j, goes through an accessor:

    auto foo = tensor.accessor<float, 2>();  // element type float, 2 dimensions
    std::cout << foo[i][j] << std::endl;
    

    Getting a tensor's size:

    std::cout << tensor.sizes() << std::endl;
    

    Taking a slice of a tensor:

    std::cout << tensor.slice(/*dim=*/0, /*start=*/0, /*end=*/3) << std::endl;
    

    Converting a single-element tensor to a float/double scalar:

    torch::Tensor tensor = torch::randn({3,4});
    std::cout << tensor[1][2].item<double>() << std::endl;
    

    CPU->CUDA

    tensor.to(at::kCUDA)
    

    CUDA->CPU

    tensor.to(at::kCPU)
    
  • Model return values
    If the model has a single return value, the usual pattern is to convert it to a tensor with toTensor():

    at::Tensor result = module.forward({tensor_image}).toTensor();
    

    If the model has multiple return values, convert the result to a tuple instead:

    auto result = module.forward({tensor_image}).toTuple();
    at::Tensor loc= result->elements()[0].toTensor();
    at::Tensor conf = result->elements()[1].toTensor();
    
  • Loading the model file

    For a traced model, you can inspect the trace:

    traced.save('wrapped_rnn.zip')
    loaded = torch.jit.load('wrapped_rnn.zip')
    print(loaded)
    print(loaded.code)
    
  • Tracing warnings

    TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

    Inference pipelines often include post-processing with if statements, which can make tracing fail. My understanding is that tracing cannot record branching control flow: only the path actually taken during the trace is captured, so data-dependent if conditions are baked in as constants. It is best to implement the post-processing separately in C++ (or use torch.jit.script, which can capture control flow).

  • SSD model tracing fails because the Detect module uses the legacy autograd API, a historical leftover. The fix suggested on GitHub is to turn the Detect class into a plain function instead of inheriting from torch.autograd.Function.
