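When deploying a PyTorch model, the C++ API (LibTorch) can run more efficiently than Python. This post records the process of deploying an image classification model with the C++ API.
The PyTorch model first has to be converted to Torch Script, a representation of a PyTorch model that the Torch Script compiler can understand, compile, and serialize. There are two ways to convert a model into a form the C++ API can read: tracing and annotation. Tracing is the simpler of the two, but it only suits networks with a fixed structure, i.e. a forward function without control flow, because tracing records only the path actually taken at run time. If the forward function contains control flow, the annotation approach is required.
This post uses tracing for the conversion. As the name suggests, tracing runs an example input through the model once and records the operations along the way.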
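To make the limitation of tracing concrete, here is a minimal sketch that is not part of the original deployment code. `Gate` is a hypothetical toy module with a data-dependent branch; the sketch compares `torch.jit.trace` with `torch.jit.script` (the scripting route that corresponds to the annotation approach) on an input that takes the other branch.

```python
import torch


class Gate(torch.nn.Module):
    """Hypothetical module whose forward() branches on the input data."""

    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        else:
            return x + 10


model = Gate().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

# Tracing with a positive example records only the "x * 2" branch
# (PyTorch emits a TracerWarning about the data-dependent condition).
traced = torch.jit.trace(model, positive)
# Scripting compiles the source of forward(), so both branches are kept.
scripted = torch.jit.script(model)

eager_out = model(negative)        # takes the "x + 10" branch
traced_out = traced(negative)      # replays "x * 2" regardless of the input
scripted_out = scripted(negative)  # takes the "x + 10" branch

print("eager   :", eager_out)
print("traced  :", traced_out)
print("scripted:", scripted_out)
```

The traced module silently disagrees with the eager model on the negative input, while the scripted one matches it. A plain ResNet classifier has no such branch in forward, which is why tracing is sufficient in this post.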
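import torch

# Load the complete pickled model (not just a state_dict) onto the GPU.
model = torch.load('./weights/best_resnet.pkl', map_location="cuda:0")
model.cuda()
model.eval()  # trace in inference mode so BatchNorm/Dropout behave as they will at deployment

# Trace the model with an example input to produce a torch.jit.ScriptModule.
x = torch.rand(1, 3, 224, 224)
x = x.cuda()  # very important: the example input must be on the same device as the model
traced_script_module = torch.jit.trace(model, x)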
The ScriptModule then has to be serialized to a file before C++ can read it. Loading it there requires no Python dependency.
traced_script_module.save("resnet.pt")
The resulting .pt file is the converted model. It can be used directly from C++ without any Python environment.
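Before moving to C++, it can be worth checking on the Python side that the serialized file reproduces the original model's outputs. The following is a minimal sketch under stated assumptions: the small `torch.nn.Sequential` network is a stand-in for the real ResNet, everything runs on the CPU, and the file is written to a temporary directory rather than to `resnet.pt`.

```python
import os
import tempfile

import torch

# Stand-in model (an assumption): any traceable nn.Module is handled the same way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(4, 2),
).eval()

example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save the traced module, then load it back the way a deployment would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    traced.save(path)
    reloaded = torch.jit.load(path, map_location="cpu")

# Compare the reloaded module against the original eager model.
with torch.no_grad():
    expected = model(example)
    actual = reloaded(example)

outputs_match = torch.allclose(expected, actual, atol=1e-5)
print("outputs match:", outputs_match)
```

The same comparison can be run with the real model and a real preprocessed image, which gives a reference output to compare the C++ program against later.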
On the C++ side, load the model with torch::jit::load().
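#include <torch/script.h> // One-stop header.
#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }
  // Deserialize the ScriptModule from a file using torch::jit::load().
  // Recent LibTorch versions return the module by value and throw c10::Error on failure.
  torch::jit::script::Module module;
  try {
    module = torch::jit::load(argv[1]);
  } catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }
  std::cout << "ok\n";
}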
#include <iostream>
#include <string>
#include <tuple>
#include "torch/script.h"
#include "torch/torch.h"
#include "opencv2/core.hpp"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"

using namespace std;
using namespace cv;

int main()
{
    // Use the GPU when available, otherwise fall back to the CPU.
    torch::DeviceType device_type = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
    torch::Device device(device_type);

    // Load the model and move it to the selected device.
    torch::jit::script::Module module = torch::jit::load("./resnet.pt");
    module.to(device);
    std::cout << "load model success" << std::endl;

    torch::NoGradGuard no_grad; // inference only, no autograd bookkeeping

    double time0 = static_cast<double>(getTickCount());
    for (int k = 0; k < 1000; k++) {
        Mat img = imread("xxxxxx/video_down/wangzhe/240.jpg");
        if (img.empty()) {
            cerr << "failed to read image" << endl;
            return -1;
        }
        int img_size = 224;
        Mat img_resized;
        resize(img, img_resized, Size(img_size, img_size));
        // OpenCV loads images as BGR. Convert to RGB, assuming the model was
        // trained on RGB input (as with PIL and torchvision transforms).
        cvtColor(img_resized, img_resized, COLOR_BGR2RGB);
        Mat img_float;
        img_resized.convertTo(img_float, CV_32F, 1.0f / 255.0f); // scale to [0, 1]
        // Wrap the NHWC pixel buffer of a single image as a tensor (no copy is made).
        auto tensor_image = torch::from_blob(img_float.data, {1, img_size, img_size, 3}, torch::kFloat32);
        tensor_image = tensor_image.permute({0, 3, 1, 2}); // NHWC -> NCHW
        // Normalize each channel in place with the ImageNet mean and std.
        tensor_image[0][0].sub_(0.485).div_(0.229);
        tensor_image[0][1].sub_(0.456).div_(0.224);
        tensor_image[0][2].sub_(0.406).div_(0.225);
        tensor_image = tensor_image.to(device); // move the input to the same device as the model
        torch::Tensor out_tensor = module.forward({tensor_image}).toTensor(); // forward pass
        // Sort the logits in descending order, then apply softmax to get probabilities.
        auto results = out_tensor.sort(-1, true);
        auto softmaxs = std::get<0>(results)[0].softmax(0);
        auto indexs = std::get<1>(results)[0];
        auto idx = indexs[0].item<int>();
        string labels[2] = {"normal", "pk"};
        string label = labels[idx];
        float confidence = softmaxs[0].item<float>() * 100.0f;
        cout << "label:" << label << " confidence:" << confidence << endl;
    }
    time0 = ((double)getTickCount() - time0) / getTickFrequency();
    cout << "time consume: " << time0 << endl;
    return 0;
}
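The post-processing at the end of the loop (sort the logits, apply softmax, take the first entry) amounts to picking the class with the highest softmax probability, because softmax preserves the ordering of the logits. As a minimal cross-check in plain Python, with made-up logits and the hypothetical helper names `softmax` and `top1`:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(logits, labels):
    """Return the most probable label and its confidence in percent."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx] * 100.0


# Made-up logits for the two classes used in the C++ program.
label, confidence = top1([0.3, 2.1], ["normal", "pk"])
print("label:", label, "confidence:", confidence)
```

Feeding the logits printed by the C++ program into a helper like this is a quick way to confirm that the label and confidence it reports are consistent.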
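Compared with running prediction in Python, the more tedious part is that the data preprocessing has to be written by hand, and it must match what was done during training. In Python, calling the transform classes is enough.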
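Because the preprocessing has to be reproduced by hand, a plain-Python reference for the per-pixel arithmetic can help when checking the C++ values channel by channel. This sketch rests on assumptions: the model was trained on RGB input scaled to [0, 1] and normalized with the mean and std that appear in the C++ code, the pixels arrive in OpenCV's BGR order, and resizing is left out. `preprocess` is a hypothetical helper name.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel mean in RGB order
STD = (0.229, 0.224, 0.225)   # per-channel std in RGB order


def preprocess(img_bgr):
    """Convert an H x W x 3 nested list of 0..255 BGR pixels
    into a 3 x H x W nested list of normalized RGB floats."""
    height = len(img_bgr)
    width = len(img_bgr[0])
    out = []
    for c in range(3):   # c indexes the output channel in RGB order
        src = 2 - c      # matching position in the BGR input
        plane = [
            [(img_bgr[y][x][src] / 255.0 - MEAN[c]) / STD[c] for x in range(width)]
            for y in range(height)
        ]
        out.append(plane)
    return out


# A 1 x 1 image whose single pixel is B=0, G=128, R=255.
chw = preprocess([[[0, 128, 255]]])
print([plane[0][0] for plane in chw])
```

If the training pipeline used a different channel order or different statistics, the constants and the `src` mapping need to change accordingly.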
cmake_minimum_required(VERSION 3.2 FATAL_ERROR)
project(Classify_cpp)

# Path to OpenCV's CMake config
set(OpenCV_DIR /usr/local/share/OpenCV)
find_package(OpenCV REQUIRED NO_CMAKE_FIND_ROOT_PATH)
if(OpenCV_FOUND)
    INCLUDE_DIRECTORIES(${OpenCV_INCLUDE_DIRS})
    message(STATUS "OpenCV library status:")
    message(STATUS " version: ${OpenCV_VERSION}")
    message(STATUS " libraries: ${OpenCV_LIBS}")
    message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
endif()

# Path to the installed torch package, which ships the Torch CMake config
set(CMAKE_PREFIX_PATH xxx/anaconda3/envs/pytorch2/lib/python3.6/site-packages/torch)
find_package(Torch REQUIRED)

# Compiler options
if(CMAKE_COMPILER_IS_GNUCXX)
    # -fno-stack-protector is essential on the TK1; without it the program
    # aborts with "stack smashing detected".
    # Newer LibTorch releases need a later standard than C++11 (C++14 or C++17).
    add_compile_options(-std=c++11 -fno-stack-protector)
    message(STATUS "optional:-std=c++11")
endif(CMAKE_COMPILER_IS_GNUCXX)

add_executable(${PROJECT_NAME} classify.cpp)
target_link_libraries(${PROJECT_NAME} ${TORCH_LIBRARIES} ${OpenCV_LIBS})
SET(CMAKE_BUILD_TYPE DEBUG)
mkdir build
cd build
cmake ..
make
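Alternatively, open the CMakeLists.txt file in Qt Creator, which makes it easier to debug and modify the code. Note that the line "SET(CMAKE_BUILD_TYPE DEBUG)" must be added at the end of CMakeLists.txt for debugging in Qt Creator to work.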
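Here ResNet50 is used as a binary classifier on 1000 images. The Python API takes 18 seconds and the C++ API takes 15 seconds, a saving of about 3 seconds (roughly 17% with these rounded figures), which makes the C++ API better suited to model deployment.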
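Working only from the rounded timings quoted above (18 s and 15 s for 1000 images), the relative saving and the per-image latency can be worked out as follows. The helper names are hypothetical, and the result is only as precise as the rounded inputs.

```python
def relative_saving(baseline_s, optimized_s):
    """Fraction of the baseline time that the optimized run saves."""
    return (baseline_s - optimized_s) / baseline_s


def per_image_ms(total_s, n_images):
    """Average latency per image in milliseconds."""
    return total_s * 1000.0 / n_images


python_s, cpp_s, n_images = 18.0, 15.0, 1000

saving = relative_saving(python_s, cpp_s)
print("relative saving: {:.1f}%".format(saving * 100.0))
print("python per image: {:.1f} ms".format(per_image_ms(python_s, n_images)))
print("c++ per image: {:.1f} ms".format(per_image_ms(cpp_s, n_images)))
```

Note that the C++ loop also reads and decodes the image from disk on every iteration, so these per-image figures cover more than the forward pass.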
github: Image_Classify_cpp