Using ONNX and ONNX Runtime, a model trained with the PyTorch deep learning framework can be deployed on a server with C++ inference; model inference this way is typically considerably faster than running it in Python.
Python environment:
pytorch == 1.6.0
onnx == 1.7.0
onnxruntime == 1.3.0
C++ environment:
onnxruntime-linux-x64-1.4.0
First, export a .onnx model file with PyTorch's built-in torch.onnx module; see the corresponding section of the PyTorch documentation for details. The main steps are as follows:
import torch

checkpoint = torch.load(model_path)
model = ModelNet(params)                      # your own model class and parameters
model.load_state_dict(checkpoint['model'])
model.eval()

input_x_1 = torch.randn(10, 20)
input_x_2 = torch.randn(1, 20, 5)
output, mask = model(input_x_1, input_x_2)

torch.onnx.export(model,
                  (input_x_1, input_x_2),
                  'model.onnx',
                  input_names=['input', 'input_mask'],
                  output_names=['output', 'output_mask'],
                  opset_version=11,
                  verbose=True,
                  dynamic_axes={'input': {1: 'seqlen'},
                                'input_mask': {1: 'seqlen', 2: 'time'},
                                'output_mask': {0: 'time'}})
All torch.onnx.export arguments are described in the documentation. The opset_version matters: make sure it is one your onnxruntime build supports. dynamic_axes marks the input and output dimensions that are allowed to vary at runtime; without it, the shapes of the input and output tensors are fixed to those of the example inputs. If your inputs always have a fixed shape, you can leave it out.
Whether the exported model works correctly can first be checked in Python:
import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession('model.onnx')
outputs = ort_session.run(None, {'input': np.random.randn(10, 20).astype(np.float32),
                                 'input_mask': np.random.randn(1, 20, 5).astype(np.float32)})
# because dynamic_axes is set, the marked dimensions may change between calls
outputs = ort_session.run(None, {'input': np.random.randn(10, 5).astype(np.float32),
                                 'input_mask': np.random.randn(1, 26, 2).astype(np.float32)})
# outputs is a list holding the values of 'output' and 'output_mask'
import onnx
model = onnx.load('model.onnx')
onnx.checker.check_model(model)
If no exception is raised, the exported model is fine. Note that torch.onnx.export currently only recognizes a subset of tensor operations (see the Supported operators list in the PyTorch documentation); basic models such as transformers export without problems. If the exported graph contains ATen fallback ops or similar, you need to rewrite the unsupported tensor operations, otherwise the model cannot be used from C++.
Next is using the .onnx model from C++ through onnxruntime. The code below handles a model with multiple inputs and multiple outputs and is written following the official samples and FAQ; for some of the parameters, refer to the samples or the official API documentation:
#include <assert.h>
#include <stdio.h>
#include <vector>
#include <onnxruntime_cxx_api.h>
int main(int argc, char* argv[]) {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
  Ort::SessionOptions session_options;
  session_options.SetIntraOpNumThreads(1);
  session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

#ifdef _WIN32
  const wchar_t* model_path = L"model.onnx";
#else
  const char* model_path = "model.onnx";
#endif

  Ort::Session session(env, model_path, session_options);

  // allocator for strings returned by the session (input/output names etc.)
  Ort::AllocatorWithDefaultOptions allocator;
  // number of model input nodes (2 for this model)
  size_t num_input_nodes = session.GetInputCount();

  std::vector<const char*> input_node_names = {"input", "input_mask"};
  std::vector<const char*> output_node_names = {"output", "output_mask"};

  // first input: shape (10, 20), filled with dummy values
  std::vector<int64_t> input_node_dims = {10, 20};
  size_t input_tensor_size = 10 * 20;
  std::vector<float> input_tensor_values(input_tensor_size);
  for (unsigned int i = 0; i < input_tensor_size; i++)
    input_tensor_values[i] = (float)i / (input_tensor_size + 1);
  // create input tensor object from data values
  auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 2);
  assert(input_tensor.IsTensor());

  // second input: shape (1, 20, 4)
  std::vector<int64_t> input_mask_node_dims = {1, 20, 4};
  size_t input_mask_tensor_size = 1 * 20 * 4;
  std::vector<float> input_mask_tensor_values(input_mask_tensor_size);
  for (unsigned int i = 0; i < input_mask_tensor_size; i++)
    input_mask_tensor_values[i] = (float)i / (input_mask_tensor_size + 1);
  // create input tensor object from data values
  auto mask_memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_mask_tensor = Ort::Value::CreateTensor<float>(mask_memory_info, input_mask_tensor_values.data(), input_mask_tensor_size, input_mask_node_dims.data(), 3);
  assert(input_mask_tensor.IsTensor());

  std::vector<Ort::Value> ort_inputs;
  ort_inputs.push_back(std::move(input_tensor));
  ort_inputs.push_back(std::move(input_mask_tensor));

  // score model with the input tensors, get back the output tensors
  auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_node_names.data(), ort_inputs.data(), ort_inputs.size(), output_node_names.data(), 2);

  // get pointers to the output tensors' float values
  float* floatarr = output_tensors[0].GetTensorMutableData<float>();
  float* floatarr_mask = output_tensors[1].GetTensorMutableData<float>();

  printf("Done!\n");
  return 0;
}
Compile command: g++ infer.cpp -o infer onnxruntime-linux-x64-1.4.0/lib/libonnxruntime.so.1.4.0 -Ionnxruntime-linux-x64-1.4.0/include/ -std=c++11
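The node names ('input', 'input_mask', ...) and example shapes are hard-coded above. If you prefer, they can also be queried from the session itself, which is handy for checking that the names passed to Run match what the exported model actually contains. A minimal sketch against the same session object (dimensions marked in dynamic_axes are reported as -1):

// Sketch: enumerate input names and shapes instead of hard-coding them.
// Outputs work the same way via GetOutputCount / GetOutputName / GetOutputTypeInfo.
Ort::AllocatorWithDefaultOptions allocator;
size_t num_input_nodes = session.GetInputCount();
for (size_t i = 0; i < num_input_nodes; i++) {
  char* name = session.GetInputName(i, allocator);
  Ort::TypeInfo type_info = session.GetInputTypeInfo(i);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
  std::vector<int64_t> dims = tensor_info.GetShape();   // dynamic axes appear as -1
  printf("input %zu : %s, dims = [", i, name);
  for (size_t j = 0; j < dims.size(); j++)
    printf("%lld%s", (long long)dims[j], j + 1 < dims.size() ? ", " : "");
  printf("]\n");
}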
The tensor element types supported by onnxruntime are:
typedef enum ONNXTensorElementDataType {
ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED,
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, // maps to c type float
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8, // maps to c type uint8_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8, // maps to c type int8_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16, // maps to c type uint16_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16, // maps to c type int16_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32, // maps to c type int32_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64, // maps to c type int64_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING, // maps to c++ type std::string
ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL,
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE, // maps to c type double
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32, // maps to c type uint32_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64, // maps to c type uint64_t
ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX64, // complex with float32 real and imaginary components
ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX128, // complex with float64 real and imaginary components
ONNX_TENSOR_ELEMENT_DATA_TYPE_BFLOAT16 // Non-IEEE floating-point format based on IEEE754 single-precision
} ONNXTensorElementDataType;
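The example above only uses float, but transformer-style models often take int64 token ids as input. Creating such a tensor works the same way as the float case, just with the matching element type; a small sketch with a hypothetical (1, 20) token-id input:

// Sketch: build an int64 tensor (ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64 maps to int64_t).
std::vector<int64_t> ids_node_dims = {1, 20};                 // hypothetical shape
size_t ids_tensor_size = 1 * 20;
std::vector<int64_t> ids_tensor_values(ids_tensor_size, 0);   // dummy token ids
auto ids_memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value ids_tensor = Ort::Value::CreateTensor<int64_t>(
    ids_memory_info, ids_tensor_values.data(), ids_tensor_size,
    ids_node_dims.data(), 2);
assert(ids_tensor.IsTensor());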
Another type that needs care is bool: std::vector<bool> is bit-packed and has no usable data() pointer, so the data has to be stored in a std::vector<uint8_t> and reinterpreted as bool when creating the tensor:
std::vector<uint8_t> mask_tensor_values;
for (int i = 0; i < mask_tensor_size; i++) {
  mask_tensor_values.push_back((uint8_t)(true));
}
auto mask_memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value mask_tensor = Ort::Value::CreateTensor<bool>(mask_memory_info, reinterpret_cast<bool*>(mask_tensor_values.data()), mask_tensor_size, mask_node_dims.data(), 3);
From rough measurements in practice, taking a transformer as an example, inference with onnxruntime in C++ runs about 2-5x faster than PyTorch in Python.
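These numbers are only a rough estimate; to reproduce them on your own model you can time session.Run directly. A simple sketch using std::chrono around the Run call from the program above (add #include <chrono> at the top; one warm-up run is done first because the initial call can be slower):

// Sketch: average latency of session.Run over repeated calls.
session.Run(Ort::RunOptions{nullptr}, input_node_names.data(), ort_inputs.data(),
            ort_inputs.size(), output_node_names.data(), output_node_names.size());  // warm-up

const int runs = 100;
auto t0 = std::chrono::steady_clock::now();
for (int i = 0; i < runs; ++i) {
  auto output_tensors = session.Run(Ort::RunOptions{nullptr},
                                    input_node_names.data(), ort_inputs.data(), ort_inputs.size(),
                                    output_node_names.data(), output_node_names.size());
}
auto t1 = std::chrono::steady_clock::now();
double avg_ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
printf("average latency over %d runs: %.3f ms\n", runs, avg_ms);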