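Many of the better stereo-matching models use torch's grid_sample, but TensorRT has no interface for it. There are workarounds for the 4D case, but they still cost performance. I recently tested BGNet, which I found to be the most reliable option for real-time use: it does not produce large numbers of mismatches under camera shake. However, its bilateral filtering lifts the data by one dimension, so the call becomes a 5D grid_sample, which is a headache. I could not find a replacement implementation online, so I wrote one myself along the same lines as the 4D workaround. The results matched, but the runtime doubled, and the ONNX deployment ended up taking the same time as the original torch model, which was frustrating. After thinking it over, I decided to try hooking up a plugin interface instead.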
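To make "5D grid_sample" concrete: with an input of shape (N, C, D, H, W), each grid entry is an (x, y, z) coordinate in [-1, 1], and the output is a trilinear blend of the eight surrounding voxels. The sketch below is a pure-Python reference for a single sample point, assuming mode='bilinear', padding_mode='zeros' and align_corners=True. The name `sample_trilinear` is my own and is not part of torch.

```python
import math


def sample_trilinear(volume, x, y, z):
    """Sample volume[D][H][W] at normalized (x, y, z) in [-1, 1].

    Assumes align_corners=True and zero padding outside the volume.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    def unnormalize(coord, size):
        # align_corners=True: -1 maps to index 0, +1 maps to index size - 1
        return (coord + 1.0) / 2.0 * (size - 1)

    fx = unnormalize(x, width)
    fy = unnormalize(y, height)
    fz = unnormalize(z, depth)
    x0, y0, z0 = math.floor(fx), math.floor(fy), math.floor(fz)

    def voxel(zi, yi, xi):
        # zero padding: out-of-range neighbours contribute nothing
        if 0 <= zi < depth and 0 <= yi < height and 0 <= xi < width:
            return volume[zi][yi][xi]
        return 0.0

    result = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                # each corner's weight is the product of three 1-D linear weights
                wx = 1.0 - abs(fx - (x0 + dx))
                wy = 1.0 - abs(fy - (y0 + dy))
                wz = 1.0 - abs(fz - (z0 + dz))
                result += wx * wy * wz * voxel(z0 + dz, y0 + dy, x0 + dx)
    return result
```

Note that the grid's last dimension is ordered (x, y, z), so x indexes W, y indexes H and z indexes D.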
Jetson Xavier NX
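I went through several rather involved attempts. First I tried torch2trt, but it was too easy for the details to fall out of line with its interface requirements, and there was too much to debug, so I gave up on it.
Building TensorRT from source on the NX did not seem like a good idea either. After a lot of searching and experimenting, I settled on the following approach.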
import typing

import torch
from torch.onnx import symbolic_helper

from my_model import my_model

_OPSET_VERSION = 11
# typing.Set rather than AbstractSet, because the set is mutated via .add()
_registered_ops: typing.Set[str] = set()


def _reg(symbolic_fn: typing.Callable):
    # "::grid_sampler" overrides the exporter's symbolic for the op of that name
    name = "::%s" % symbolic_fn.__name__
    torch.onnx.register_custom_op_symbolic(name, symbolic_fn, _OPSET_VERSION)
    _registered_ops.add(name)


def register():
    """Register ONNX Runtime's built-in contrib ops.

    Should be run before torch.onnx.export().
    """

    def grid_sampler(g, input, grid, mode, padding_mode, align_corners):
        # mode
        #   'bilinear'   : onnx::Constant[value={0}]
        #   'nearest'    : onnx::Constant[value={1}]
        #   'bicubic'    : onnx::Constant[value={2}]
        # padding_mode
        #   'zeros'      : onnx::Constant[value={0}]
        #   'border'     : onnx::Constant[value={1}]
        #   'reflection' : onnx::Constant[value={2}]
        mode = symbolic_helper._maybe_get_const(mode, "i")
        padding_mode = symbolic_helper._maybe_get_const(padding_mode, "i")
        mode_str = ["bilinear", "nearest", "bicubic"][mode]
        padding_mode_str = ["zeros", "border", "reflection"][padding_mode]
        align_corners = int(symbolic_helper._maybe_get_const(align_corners, "b"))

        # From opset v13 onward, the output shape can be specified with
        # (N, C, H, W) (N, H_out, W_out, 2) => (N, C, H_out, W_out)
        # input_shape = input.type().sizes()
        # grid_shape = grid.type().sizes()
        # output_shape = input_shape[:2] + grid_shape[1:3]
        # g.op(...).setType(input.type().with_sizes(output_shape))

        return g.op(
            # Op name: modify here. Not sure whether "com.microsoft::" is required.
            "com.microsoft::GridSamplePluginDynamic",
            input,
            grid,
            mode_s=mode_str,
            padding_mode_s=padding_mode_str,
            align_corners_i=align_corners,
        )

    _reg(grid_sampler)


@torch.no_grad()
def convert():
    register()

    # run the export on the GPU
    device = "cuda"
    model = my_model(88, 'models.pth').to(device)
    model.eval()

    # dummy left/right inputs
    t1 = torch.rand(1, 1, 384, 640).to(device)
    t2 = torch.rand(1, 1, 384, 640).to(device)

    # Export the model
    torch.onnx.export(model,
                      (t1, t2),
                      'model.onnx',                      # where to save the model (can be a file or file-like object)
                      export_params=True,                # store the trained parameter weights inside the model file
                      opset_version=_OPSET_VERSION,      # the ONNX version to export the model to
                      do_constant_folding=True,          # whether to execute constant folding for optimization
                      input_names=['left', 'right'],     # the model's input names
                      output_names=['output'])


if __name__ == "__main__":
    convert()
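The symbolic function above relies on two conventions: the exporter hands over `mode` and `padding_mode` as integer constants, and the suffix on each `g.op` keyword (`_s` for a string, `_i` for an integer) sets the type of the resulting ONNX attribute. The sketch below reproduces only that decoding step in plain Python, so the mapping can be checked without running an export. `decode_grid_sampler_attrs` is a hypothetical helper name, not something torch provides.

```python
MODES = ["bilinear", "nearest", "bicubic"]
PADDING_MODES = ["zeros", "border", "reflection"]


def decode_grid_sampler_attrs(mode, padding_mode, align_corners):
    """Return the keyword arguments that the symbolic passes on to g.op()."""
    return {
        "mode_s": MODES[mode],                          # string attribute "mode"
        "padding_mode_s": PADDING_MODES[padding_mode],  # string attribute "padding_mode"
        "align_corners_i": int(align_corners),          # integer attribute "align_corners"
    }


# Example: bilinear sampling, border padding, align_corners=True
attrs = decode_grid_sampler_attrs(0, 1, True)
```

The plugin on the TensorRT side then has to read attributes with exactly these names and types.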
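The grid_sample I need is 5D, and even the latest ONNX export does not support that, so I still export with the older opset. For the 4D case I do not know whether exporting directly with a newer version causes any problems. I will try CREStereo when I get the chance.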
python3 -m onnxsim model.onnx model_sim.onnx
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --fp16 \
--plugins=/home/ubuntu/Documents/amirstan_plugin/build/lib/libamirstan_plugin.so
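If the two commands above have to be re-run often, they can be wrapped in a small script. This is only a sketch: the flags are the ones used above, while `build_commands` and `run_all` are hypothetical helper names. It mirrors the commands as written, so trtexec receives the original ONNX file rather than the simplified one.

```python
import subprocess

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_commands(onnx_path, sim_path, engine_path, plugin_lib, fp16=True):
    """Return the argv lists for the onnxsim step and the trtexec step."""
    simplify = ["python3", "-m", "onnxsim", onnx_path, sim_path]
    build = [
        TRTEXEC,
        f"--onnx={onnx_path}",        # same input file as in the command above
        f"--saveEngine={engine_path}",
        f"--plugins={plugin_lib}",
    ]
    if fp16:
        build.append("--fp16")
    return [simplify, build]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands("model.onnx", "model_sim.onnx", "model.trt", plugin_lib))` with the plugin path from above would repeat both steps.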
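At first the plugin library linked fine here, but the interface still did not match. Changing the name inside the plugin project did not seem to take effect, so in the end I simply changed the operator type in the ONNX model to GridSamplePluginDynamic.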
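Renaming the operator type in an already exported model can be done with the onnx Python package. The following is a minimal sketch, not the exact steps I used: `rename_op_type` and `rename_op_type_in_file` are hypothetical helpers, and the old operator name is left as a parameter because it depends on what the exporter actually wrote into the graph.

```python
def rename_op_type(model, old_op_type, new_op_type="GridSamplePluginDynamic"):
    """Rename every matching node in place and return how many were changed."""
    renamed = 0
    for node in model.graph.node:
        if node.op_type == old_op_type:
            node.op_type = new_op_type
            renamed += 1
    return renamed


def rename_op_type_in_file(src_path, dst_path, old_op_type):
    """Load an ONNX file, rename the op type, and save the result."""
    import onnx  # imported lazily so rename_op_type works without onnx installed

    model = onnx.load(src_path)
    count = rename_op_type(model, old_op_type)
    onnx.save(model, dst_path)
    return count
```

Printing `node.op_type` for each node in `model.graph.node` first shows which name needs replacing. A return value of 0 means nothing matched.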
#include <dlfcn.h>
......
string dll_path = "/home/ubuntu/Documents/amirstan_plugin/build/lib/libamirstan_plugin.so";
void *handle = dlopen(dll_path.c_str(), RTLD_LAZY);
if (NULL == handle)
{
    printf("dlopen error. msg:%s", dlerror());
    return -1;
}
......
Also add the following to CMakeLists.txt:
target_link_libraries(main ${CMAKE_DL_LIBS} )
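For a Python implementation, I think the same idea applies: load the library with ctypes.CDLL(plugin_lib) and keep everything else the same. I am not sure, though, as I have not tried it yet.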
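A sketch of what that might look like is given below. It is untested on the NX and assumes the plugin library registers its plugins when it is loaded, which is what the dlopen approach above relies on. `load_plugin_library` and `load_engine` are hypothetical helper names; the TensorRT calls are the standard Python API for deserializing an engine.

```python
import ctypes
import os


def load_plugin_library(plugin_lib):
    """Load the plugin .so into the current process, like dlopen in C++."""
    if not os.path.isfile(plugin_lib):
        raise FileNotFoundError(f"plugin library not found: {plugin_lib}")
    # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later
    return ctypes.CDLL(plugin_lib, mode=ctypes.RTLD_GLOBAL)


def load_engine(engine_path, plugin_lib):
    """Load the plugin first, then deserialize the engine built by trtexec."""
    import tensorrt as trt  # imported lazily; only needed on the target device

    load_plugin_library(plugin_lib)
    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, "")
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    # return the runtime as well so it stays alive as long as the engine
    return runtime, engine
```

The important point is ordering: the plugin library has to be loaded before the engine is deserialized, otherwise the GridSamplePluginDynamic layer cannot be resolved.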