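I trained a ResNet50 network for pedestrian attribute detection on the Market1501 training set and found that inference on a single pedestrian image took more than 240 ms on a GTX1080Ti. That is nowhere near fast enough for real-time use, so I decided to use TensorRT to speed up model inference.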
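For context, 240 ms per image works out to roughly 4 images per second. The post does not say how the figure was measured; the snippet below is only a minimal sketch of one common way to time a call. The helper name `measure_latency_ms` and its parameters are hypothetical, and the workload is a stand-in rather than the actual model.

```python
import time
from statistics import mean


def measure_latency_ms(fn, warmup=5, runs=20, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is an optional callable invoked before each clock read. For a CUDA
    model one would pass torch.cuda.synchronize so that queued GPU work is
    included in the measurement (assumption: PyTorch is in use).
    """
    for _ in range(warmup):  # warm-up calls are executed but not timed
        fn()
    timings = []
    for _ in range(runs):
        if sync:
            sync()
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


if __name__ == "__main__":
    # Stand-in workload; with the real model this could be
    # lambda: torch_model(dummy_input)
    latency = measure_latency_ms(lambda: sum(range(100_000)))
    print(f"{latency:.2f} ms per call, about {1000.0 / latency:.1f} calls/s")
```

Without the synchronisation step, CUDA timings tend to look better than they are, because kernel launches return before the GPU has finished.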
This part is fairly simple: it mostly follows the export example on the official PyTorch site.
# Some standard imports
import torch
import torch.onnx

from ResNet50_nFC import ResNet50_nFC

torch_model = ResNet50_nFC(30)  # the network outputs 30 pedestrian attributes
torch_model.load_state_dict(torch.load('net_last.pth'))
torch_model.cuda()
torch_model.eval()  # inference mode: BatchNorm uses running statistics, Dropout is disabled
# print(torch_model)

# Dummy input in NCHW layout: one 3-channel image, 288 high and 144 wide
dummy_input = torch.randn(1, 3, 288, 144, requires_grad=True, device='cuda')
dummy_output = torch_model(dummy_input)

# Export the model
torch.onnx.export(torch_model,
                  dummy_input,
                  "ResNet50_nFC.onnx",
                  verbose=True)
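Before handing the exported file to another tool, it can help to check that it is well-formed and to see which IR and opset versions it carries, since these are the same fields onnx2trt reports when it parses the model. The following is a minimal sketch, assuming the `onnx` Python package is installed; the helper name `inspect_onnx` is hypothetical.

```python
import os


def inspect_onnx(path):
    """Validate an ONNX file and return a few of its header fields."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    import onnx  # imported lazily so the path check works without onnx installed

    model = onnx.load(path)
    onnx.checker.check_model(model)  # raises if the model is malformed
    return {
        "ir_version": model.ir_version,
        "opset": [imp.version for imp in model.opset_import],
        "producer": f"{model.producer_name} {model.producer_version}",
        "num_nodes": len(model.graph.node),
    }


if __name__ == "__main__":
    path = "ResNet50_nFC.onnx"  # file name used in the export call
    if os.path.isfile(path):
        print(inspect_onnx(path))
    else:
        print(f"{path} not found; run the export first")
```

A model that passes this check can still fail in a downstream parser, so this only rules out export-side problems.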
https://github.com/onnx/onnx-tensorrt
Follow the installation steps given in the repository:
$ mkdir build
$ cd build
$ cmake .. -DTENSORRT_ROOT=<tensorrt_install_dir> -DGPU_ARCHS="75"
$ make -j8
$ sudo make install
- Note: -DGPU_ARCHS="75" must be set to match your GPU:
RTX2060: -DGPU_ARCHS="75"
GTX1080: -DGPU_ARCHS="61"
The value for a given card can be looked up at https://developer.nvidia.com/cuda-gpus
- You may also need to install Protobuf. The process is roughly:
1. Download the protobuf source from https://github.com/protocolbuffers/protobuf/releases
2. Install protobuf:
$ tar -xvf protobuf
$ cd protobuf
$ ./configure --prefix=/usr/local/protobuf
$ make
$ make check
$ make install
Check the protoc version:
$ protoc --version
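Returning to the -DGPU_ARCHS setting from the cmake step: the value is the GPU's CUDA compute capability with the dot removed (7.5 becomes "75", 6.1 becomes "61"). The small sketch below formats the flag for the two cards mentioned in the note; the function name `gpu_archs_flag` and the `KNOWN_CARDS` table are illustrative only.

```python
def gpu_archs_flag(major, minor):
    """Format a CUDA compute capability as a -DGPU_ARCHS cmake argument."""
    return f'-DGPU_ARCHS="{major}{minor}"'


# Compute capabilities of the two cards named in the note
KNOWN_CARDS = {"RTX2060": (7, 5), "GTX1080": (6, 1)}

if __name__ == "__main__":
    for name, (major, minor) in KNOWN_CARDS.items():
        print(name, gpu_archs_flag(major, minor))
    # With PyTorch and a CUDA device available, the capability of the local
    # GPU can be queried instead of looked up:
    #   import torch
    #   major, minor = torch.cuda.get_device_capability(0)
```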
During the cmake step I also ran into the following problem:
CMake Error at CMakeLists.txt:121 (add_subdirectory):
The source directory
/home/xxx/Downloads/onnx-tensorrt-master/third_party/onnx
does not contain a CMakeLists.txt file.
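The cause is that when the project was downloaded from GitHub, the onnx library under /onnx-tensorrt-master/third_party/onnx/ was not downloaded with it. That directory is a git submodule, which a plain source download or a non-recursive clone leaves empty. Downloading onnx manually and copying it into that path fixes the problem; cloning with git clone --recursive should avoid it in the first place.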
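The same condition that cmake complains about can be checked before running it. This is a small sketch of such a pre-flight check; the function name `missing_submodules` is hypothetical, and the only path it looks for is the one named in the error message.

```python
from pathlib import Path


def missing_submodules(repo_root, required=("third_party/onnx",)):
    """Return the required sub-directories that lack a CMakeLists.txt."""
    root = Path(repo_root)
    return [d for d in required if not (root / d / "CMakeLists.txt").is_file()]


if __name__ == "__main__":
    missing = missing_submodules(".")  # run from the onnx-tensorrt source root
    if missing:
        print("Missing sources:", ", ".join(missing))
        print("In a git checkout, try: git submodule update --init --recursive")
    else:
        print("All third-party sources are present.")
```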
Once the installation is complete, run the conversion command:
$ onnx2trt ResNet50_nFC.onnx -o ResNet50_nFC.trt
Things were not that simple, though. Another error appeared:
While parsing node number 175 [Gather -> "764"]:
ERROR: /home/xfb/Projects/ModelConvert/onnx-tensorrt/onnx2trt_utils.hpp:399 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
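Searching for this error suggests the likely cause: TensorRT cannot implement certain PyTorch operations, and they still cannot be executed after the model has been converted to ONNX.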
Inspired by https://github.com/pytorch/pytorch/issues/16908, I modified the source of resnet.py in torchvision, changing the forward function of the ResNet class as follows:
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.avgpool(x)
    # x = x.reshape(x.size(0), -1)  # original line, changed here
    x = x.reshape(1, -1)  # batch size hard-coded to 1
    x = self.fc(x)
    return x
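The likely reason this helps is that x.size(0) is traced as shape-lookup nodes (including the Gather named in the error), whereas the constant 1 is not. The trade-off is that the flattening is only correct for a batch of one. The sketch below reproduces the shape arithmetic in plain Python; `reshape_shape` is a hypothetical helper, and the pooled shape (batch, 2048, 1, 1) is an assumption based on a standard ResNet50 backbone.

```python
from math import prod


def reshape_shape(shape, target):
    """Resolve a single -1 in target the way reshape does for an input shape."""
    if target.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    total = prod(shape)
    known = prod(d for d in target if d != -1)
    if -1 in target:
        if known == 0 or total % known != 0:
            raise ValueError(f"cannot reshape {shape} into {target}")
        return tuple(total // known if d == -1 else d for d in target)
    if known != total:
        raise ValueError(f"cannot reshape {shape} into {target}")
    return tuple(target)


if __name__ == "__main__":
    for batch in (1, 4):
        pooled = (batch, 2048, 1, 1)  # assumed avgpool output shape
        original = reshape_shape(pooled, (pooled[0], -1))  # x.reshape(x.size(0), -1)
        modified = reshape_shape(pooled, (1, -1))          # x.reshape(1, -1)
        print(f"batch={batch}: original {original}, modified {modified}")
```

For a batch of 4 the modified line yields (1, 8192) rather than (4, 2048), which no longer matches a fully connected layer expecting 2048 input features, so the patched model should be exported and run with one image at a time.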
After re-exporting the PyTorch model to ONNX and converting it to TensorRT again, the conversion finally succeeded!
----------------------------------------------------------------
Input filename: ResNet50_nFC.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.1
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
Building TensorRT engine, FP16 available:1
Max batch size: 32
Max workspace size: 1024 MiB
Writing TensorRT engine to 1.trt
All done
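Once the engine file has been written, it can be loaded from Python for inference. The snippet below is only a sketch of the loading step, assuming the `tensorrt` Python bindings are installed and match the TensorRT version that built the engine; the helper name `load_engine` and the file name are placeholders, and buffer allocation and execution are left out.

```python
import os


def load_engine(path):
    """Deserialize a TensorRT engine file and return the engine object."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    import tensorrt as trt  # imported lazily; requires the TensorRT Python bindings

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(f"failed to deserialize {path}")
    return engine


if __name__ == "__main__":
    path = "ResNet50_nFC.trt"  # placeholder: use the file name passed to -o
    if os.path.isfile(path):
        engine = load_engine(path)
        print("Engine loaded:", engine is not None)
    else:
        print(f"{path} not found; run onnx2trt first")
```

A serialized engine is generally tied to the TensorRT version and GPU it was built with, so it is safest to rebuild it on the machine where it will run.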