
Platform: Tesla P4, x86


  • ubuntu16.04
  • cuda9.0
  • cudnn7.5
  • python3.6
  • tensorflow-gpu1.12
  • tensorrt5.1.5.0
  • onnx1.4.1 (newer versions trigger a bug)
  • pycuda2019-1.12

Pipeline: yolov3-tiny ---> tiny.onnx ---> tiny.trt


  layer   filters  size/strd(dil)      input                output
   0 conv     16       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  16 0.150 BF
   1 max               2 x 2/ 2    416 x 416 x  16 ->  208 x 208 x  16 0.003 BF
   2 conv     32       3 x 3/ 1    208 x 208 x  16 ->  208 x 208 x  32 0.399 BF
   3 max               2 x 2/ 2    208 x 208 x  32 ->  104 x 104 x  32 0.001 BF
   4 conv     64       3 x 3/ 1    104 x 104 x  32 ->  104 x 104 x  64 0.399 BF
   5 max               2 x 2/ 2    104 x 104 x  64 ->   52 x  52 x  64 0.001 BF
   6 conv    128       3 x 3/ 1     52 x  52 x  64 ->   52 x  52 x 128 0.399 BF
   7 max               2 x 2/ 2     52 x  52 x 128 ->   26 x  26 x 128 0.000 BF
   8 conv    256       3 x 3/ 1     26 x  26 x 128 ->   26 x  26 x 256 0.399 BF
   9 max               2 x 2/ 2     26 x  26 x 256 ->   13 x  13 x 256 0.000 BF
  10 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  11 max               2 x 2/ 1     13 x  13 x 512 ->   13 x  13 x 512 0.000 BF
  12 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256 0.089 BF
  14 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  15 conv     18       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x  18 0.003 BF
  16 yolo
[yolo] params: iou loss: mse, iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
  17 route  13
  18 conv    128       1 x 1/ 1     13 x  13 x 256 ->   13 x  13 x 128 0.011 BF
  19 upsample                 2x    13 x  13 x 128 ->   26 x  26 x 128
  20 route  19 8
  21 conv    256       3 x 3/ 1     26 x  26 x 384 ->   26 x  26 x 256 1.196 BF
  22 conv     18       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x  18 0.006 BF
  23 yolo
[yolo] params: iou loss: mse, iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
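
The BF column in the listing above (billions of floating-point operations per layer) can be cross-checked by hand. The sketch below assumes the figure counts two operations per multiply-accumulate, an assumption that happens to reproduce the conv rows shown. conv_bflops is a hypothetical helper, not a darknet function.

```python
def conv_bflops(out_h, out_w, filters, kernel, in_channels):
    """Estimate a conv layer's cost in billions of FLOPs.

    Assumption: each multiply-accumulate counts as 2 operations.
    """
    macs = out_h * out_w * filters * kernel * kernel * in_channels
    return 2 * macs / 1e9

# (out_h, out_w, filters, kernel, in_channels) read from rows 0, 2 and 12 of the listing
rows = [(416, 416, 16, 3, 3), (208, 208, 32, 3, 16), (13, 13, 1024, 3, 512)]
for row in rows:
    print(row, round(conv_bflops(*row), 3))
```

Layer 12 dominates the cost because it combines the largest filter count with 512 input channels.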



  1. Download and extract TensorRT 5.1.5.0 from https://developer.nvidia.com/nvidia-tensorrt-5x-download#trt51ga. The stable GA tar package is recommended.

  2. After extracting, install tensorrt

    tar -xzvf TensorRT-
    cd TensorRT-
    pip install tensorrt-
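  3. The YOLO-to-ONNX sample lives under TensorRT-. yolov3_to_onnx.py is written for Python 2, and running it under Python 3 only requires changing a few lines. In addition, because YOLOv3 contains no maxpool layers, the script does not handle them, so maxpool support has to be added when converting yolov3-tiny.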

  • For Python 3, modify parse_cfg_file by adding remainder = remainder.decode('utf-8') at the position shown in the code below

def parse_cfg_file(self, cfg_file_path):
    """Takes the yolov3.cfg file and parses it layer by layer,
    appending each layer's parameters as a dictionary to layer_configs.

    Keyword argument:
    cfg_file_path -- path to the yolov3.cfg file as string
    """
    with open(cfg_file_path, 'rb') as cfg_file:
        remainder = cfg_file.read()
        remainder = remainder.decode('utf-8')  # added line: the parser needs str, not bytes, under Python 3
        while remainder is not None:
            layer_dict, layer_name, remainder = self._next_layer(remainder)
            if layer_dict is not None:
                self.layer_configs[layer_name] = layer_dict
    return self.layer_configs
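
Why the decode call matters: a file opened with 'rb' yields bytes under Python 3, while the parsing code works on str. The following is a minimal standalone sketch of that read-then-decode step. read_cfg_text and the two-section config are made up for illustration and are not part of yolov3_to_onnx.py.

```python
import os
import tempfile

def read_cfg_text(cfg_file_path):
    """Read a darknet .cfg file in binary mode and return its contents as str."""
    with open(cfg_file_path, 'rb') as cfg_file:
        raw = cfg_file.read()      # bytes under Python 3
    return raw.decode('utf-8')     # str, so str-based parsing works

# Throwaway config with two sections, written to a temporary file.
with tempfile.NamedTemporaryFile('wb', suffix='.cfg', delete=False) as tmp:
    tmp.write(b"[net]\nwidth=416\n\n[maxpool]\nsize=2\nstride=2\n")
    demo_path = tmp.name

cfg_text = read_cfg_text(demo_path)
os.remove(demo_path)

# Collect the section names, e.g. "[net]" -> "net".
section_names = [line.strip('[]') for line in cfg_text.splitlines()
                 if line.startswith('[')]
print(section_names)
```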
  • Add maxpool support. The whole function goes inside class GraphBuilderONNX(object):

def _make_maxpool_node(self, layer_name, layer_dict):
    stride = layer_dict['stride']
    kernel_size = layer_dict['size']
    previous_node_specs = self._get_previous_node_specs()
    inputs = [previous_node_specs.name]
    channels = previous_node_specs.channels  # pooling leaves the channel count unchanged
    kernel_shape = [kernel_size, kernel_size]
    strides = [stride, stride]
    assert channels > 0
    maxpool_node = helper.make_node(
        'MaxPool',
        inputs=inputs,
        outputs=[layer_name],
        kernel_shape=kernel_shape,
        strides=strides,
        auto_pad='SAME_UPPER',  # keeps 13 x 13 for the size-2, stride-1 pool
        name=layer_name,
    )
    self._nodes.append(maxpool_node)
    return layer_name, channels
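
The auto_pad='SAME_UPPER' setting is what lets layer 11 (size 2, stride 1) keep its 13 x 13 shape. As a rough sketch of the arithmetic, based on my reading of the ONNX auto_pad rule (output = ceil(input / stride), with any odd padding placed at the end), the hypothetical helper below reproduces the pooling rows of the network listing:

```python
import math

def same_upper_pool(in_size, kernel, stride):
    """Return (out_size, pad_begin, pad_end) for one spatial axis under SAME_UPPER."""
    out_size = math.ceil(in_size / stride)
    total_pad = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_begin = total_pad // 2          # SAME_UPPER puts the extra pixel at the end
    pad_end = total_pad - pad_begin
    return out_size, pad_begin, pad_end

print(same_upper_pool(416, 2, 2))  # layer 1: 416 -> 208, no padding needed
print(same_upper_pool(13, 2, 1))   # layer 11: 13 -> 13, one padded pixel at the end
```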
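  • Then enable the maxpool layer type in main() by adding it to supported_layers (the new method also has to be registered wherever GraphBuilderONNX maps layer types to its _make_*_node methods, otherwise it is never called):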
#supported_layers = ['net', 'convolutional', 'shortcut', 'route', 'upsample']
supported_layers = ['net', 'convolutional', 'shortcut', 'route', 'upsample', 'maxpool']
  • In main(), set the paths of your own cfg file, weights file, and the onnx file to be saved:
cfg_file_path = "yolov3-tiny.cfg"
weights_file_path = 'yolov3-tiny.weights'
output_file_path = 'yolov3-tiny.onnx'

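  • In main(), set the output node names and dimensions. The 18 here is the output channel count, (5 + nclass) * 3 = (5 + 1) * 3 = 18, because my model has a single class. Change it to match the number of classes in your own model. The layer numbers in output_tensor_dims['016_convolutional'] and output_tensor_dims['023_convolutional'] follow the positions of the yolo layers in the darknet network and can be adjusted accordingly.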
img_size = 416
kernel_size_1 = int(img_size/32)
kernel_size_2 = int(img_size/16)
# output_tensor_dims['082_convolutional'] = [255, 19, 19] ##yolov3
# output_tensor_dims['094_convolutional'] = [255, 38, 38] ##yolov3
# output_tensor_dims['106_convolutional'] = [255, 76, 76] ##yolov3
output_tensor_dims['016_convolutional'] = [18, kernel_size_1, kernel_size_1] ##yolov3-tiny
output_tensor_dims['023_convolutional'] = [18, kernel_size_2, kernel_size_2] ##yolov3-tiny
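
To avoid hand-editing these numbers for every model, the dimensions can be derived from the image size and class count. This is a small sketch under the assumptions stated in the text (3 anchors per scale, output strides of 32 and 16). tiny_output_dims is a hypothetical helper, not part of the NVIDIA sample.

```python
def tiny_output_dims(img_size, num_classes, anchors_per_scale=3):
    """Build the output_tensor_dims entries for a yolov3-tiny model."""
    channels = (5 + num_classes) * anchors_per_scale  # box(4) + objectness(1) + classes
    coarse = img_size // 32   # grid of the first yolo output
    fine = img_size // 16     # grid of the second yolo output
    return {
        '016_convolutional': [channels, coarse, coarse],
        '023_convolutional': [channels, fine, fine],
    }

print(tiny_output_dims(416, 1))
```

With 80 classes the same function gives 255 channels, matching the commented-out YOLOv3 values.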

4.  Finally, run yolov3_to_onnx.py to generate the .onnx file. That completes the YOLO-to-ONNX conversion. Next comes ONNX to TensorRT.

5. Modify parts of onnx_to_tensorrt.py: the onnx and trt paths, output_shapes, and the anchors of the yolo layers

  • Change the .onnx, .trt, and test image paths
# onnx_file_path = 'yolov3.onnx'
# engine_file_path = "yolov3.trt"
# Download a dog image and save it to the following file path:
# input_image_path = download_file('dog.jpg',
#     'https://github.com/pjreddie/darknet/raw/f86901f6177dfc6116360a13cc06ab680e0c86b0/data/dog.jpg', checksum_reference=None)
onnx_file_path = 'yolov3-tiny.onnx'
engine_file_path = 'yolov3-tiny.trt'
input_image_path = 'dog.jpg'
  • Change the input size input_resolution_yolov3_HW and the output_shapes values: 18 = 3 * (nclass + 5)
#input_resolution_yolov3_HW = (608, 608)
input_resolution_yolov3_HW = (416, 416) ## input image size
# output_shapes = [(1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)]
output_shapes = [(1, 18, 13, 13), (1, 18, 26, 26)] ## yolov3-tiny output dimensions
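  • Change the yolo output-layer settings. The first block is the original YOLOv3 configuration; the second is the YOLOv3-tiny version that replaces it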
postprocessor_args = {"yolo_masks": [(6, 7, 8), (3, 4, 5), (0, 1, 2)],                    # A list of 3 three-dimensional tuples for the YOLO masks
                  "yolo_anchors": [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),  # A list of 9 two-dimensional tuples for the YOLO anchors
                                   (59, 119), (116, 90), (156, 198), (373, 326)],
                  "obj_threshold": 0.6,                                               # Threshold for object coverage, float value between 0 and 1
                  "nms_threshold": 0.5,                                               # Threshold for non-max suppression algorithm, float value between 0 and 1
                  "yolo_input_resolution": input_resolution_yolov3_HW}

postprocessor_args = {"yolo_masks": [(3, 4, 5), (0, 1, 2)],
                  # A list of 2 three-dimensional tuples for the YOLOv3-tiny masks
                  "yolo_anchors": [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169),(344,319)],
                                   # A list of 6 two-dimensional tuples for the YOLOv3-tiny anchors
                  "obj_threshold": 0.6,  # Threshold for object coverage, float value between 0 and 1
                  "nms_threshold": 0.5,
                  # Threshold for non-max suppression algorithm, float value between 0 and 1
                  "yolo_input_resolution": input_resolution_yolov3_HW}

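Each mask is a tuple of indices into yolo_anchors, one mask per output tensor, so the 13 x 13 output uses the three largest anchors and the 26 x 26 output the three smallest. The sketch below spells that pairing out. anchors_per_output is a hypothetical helper, and the assumption that masks and output_shapes are matched by position is mine, not confirmed against the sample's postprocessing code.

```python
yolo_masks = [(3, 4, 5), (0, 1, 2)]
yolo_anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]
output_shapes = [(1, 18, 13, 13), (1, 18, 26, 26)]

def anchors_per_output(masks, anchors, shapes):
    """Map each output grid size to the anchors its mask selects."""
    assert len(masks) == len(shapes), "need exactly one mask per output tensor"
    selected = {}
    for mask, shape in zip(masks, shapes):
        grid = shape[2:]                              # (height, width) of the output grid
        selected[grid] = [anchors[i] for i in mask]
    return selected

for grid, chosen in anchors_per_output(yolo_masks, yolo_anchors, output_shapes).items():
    print(grid, chosen)
```
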
6. In data_processing.py, change the label file path: LABEL_FILE_PATH

#LABEL_FILE_PATH = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'coco_labels.txt')
LABEL_FILE_PATH = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'voc.txt')  ## change to your own label file
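
A label file that does not match the model is an easy mistake to make here, so it can be worth checking the label count against the output channel count before running inference. The following is a rough sketch: load_labels and labels_match_outputs are hypothetical helpers, the one-class label file is made up, and one class name per line is assumed.

```python
import os
import tempfile

def load_labels(label_file_path):
    """Return the non-empty lines of a label file (assumed: one class name per line)."""
    with open(label_file_path) as label_file:
        return [line.strip() for line in label_file if line.strip()]

def labels_match_outputs(labels, output_shapes, anchors_per_scale=3):
    """Check that every output has (5 + number of labels) * anchors_per_scale channels."""
    expected = (5 + len(labels)) * anchors_per_scale
    return all(shape[1] == expected for shape in output_shapes)

# Throwaway one-class label file standing in for voc.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("dog\n")
    demo_label_path = tmp.name

labels = load_labels(demo_label_path)
os.remove(demo_label_path)
print(labels, labels_match_outputs(labels, [(1, 18, 13, 13), (1, 18, 26, 26)]))
```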

7.  As the last step, run onnx_to_tensorrt.py. It generates the trt file, and the speedup is obvious at a glance!



  1. Bug caused by an onnx version that is too new
File "yolov3_to_onnx.py", line 833, in main
  File "/usr/local/lib/python3.6/dist-packages/onnx/checker.py", line 93, in check_model
onnx.onnx_cpp2py_export.checker.ValidationError: Op registered for Upsample is deprecated in domain_version of 12

==> Context: Bad node spec: input: "042_convolutional_lrelu" input: "043_upsample_scale" output: "043_upsample" name: "043_upsample" op_type: "Upsample" attribute { name: "mode" s: "nearest" type: STRING }


pip uninstall onnx
pip install onnx==1.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
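
A quick guard at the top of the conversion script can catch a wrong onnx version before the checker fails. This sketch treats 1.4.1 as the newest acceptable version, which is an assumption drawn from this setup rather than a documented limit. parse_version and onnx_version_ok are hypothetical helpers.

```python
import re

def parse_version(version_string):
    """Turn a version such as '1.4.1' into a comparable tuple (1, 4, 1)."""
    parts = []
    for part in version_string.split('.')[:3]:
        match = re.match(r'\d+', part)   # ignore suffixes such as 'rc1'
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def onnx_version_ok(version_string, max_version=(1, 4, 1)):
    """Assumption: versions up to and including max_version work with this sample."""
    return parse_version(version_string) <= max_version

try:
    import onnx
    print(onnx.__version__, onnx_version_ok(onnx.__version__))
except ImportError:
    print("onnx is not installed")
```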

   2. Errors from an incomplete tensorrt installation or configuration

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorrt/__init__.py", line 1, in <module>
    from .tensorrt import *
ImportError: libnvonnxparser.so.0: cannot open shared object file: No such file or directory


 cp TensorRT-* /usr/lib/
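
After copying the libraries, one way to confirm that the dynamic loader can now find them, without importing tensorrt itself, is to try loading the shared object named in the error. This is a minimal sketch using only the standard library. can_load is a hypothetical helper.

```python
import ctypes

def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True

# Library name taken from the ImportError above.
print(can_load('libnvonnxparser.so.0'))
```

If this prints False, the library directory is still not on the loader's search path.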

  3. Install the pycuda module



