DeepStream 5.0 Series: Deploying Models with Triton

Click here for the series index

0 Background

The official DeepStream documentation describes integration with Triton Inference Server: the inference work normally done by nvinfer is handed off to Triton through the nvinferserver plugin. What problems does this approach solve?

First, Triton Server supports more model types, such as PyTorch, TensorFlow, and TensorRT, whereas nvinfer handles only TensorRT models; this makes Triton convenient for model development and debugging. In a real production environment, however, the nvinfer plugin is still recommended, because it is more efficient, uses system and GPU memory better, and runs faster. In short, DeepStream combined with Triton has the following characteristics:

  • Supports more model formats: TensorRT, TensorFlow GraphDef and SavedModel, and TensorFlow-TensorRT models on both T4 and Jetson platforms, with T4 additionally supporting ONNX, PyTorch, and Caffe2 NetDef formats
  • Supports running multiple models, or multiple instances (and versions) of the same model, on a single GPU (see the snippet below)
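
For example, running two instances of the same model on one GPU only takes an instance_group entry in that model's config.pbtxt; a minimal sketch (count and GPU id are illustrative):

instance_group [
  {
    kind: KIND_GPU
    count: 2      # two execution instances of this model on the same device
    gpus: [ 0 ]
  }
]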

This post covers how to deploy models with DeepStream and Triton together.

Official tutorial: https://developer.nvidia.com/blog/building-iva-apps-using-deepstream-5-0-updated-for-ga/

GitHub: https://github.com/NVIDIA-AI-IOT/deepstream_triton_model_deploy


1 Jetson Environment

On Jetson, the Triton shared libraries are already installed as part of the DeepStream SDK, so the plugin can be used directly; its details can be checked with the gst-inspect-1.0 tool:

$ gst-inspect-1.0 nvinferserver
2021-01-04 15:43:06.093014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
Factory Details:
  Rank                     primary (256)
  Long-name                NvInferServer plugin
  Klass                    NvInferServer Plugin
  Description              Nvidia DeepStreamSDK TensorRT plugin
  Author                   NVIDIA Corporation. Deepstream for Tesla forum: https://devtalk.nvidia.com/default/board/209

Plugin Details:
  Name                     nvdsgst_inferserver
  Description              NVIDIA DeepStreamSDK TensorRT Inference Server plugin
  Filename                 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so
  Version                  5.0.0
  License                  Proprietary
  Source module            nvinferserver
  Binary package           NVIDIA DeepStreamSDK TensorRT Inference Server plugin
  Origin URL               http://nvidia.com/

First, prepare the models:

$ cd $DEEPSTREAM_DIR/deepstream-5.0/samples
$ ./prepare_ds_trtis_model_repo.sh

This creates a trtis_model_repo folder under the samples directory and downloads into it the models that Triton will serve. Taking the ssd_mobilenet_v1 object detection model as an example, the directory structure is as follows:

└── ssd_mobilenet_v1_coco_2018_01_28
    ├── 1
    │   └── frozen_inference_graph.pb
    ├── config.pbtxt
    └── labels.txt

Here, config.pbtxt defines the model-related information (see the linked reference for what each field means and how to set it), labels.txt holds the class labels, and frozen_inference_graph.pb under the 1 directory is the TensorFlow frozen graph for version 1 of the model.
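
For reference, the config.pbtxt for this SSD model looks roughly like the sketch below (the values are illustrative; check the file generated by prepare_ds_trtis_model_repo.sh for the exact contents):

name: "ssd_mobilenet_v1_coco_2018_01_28"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ 300, 300, 3 ]
  }
]
# outputs: detection_boxes, detection_classes, detection_scores, num_detections
# (a complete example for Faster R-CNN is shown in section 3.2)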

We can then run the configuration files in the configs/deepstream-app-trtis directory. The app configuration file source1_primary_detector_nano.txt, run with deepstream-app, defines the plugins in the pipeline (source, sink, osd, streammux, and so on). These are no different from a regular pipeline; the key part is the primary-gie configuration:

[primary-gie]
enable=1
#(0): nvinfer; (1): nvinferserver
plugin-type=1
#infer-raw-output-dir=trtis-output
batch-size=1
interval=0
gie-unique-id=1
config-file=config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt

Setting plugin-type=1 selects the nvinferserver plugin for inference, and config-file=config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt points to the inference configuration, which specifies preprocessing, postprocessing, model information, and so on. The postprocessing calls the /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so library with the entry function NvDsInferParseCustomTfSSD; you can also modify the source under $DEEPSTREAM_DIR/sources/libs/nvdsinfer_customparser/ to customize the postprocessing for your own model.
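
One quick way to confirm that NvDsInferParseCustomTfSSD is actually exported by the library is to list its dynamic symbols (assuming nm from binutils is available; the parse functions are exported with C linkage, so the names appear unmangled):

nm -D /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so | grep NvDsInferParse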

With both configuration files set up, launch the pipeline with deepstream-app; the output looks like this:

$ deepstream-app -c source1_primary_detector_nano.txt 

(deepstream-app:2197): GLib-GObject-WARNING **: 16:03:14.898: g_object_set_is_valid_property: object class 'GstNvStreamMux' has no property named 'attach-sys-ts'
2021-01-04 16:03:15.510845: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
0:00:00.895476736  2197   0x7f2c002360 WARN           nvinferserver gstnvinferserver_impl.cpp:248:validatePluginConfig: warning: Configuration file unique-id reset to: 1
I0104 08:03:15.695353 2197 server.cc:120] Initializing Triton Inference Server
W0104 08:03:15.700003 2197 autofill.cc:245] Autofiller failed to retrieve model. Error Details: Internal: unable to autofill for 'ssd_mobilenet_v1_coco_2018_01_28', unable to find graphdef file named 'model.graphdef'
W0104 08:03:15.700073 2197 autofill.cc:251] Proceeding with simple config for now
I0104 08:03:15.700221 2197 server_status.cc:55] New status tracking for model 'ssd_mobilenet_v1_coco_2018_01_28'
I0104 08:03:15.700324 2197 model_repository_manager.cc:680] loading: ssd_mobilenet_v1_coco_2018_01_28:1
I0104 08:03:15.700867 2197 base_backend.cc:176] Creating instance ssd_mobilenet_v1_coco_2018_01_28_0_0_gpu0 on GPU 0 (7.2) using frozen_inference_graph.pb
2021-01-04 16:03:15.751817: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-01-04 16:03:15.752817: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x98da450 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-04 16:03:15.752894: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-04 16:03:15.753097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-01-04 16:03:15.753356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.753511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2021-01-04 16:03:15.753558: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2021-01-04 16:03:15.753632: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-04 16:03:15.756360: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-01-04 16:03:15.757063: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-01-04 16:03:15.760397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-01-04 16:03:15.762845: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-01-04 16:03:15.762984: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-01-04 16:03:15.763126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.763305: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.763404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-01-04 16:03:17.807271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-04 16:03:17.807389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2021-01-04 16:03:17.807448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2021-01-04 16:03:17.807765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3164 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-01-04 16:03:17.812900: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ec4d39930 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-04 16:03:17.812991: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
I0104 08:03:18.119443 2197 model_repository_manager.cc:837] successfully loaded 'ssd_mobilenet_v1_coco_2018_01_28' version 1
INFO: TrtISBackend id:1 initialized model: ssd_mobilenet_v1_coco_2018_01_28
2021-01-04 16:03:23.412157: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-01-04 16:03:25.404889: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF:  FPS 0 (Avg)	
**PERF:  0.00 (0.00)	
** INFO: : Pipeline ready

WARNING from primary_gie: Configuration file unique-id reset to: 1
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinferserver/gstnvinferserver_impl.cpp(248): validatePluginConfig (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: : Pipeline running

**PERF:  10.65 (10.47)	
**PERF:  10.64 (10.64)	

2 x86 Environment

On x86 the pipeline has to run inside Docker: pull nvcr.io/nvidia/deepstream:5.0-20.04-triton from NGC, and the remaining steps are essentially the same as above.
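
A typical way to start the container looks like the following (the flags are illustrative: --gpus all requires the NVIDIA Container Toolkit, and the X11 options are only needed for the on-screen display):

xhost +
docker run --gpus all -it --rm --net=host \
    -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY \
    nvcr.io/nvidia/deepstream:5.0-20.04-triton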

3 Deploying a TensorFlow Model Zoo Model

The steps above use the models shipped with the official samples. If you develop with TensorFlow, you can also take a model from the model zoo, train it on your own data (see 《TensorFlow之目标检测API接口调试(超详细)》 for reference), and then deploy it in DeepStream.

Official tutorial: https://forums.developer.nvidia.com/t/deploying-models-from-tensorflow-model-zoo-using-nvidia-deepstream-and-nvidia-triton-inference-server/155682

Taking the faster_rcnn_inception_v2_coco_2018_01_28 model from the model zoo as an example, the deployment takes five steps:

  1. Download the pretrained model and configuration files
  2. Create the Triton config file
  3. Create the DeepStream config files
  4. Build the postprocessing shared library
  5. Run the DeepStream app

3.1 Preparing the Model

cd $DEEPSTREAM_DIR/samples/trtis_model_repo 
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
tar xvf faster_rcnn_inception_v2_coco_2018_01_28.tar.gz 
mkdir faster_rcnn_inception_v2 && cd faster_rcnn_inception_v2 && mkdir 1 
cp ../faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb ./1/model.graphdef

Next, create the labels file, which contains the COCO dataset labels (see the linked reference). Once it is in place, make sure the directory layout matches the tree shown at the end of section 3.2.
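
Since this Faster R-CNN model uses the same 91-class COCO label map as the SSD sample above, one simple option (an assumption, not the only way) is to copy that sample's labels file:

# run from inside faster_rcnn_inception_v2/; assumes the SSD sample repo was prepared earlier
cp ../ssd_mobilenet_v1_coco_2018_01_28/labels.txt ./labels.txt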

3.2 Creating the Triton Config File

Create a config.pbtxt file that defines the platform, input, output, and so on (see the linked reference). The model's input and output tensor names can be inspected with a visualization tool such as netron; for more tools, see 《22 款神经网络的设计和可视化工具【转载】》.
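
For example, netron can be installed with pip and pointed directly at the frozen graph (hypothetical commands; any graph viewer works):

pip install netron
netron ../faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb   # opens a local web UI showing image_tensor and the detection_* outputs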

# Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
# resource such as Nano. Smaller value can limit Tensorflow GPU usage;
# and larger value may increase performance but may also cause Out-Of-Memory
# issues. Please tune a proper value.

name: "faster_rcnn_inception_v2"
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ 1920, 1080, 3 ]
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4]
    reshape { shape: [100,4] }
  },
  {
    name: "detection_classes"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "num_detections"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape { shape: [] }
  }
]
version_policy: { specific {versions: 1}}
instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: [ 0 ]
  }
]
#optimization { execution_accelerators {
#  gpu_execution_accelerator : [ {
#    name : "tensorrt"
#    parameters { key: "precision_mode" value: "FP16" }}]
#}}

After creating it, the directory structure looks like this:

.
├── 1
│   └── model.graphdef
├── config.pbtxt
└── labels.txt

3.3 Creating the DeepStream Config Files

Following the config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt and source1_primary_detector_nano.txt files in the configs/deepstream-app-trtis directory, create the Faster R-CNN configuration files:

cd ../../configs/deepstream-app-trtis
cp source1_primary_detector_nano.txt source1_primary_faster_rcnn_inception_v2.txt
cp config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt config_infer_primary_faster_rcnn_inception_v2.txt

Then, in source1_primary_faster_rcnn_inception_v2.txt, change config-file under [primary-gie] to config_infer_primary_faster_rcnn_inception_v2.txt.
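
The resulting [primary-gie] section in source1_primary_faster_rcnn_inception_v2.txt should look roughly like this (the other keys copied from the nano sample stay unchanged):

[primary-gie]
enable=1
#(0): nvinfer; (1): nvinferserver
plugin-type=1
batch-size=1
interval=0
gie-unique-id=1
config-file=config_infer_primary_faster_rcnn_inception_v2.txt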

Modify config_infer_primary_faster_rcnn_inception_v2.txt (the copy of the SSD config) as follows:

# Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
# resource such as Nano. Smaller value can limit Tensorflow GPU usage;
# and larger value may increase performance but may also cause Out-Of-Memory
# issues. Please tune a proper value.

infer_config {
  unique_id: 1
  gpu_ids: [0]
  backend {
    trt_is {
      model_name: "faster_rcnn_inception_v2"
      version: -1
      model_repo {
        root: "../../trtis_model_repo"
        log_level: 2
        tf_gpu_memory_fraction: 0
        tf_disable_soft_placement: 0
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_NONE
    maintain_aspect_ratio: 0
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 1.0
      channel_offsets: [0, 0, 0]
    }
  }

  postprocess {
    labelfile_path: "../../trtis_model_repo/faster_rcnn_inception_v2/labels.txt"
    detection {
      num_detected_classes: 91
      custom_parse_bbox_func: "NvDsInferParseCustomTfFASTERRCNN"
      nms {
        confidence_threshold: 0.3
        iou_threshold: 0.6
        topk : 100
      }
    }
  }

  extra {
    copy_input_to_host_buffers: false
  }

  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so"
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}

output_control {
  detect_control {
    default_filter { bbox_filter { min_width: 32, min_height: 32 } }
  }
}

3.4 Postprocessing

After inference, the raw model outputs need postprocessing. Because TensorFlow Faster R-CNN produces the same output structure as TensorFlow SSD, we can implement NvDsInferParseCustomTfFASTERRCNN by mirroring the NvDsInferParseCustomTfSSD function.

Concretely, edit $DEEPSTREAM_DIR/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp: copy the declaration and definition of NvDsInferParseCustomTfSSD, rename the copy to NvDsInferParseCustomTfFASTERRCNN, then rebuild with make to regenerate libnvds_infercustomparser.so and copy it to $DEEPSTREAM_DIR/lib.
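
A rough sketch of the rebuild steps (paths assume a default DeepStream 5.0 install; use sudo for the copy if the target directory is root-owned):

cd $DEEPSTREAM_DIR/sources/libs/nvdsinfer_customparser
# duplicate NvDsInferParseCustomTfSSD (declaration and definition) as
# NvDsInferParseCustomTfFASTERRCNN in nvdsinfer_custombboxparser.cpp, then:
make
sudo cp libnvds_infercustomparser.so /opt/nvidia/deepstream/deepstream-5.0/lib/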

If you are deploying your own customized model, adapt the postprocessing logic to match its output tensors.

3.5 Running the App

cd $DEEPSTREAM_DIR/samples/configs/deepstream-app-trtis 
deepstream-app -c source1_primary_faster_rcnn_inception_v2.txt

With the steps above, a TensorFlow Model Zoo model can be run successfully in DeepStream.

4 Optimizing the Model

To reach higher FPS, a TensorFlow model can be exported through TF-TRT as a TensorRT-accelerated model and then deployed on Triton; I will cover this in the next post.
