The official DeepStream documentation describes integration with Triton Inference Server: the inference work that the nvinfer plugin normally performs is handed off to Triton through the nvinferserver plugin. What problem does this solve?
First, Triton Server supports many more model formats, such as PyTorch, TensorFlow, and TensorRT, while nvinfer only integrates TensorRT engines; that flexibility is convenient for model development and debugging. In a real production environment, however, the nvinfer plugin is still recommended, because it is more efficient, makes better use of system and GPU memory, and runs faster. In short, pairing DeepStream with Triton trades some runtime efficiency for much broader framework support and quicker iteration.
This post introduces how to deploy DeepStream together with Triton.
Official tutorial: https://developer.nvidia.com/blog/building-iva-apps-using-deepstream-5-0-updated-for-ga/
GitHub: https://github.com/NVIDIA-AI-IOT/deepstream_triton_model_deploy
When deploying on Jetson, the Triton-related shared libraries are already installed as part of the DeepStream SDK, so the plugin can be used directly. You can inspect it with the gst-inspect-1.0 tool:
$ gst-inspect-1.0 nvinferserver
2021-01-04 15:43:06.093014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
Factory Details:
Rank primary (256)
Long-name NvInferServer plugin
Klass NvInferServer Plugin
Description Nvidia DeepStreamSDK TensorRT plugin
Author NVIDIA Corporation. Deepstream for Tesla forum: https://devtalk.nvidia.com/default/board/209
Plugin Details:
Name nvdsgst_inferserver
Description NVIDIA DeepStreamSDK TensorRT Inference Server plugin
Filename /usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so
Version 5.0.0
License Proprietary
Source module nvinferserver
Binary package NVIDIA DeepStreamSDK TensorRT Inference Server plugin
Origin URL http://nvidia.com/
First, prepare the models:
$ cd $DEEPSTREAM_DIR/deepstream-5.0/samples
$ ./prepare_ds_trtis_model_repo.sh
This generates a trtis_model_repo folder under the samples directory and downloads the models that Triton will serve. Taking the ssd_mobilenet_v1 object-detection model as an example, its directory structure looks like this:
└── ssd_mobilenet_v1_coco_2018_01_28
├── 1
│ └── frozen_inference_graph.pb
├── config.pbtxt
└── labels.txt
Here, config.pbtxt holds the model configuration for Triton (see the link for what each field means and how to set it), labels.txt contains the class labels, and frozen_inference_graph.pb under the directory named 1 is version 1 of the TensorFlow frozen-graph model.
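For orientation, the Triton configuration for this SSD model starts roughly like the sketch below. This is only an illustration assuming the standard TensorFlow object-detection API input tensor image_tensor and a 300x300 NHWC input; the config.pbtxt actually generated by prepare_ds_trtis_model_repo.sh is the authoritative version, and a complete example of this file format is shown later for the Faster R-CNN model.
name: "ssd_mobilenet_v1_coco_2018_01_28"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ 300, 300, 3 ]
  }
]
# output entries follow the same pattern as in the Faster R-CNN example below:
# detection_boxes, detection_classes, detection_scores and num_detections, all TYPE_FP32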
Next, we can run the configuration files in the configs/deepstream-app-trtis directory. We launch deepstream-app with the application config file source1_primary_detector_nano.txt, which defines the plugins of the pipeline (source, sink, osd, streammux, and so on). These are no different from a normal pipeline; the key part is the primary-gie configuration:
[primary-gie]
enable=1
#(0): nvinfer; (1): nvinferserver
plugin-type=1
#infer-raw-output-dir=trtis-output
batch-size=1
interval=0
gie-unique-id=1
config-file=config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt
Here plugin-type=1 selects the nvinferserver plugin for inference, and config-file=config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt points to the inference configuration, which specifies preprocessing, postprocessing, the model information, and so on. Postprocessing is done by the /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so library, with NvDsInferParseCustomTfSSD as the entry function; you can also modify the sources under $DEEPSTREAM_DIR/sources/libs/nvdsinfer_customparser/ to implement custom postprocessing for your own model.
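For reference, the parts of that inference config that wire in the model and the custom parser look roughly like the excerpt below. This is a sketch based on the shipped sample (field values are illustrative); the complete structure of such a file is shown for the Faster R-CNN example later in this post.
infer_config {
  backend {
    trt_is {
      model_name: "ssd_mobilenet_v1_coco_2018_01_28"
      version: -1
      model_repo {
        root: "../../trtis_model_repo"
      }
    }
  }
  postprocess {
    labelfile_path: "../../trtis_model_repo/ssd_mobilenet_v1_coco_2018_01_28/labels.txt"
    detection {
      num_detected_classes: 91
      custom_parse_bbox_func: "NvDsInferParseCustomTfSSD"
    }
  }
  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so"
  }
}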
Once both configuration files are in place, run the app with deepstream-app; the output looks like this:
$ deepstream-app -c source1_primary_detector_nano.txt
(deepstream-app:2197): GLib-GObject-WARNING **: 16:03:14.898: g_object_set_is_valid_property: object class 'GstNvStreamMux' has no property named 'attach-sys-ts'
2021-01-04 16:03:15.510845: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
0:00:00.895476736 2197 0x7f2c002360 WARN nvinferserver gstnvinferserver_impl.cpp:248:validatePluginConfig: warning: Configuration file unique-id reset to: 1
I0104 08:03:15.695353 2197 server.cc:120] Initializing Triton Inference Server
W0104 08:03:15.700003 2197 autofill.cc:245] Autofiller failed to retrieve model. Error Details: Internal: unable to autofill for 'ssd_mobilenet_v1_coco_2018_01_28', unable to find graphdef file named 'model.graphdef'
W0104 08:03:15.700073 2197 autofill.cc:251] Proceeding with simple config for now
I0104 08:03:15.700221 2197 server_status.cc:55] New status tracking for model 'ssd_mobilenet_v1_coco_2018_01_28'
I0104 08:03:15.700324 2197 model_repository_manager.cc:680] loading: ssd_mobilenet_v1_coco_2018_01_28:1
I0104 08:03:15.700867 2197 base_backend.cc:176] Creating instance ssd_mobilenet_v1_coco_2018_01_28_0_0_gpu0 on GPU 0 (7.2) using frozen_inference_graph.pb
2021-01-04 16:03:15.751817: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-01-04 16:03:15.752817: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x98da450 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-04 16:03:15.752894: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-01-04 16:03:15.753097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-01-04 16:03:15.753356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.753511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2021-01-04 16:03:15.753558: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2021-01-04 16:03:15.753632: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-04 16:03:15.756360: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-01-04 16:03:15.757063: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-01-04 16:03:15.760397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-01-04 16:03:15.762845: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-01-04 16:03:15.762984: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-01-04 16:03:15.763126: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.763305: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:15.763404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-01-04 16:03:17.807271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-04 16:03:17.807389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2021-01-04 16:03:17.807448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2021-01-04 16:03:17.807765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-01-04 16:03:17.808350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3164 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-01-04 16:03:17.812900: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ec4d39930 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-04 16:03:17.812991: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Xavier, Compute Capability 7.2
I0104 08:03:18.119443 2197 model_repository_manager.cc:837] successfully loaded 'ssd_mobilenet_v1_coco_2018_01_28' version 1
INFO: TrtISBackend id:1 initialized model: ssd_mobilenet_v1_coco_2018_01_28
2021-01-04 16:03:23.412157: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-01-04 16:03:25.404889: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Runtime commands:
h: Print this help
q: Quit
p: Pause
r: Resume
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.
**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: : Pipeline ready
WARNING from primary_gie: Configuration file unique-id reset to: 1
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinferserver/gstnvinferserver_impl.cpp(248): validatePluginConfig (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: : Pipeline running
**PERF: 10.65 (10.47)
**PERF: 10.64 (10.64)
In an x86 environment you need to run inside Docker: pull nvcr.io/nvidia/deepstream:5.0-20.04-triton from NGC, and the remaining steps are similar.
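A typical way to start the container is sketched below, assuming you want the display forwarded to the host over X11; adjust the mounts, working directory, and display settings to your environment.
xhost +
docker run --gpus all -it --rm \
    -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY \
    -w /opt/nvidia/deepstream/deepstream-5.0/samples \
    nvcr.io/nvidia/deepstream:5.0-20.04-triton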
The steps above use the models provided by NVIDIA for testing. If you develop with TensorFlow, you can also take a model from the model zoo, train it on your own data (see 《TensorFlow之目标检测API接口调试(超详细)》), and then deploy it in DeepStream.
Official tutorial: https://forums.developer.nvidia.com/t/deploying-models-from-tensorflow-model-zoo-using-nvidia-deepstream-and-nvidia-triton-inference-server/155682
We take the faster_rcnn_inception_v2_coco_2018_01_28 model from the model zoo as an example; the process takes five steps. First, download the model and lay out the Triton model repository:
cd $DEEPSTREAM_DIR/samples/trtis_model_repo
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
tar xvf faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
mkdir faster_rcnn_inception_v2 && cd faster_rcnn_inception_v2 && mkdir 1
cp ../faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb ./1/model.graphdef
Next, create the labels.txt file containing the COCO dataset class labels (see the link); the resulting directory layout is shown further below.
Then create the config.pbtxt file, which defines the platform, inputs, outputs, and so on (see the reference). The model's input and output tensor names can be inspected with a visualization tool such as netron; see 《22 款神经网络的设计和可视化工具【转载】》.
# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
# resource such as Nano. Smaller value can limit Tensorflow GPU usage;
# and larger value may increase performance but may also cause Out-Of-Memory
# issues. Please tune a proper value.
name: "faster_rcnn_inception_v2"
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
{
name: "image_tensor"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: [ 1920, 1080, 3 ]
}
]
output [
{
name: "detection_boxes"
data_type: TYPE_FP32
dims: [ 100, 4]
reshape { shape: [100,4] }
},
{
name: "detection_classes"
data_type: TYPE_FP32
dims: [ 100 ]
},
{
name: "detection_scores"
data_type: TYPE_FP32
dims: [ 100 ]
},
{
name: "num_detections"
data_type: TYPE_FP32
dims: [ 1 ]
reshape { shape: [] }
}
]
version_policy: { specific {versions: 1}}
instance_group [
{
kind: KIND_GPU
count: 1
gpus: [ 0 ]
}
]
#optimization { execution_accelerators {
# gpu_execution_accelerator : [ {
# name : "tensorrt"
# parameters { key: "precision_mode" value: "FP16" }}]
#}}
After it is created, the directory structure looks like this:
.
├── 1
│ └── model.graphdef
├── config.pbtxt
└── labels.txt
Following config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt and source1_primary_detector_nano.txt in the configs/deepstream-app-trtis directory, create the configuration files for Faster R-CNN:
cd ../../configs/deepstream-app-trtis
cp source1_primary_detector_nano.txt source1_primary_faster_rcnn_inception_v2.txt
cp config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt config_infer_primary_faster_rcnn_inception_v2.txt
Then, in source1_primary_faster_rcnn_inception_v2.txt, change the config-file entry in the [primary-gie] section to config_infer_primary_faster_rcnn_inception_v2.txt.
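The resulting [primary-gie] section then looks like this (only the config-file line changes compared with the SSD version shown earlier):
[primary-gie]
enable=1
#(0): nvinfer; (1): nvinferserver
plugin-type=1
batch-size=1
interval=0
gie-unique-id=1
config-file=config_infer_primary_faster_rcnn_inception_v2.txt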
Next, modify config_infer_primary_faster_rcnn_inception_v2.txt (the copy made above) as follows:
# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
# resource such as Nano. Smaller value can limit Tensorflow GPU usage;
# and larger value may increase performance but may also cause Out-Of-Memory
# issues. Please tune a proper value.
infer_config {
unique_id: 1
gpu_ids: [0]
backend {
trt_is {
model_name: "faster_rcnn_inception_v2"
version: -1
model_repo {
root: "../../trtis_model_repo"
log_level: 2
tf_gpu_memory_fraction: 0
tf_disable_soft_placement: 0
}
}
}
preprocess {
network_format: IMAGE_FORMAT_RGB
tensor_order: TENSOR_ORDER_NONE
maintain_aspect_ratio: 0
frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
frame_scaling_filter: 1
normalize {
scale_factor: 1.0
channel_offsets: [0, 0, 0]
}
}
postprocess {
labelfile_path: "../../trtis_model_repo/faster_rcnn_inception_v2/labels.txt"
detection {
num_detected_classes: 91
custom_parse_bbox_func: "NvDsInferParseCustomTfFASTERRCNN"
nms {
confidence_threshold: 0.3
iou_threshold: 0.6
topk : 100
}
}
}
extra {
copy_input_to_host_buffers: false
}
custom_lib {
path: "/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so"
}
}
input_control {
process_mode: PROCESS_MODE_FULL_FRAME
interval: 0
}
output_control {
detect_control {
default_filter { bbox_filter { min_width: 32, min_height: 32 } }
}
}
After inference, the results need postprocessing. Because the output tensors of the TensorFlow Faster R-CNN model have the same structure as those of the TensorFlow SSD model, we can implement NvDsInferParseCustomTfFASTERRCNN by following the existing NvDsInferParseCustomTfSSD function.
To do so, edit $DEEPSTREAM_DIR/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp: copy the declaration and definition of NvDsInferParseCustomTfSSD, rename the copy to NvDsInferParseCustomTfFASTERRCNN, then run make again to rebuild libnvds_infercustomparser.so and copy it to $DEEPSTREAM_DIR/lib.
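A minimal sketch of what the renamed parser could look like is shown below. It assumes the standard TensorFlow object-detection API output tensors (num_detections, detection_scores, detection_classes, and detection_boxes with normalized [ymin, xmin, ymax, xmax] coordinates); the NvDsInferParseCustomTfSSD implementation that ships in nvdsinfer_custombboxparser.cpp is the authoritative reference and handles more corner cases.
#include <string>
#include <vector>
#include "nvdsinfer_custom_impl.h"

/* Sketch of a custom bbox parser for the TF Faster R-CNN outputs,
 * modeled after NvDsInferParseCustomTfSSD. */
extern "C" bool NvDsInferParseCustomTfFASTERRCNN(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    /* Thresholding and NMS are applied afterwards according to the
     * nms block in the infer config, so they are not repeated here. */
    (void) detectionParams;

    /* Look up an output layer by its tensor name. */
    auto findLayer = [&](const std::string &name) -> const NvDsInferLayerInfo * {
        for (const auto &layer : outputLayersInfo)
            if (layer.layerName && name == layer.layerName)
                return &layer;
        return nullptr;
    };

    const NvDsInferLayerInfo *numLayer   = findLayer("num_detections");
    const NvDsInferLayerInfo *scoreLayer = findLayer("detection_scores");
    const NvDsInferLayerInfo *classLayer = findLayer("detection_classes");
    const NvDsInferLayerInfo *boxLayer   = findLayer("detection_boxes");
    if (!numLayer || !scoreLayer || !classLayer || !boxLayer)
        return false;

    unsigned int numDetections = (unsigned int)((const float *)numLayer->buffer)[0];
    const float *scores  = (const float *)scoreLayer->buffer;
    const float *classes = (const float *)classLayer->buffer;
    const float *boxes   = (const float *)boxLayer->buffer; /* normalized [ymin, xmin, ymax, xmax] */

    for (unsigned int i = 0; i < numDetections; i++) {
        NvDsInferObjectDetectionInfo obj;
        obj.classId = (unsigned int)classes[i];
        obj.detectionConfidence = scores[i];

        /* Convert normalized corners to left/top/width/height in the
         * network input resolution. */
        float y1 = boxes[i * 4 + 0] * networkInfo.height;
        float x1 = boxes[i * 4 + 1] * networkInfo.width;
        float y2 = boxes[i * 4 + 2] * networkInfo.height;
        float x2 = boxes[i * 4 + 3] * networkInfo.width;
        obj.left = x1;
        obj.top = y1;
        obj.width = x2 - x1;
        obj.height = y2 - y1;

        objectList.push_back(obj);
    }
    return true;
}

/* Verify the function matches the prototype DeepStream expects. */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomTfFASTERRCNN);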
If you are deploying a custom model of your own, adapt the postprocessing to your model's outputs in the same way. Then run the app:
cd $DEEPSTREAM_DIR/samples/configs/deepstream-app-trtis
deepstream-app -c source1_primary_faster_rcnn_inception_v2.txt
With these steps, you can successfully run a model from the TensorFlow Model Zoo in DeepStream.
To get higher FPS, TensorFlow models can be exported with TF-TRT as TensorRT-accelerated models and then deployed on Triton; I will cover that in the next post.