https://github.com/triton-inference-server/server/tree/r21.09
docker pull nvcr.io/nvidia/tritonserver:21.09-py3
The Model Repository is the directory that holds the models and their configuration files. Its layout is as follows:
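A minimal layout sketch (the model name text_encoder and the file name model.onnx are illustrative; each model gets a config.pbtxt plus numbered version subdirectories):

model_repository/
  text_encoder/
    config.pbtxt
    1/
      model.onnx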
Key point: each model-name subdirectory under the Model Repository is served as a separate service, and the directory name becomes the model's route name in request URLs.
https://github.com/triton-inference-server/server/blob/r21.09/docs/model_repository.md
Writing config.pbtxt. Required fields: platform (or the backend property), max_batch_size, input, and output. Dynamic batching can be enabled with the dynamic_batching setting.
If the model does not use batching, set max_batch_size: 0; the full tensor shapes, including any batch dimension, must then be given explicitly in dims.
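For instance, a non-batching config might look like this (a minimal sketch; the backend and tensor names are hypothetical):

platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input_ids"       # hypothetical input, full shape given in dims
    data_type: TYPE_INT64
    dims: [ 1, 128 ]
  }
]
output [
  {
    name: "output"          # hypothetical output
    data_type: TYPE_FP32
    dims: [ 1, 2 ]
  }
]

By contrast, the shape-tensor example below, from the Triton model configuration docs, uses max_batch_size: 8 with dynamic batching enabled: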
name: "myshapetensormodel"
platform: "tensorrt_plan"
max_batch_size: 8
input [
{
name: "input0"
data_type: TYPE_FP32
dims: [ -1 ]
},
{
name: "input1"
data_type: TYPE_INT32
dims: [ 1 ]
is_shape_tensor: true
}
]
output [
{
name: "output0"
data_type: TYPE_FP32
dims: [ -1 ]
}
]
dynamic_batching {
preferred_batch_size: [ 4, 8 ]
max_queue_delay_microseconds: 100
}
"input0": [ x, -1]
"input1": [ 1 ]
"output0": [ x, -1]
When exporting a PyTorch model to ONNX, the batch dimension can be marked as dynamic via dynamic_axes:

dynamic_axes={'input_ids': {0: 'batch_size'},
              'attention_mask': {0: 'batch_size'},
              'output': {0: 'batch_size'}}
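A minimal export sketch putting this together; the model, its two inputs, and the opset version are assumptions, not something fixed by Triton:

import torch

# model: a torch.nn.Module, assumed to be defined and in eval() mode
dummy_input_ids = torch.ones(1, 128, dtype=torch.long)        # example tracing input
dummy_attention_mask = torch.ones(1, 128, dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_input_ids, dummy_attention_mask),
    "model.onnx",
    input_names=['input_ids', 'attention_mask'],  # names referenced by config.pbtxt
    output_names=['output'],
    dynamic_axes={'input_ids': {0: 'batch_size'},
                  'attention_mask': {0: 'batch_size'},
                  'output': {0: 'batch_size'}},
    opset_version=11,
)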
The instance_group setting controls how many execution instances of the model run and on which devices, e.g. one instance on GPU 0 plus two instances each on GPUs 1 and 2:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]  # optional
  },
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 1, 2 ]  # optional
  }
]
Or run two instances on the CPU:

instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]
An ensemble model represents a pipeline of one or more models and the connection of input and output tensors between those models. Ensemble models are intended to be used to encapsulate a procedure that involves multiple models, such as "data preprocessing -> inference -> data postprocessing".
For details see: https://github.com/triton-inference-server/server/blob/r21.09/docs/architecture.md#ensemble-models
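A minimal ensemble sketch under assumed model and tensor names (preprocess, text_encoder, RAW_TEXT, etc. are hypothetical); each step's output_map publishes an internal tensor that the next step's input_map consumes:

name: "ensemble_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "RAW_TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "SCORES"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "text" value: "RAW_TEXT" }        # model input <- ensemble input
      output_map { key: "tokens" value: "tokenized" }    # model output -> internal tensor
    },
    {
      model_name: "text_encoder"
      model_version: -1
      input_map { key: "input_ids" value: "tokenized" }  # model input <- internal tensor
      output_map { key: "output" value: "SCORES" }       # model output -> ensemble output
    }
  ]
}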
Key point: when configuring input and output, the name field is required; the input and output names can be assigned when exporting the model to ONNX (see the export sketch above). Details: https://pytorch.org/docs/stable/onnx.html
Detailed configuration format: https://github.com/triton-inference-server/server/blob/r21.09/docs/model_configuration.md
Start the server, mounting the model repository into the container and exposing the HTTP (8000), gRPC (8001), and metrics (8002) ports:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nlu/server-r21.09/docs/examples/model_repository/text_encoder:/models nvcr.io/nvidia/tritonserver:21.09-py3 tritonserver --model-repository=/models
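Once the server is up, requests are routed by model name. A minimal HTTP client sketch, assuming the text_encoder model exposes the input_ids/attention_mask/output tensors used in the export example above:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: tensor names, shapes, and datatypes must match config.pbtxt
input_ids = httpclient.InferInput("input_ids", [1, 128], "INT64")
input_ids.set_data_from_numpy(np.ones((1, 128), dtype=np.int64))
attention_mask = httpclient.InferInput("attention_mask", [1, 128], "INT64")
attention_mask.set_data_from_numpy(np.ones((1, 128), dtype=np.int64))

result = client.infer(
    model_name="text_encoder",  # the model-name subdirectory, i.e. the route name
    inputs=[input_ids, attention_mask],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output"))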
Notes: