NVIDIA Triton Inference Server, previously known as TensorRT Inference Server, is available through NVIDIA NGC or GitHub.
Triton Inference Server lets developers and IT/DevOps teams easily deploy a high-performance inference server in the cloud, in an on-premises data center, or at the edge. The server exposes HTTP/REST and GRPC endpoints through which clients can request inference on any model the server manages.
Developers and AI companies can use Triton Inference Server to deploy models from different framework backends, such as TensorFlow, TensorRT, PyTorch, and ONNX Runtime.
Reference 1: TLT (Transfer Learning Toolkit) installation tutorial:
Get an NGC API key
Download the docker container
Reference 2: Notes on installing docker-ce on Ubuntu 18.04
Reference: Quick Start
Complete the prerequisites (the downloaded models will be used later for verification):
# Clone the source
git clone https://github.com/triton-inference-server/server
cd server
# Check out the release branch
git checkout r20.08
# Download the example models
cd docs/examples
./fetch_models.sh
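fetch_models.sh populates docs/examples/model_repository with the example models. A Triton model repository follows a fixed layout: one directory per model containing a config.pbtxt and numbered version subdirectories. As a rough sketch of that structure (the model name and config contents below are illustrative, not what the script actually writes):

```python
import os

# Minimal sketch of the Triton model-repository layout:
#   model_repository/<model-name>/config.pbtxt
#   model_repository/<model-name>/<version>/model.<ext>
# The real repository is produced by fetch_models.sh; this only shows the shape.
def make_skeleton(root, model_name="my_model", version="1"):
    model_dir = os.path.join(root, model_name)
    os.makedirs(os.path.join(model_dir, version), exist_ok=True)
    # config.pbtxt describes the platform, inputs, and outputs;
    # its real contents vary per backend, so only the name is written here.
    with open(os.path.join(model_dir, "config.pbtxt"), "w") as f:
        f.write('name: "%s"\n' % model_name)
    return model_dir

if __name__ == "__main__":
    import tempfile
    d = make_skeleton(tempfile.mkdtemp())
    print(sorted(os.listdir(d)))  # ['1', 'config.pbtxt']
```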
Pull the Triton container:
docker pull nvcr.io/nvidia/tritonserver:20.08-py3
Log in at https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver to get and run the latest pull command for the client SDK:
docker pull nvcr.io/nvidia/tritonserver:20.08-py3-clientsdk
Mount the host directory, point Triton at the model repository, and run the server:
Mounted host directory (where the models were downloaded earlier): /media/zhou/data/NVIDIA/022-triton/server/docs/examples/model_repository
Model repository flag: --model-repository=/models
One GPU: --gpus=1
root@nvidia:~# docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /media/zhou/data/NVIDIA/022-triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 20.08 (build 15533555)
Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
I0902 08:17:32.409691 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I0902 08:17:32.415133 1 metrics.cc:193] GPU 0: GeForce GTX 1660
I0902 08:17:32.415289 1 server.cc:119] Initializing Triton Inference Server
I0902 08:17:32.535673 1 model_repository_manager.cc:737] loading: densenet_onnx:1
I0902 08:17:32.535973 1 model_repository_manager.cc:737] loading: simple:1
I0902 08:17:32.536104 1 model_repository_manager.cc:737] loading: resnet50_netdef:1
I0902 08:17:32.536233 1 model_repository_manager.cc:737] loading: simple_string:1
I0902 08:17:32.536419 1 model_repository_manager.cc:737] loading: inception_graphdef:1
I0902 08:17:32.549990 1 onnx_backend.cc:198] Creating instance densenet_onnx_0_gpu0 on GPU 0 (7.5) using model.onnx
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
I0902 08:17:32.629375 1 netdef_backend.cc:201] Creating instance resnet50_netdef_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
2020-09-02 08:17:32.699612: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
I0902 08:17:32.726612 1 tensorflow.cc:2075] TRITONBACKEND_Initialize: tensorflow
I0902 08:17:32.726632 1 tensorflow.cc:2088] Triton TRITONBACKEND API version: 0.2
I0902 08:17:32.726635 1 tensorflow.cc:2094] 'tensorflow' TRITONBACKEND API version: 0.2
I0902 08:17:32.726638 1 tensorflow.cc:2115] backend configuration:
{}
I0902 08:17:32.727943 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: simple (version 1)
I0902 08:17:32.727957 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: simple_string (version 1)
I0902 08:17:32.728247 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: inception_graphdef (version 1)
I0902 08:17:32.728692 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: simple (device 0)
I0902 08:17:32.728699 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: inception_graphdef (device 0)
I0902 08:17:32.728711 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: simple_string (device 0)
2020-09-02 08:17:33.844618: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-09-02 08:17:33.845071: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f775c073b10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-02 08:17:33.845084: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-02 08:17:33.845220: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-02 08:17:33.845298: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.845519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties:
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-09-02 08:17:33.845531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-09-02 08:17:33.845546: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-09-02 08:17:33.845558: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-02 08:17:33.845569: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-02 08:17:33.847635: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-02 08:17:33.847691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-09-02 08:17:33.847705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-02 08:17:33.847778: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-02 08:17:33.848347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-02 08:17:33.848355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] 0
2020-09-02 08:17:33.848358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0: N
2020-09-02 08:17:33.848416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4529 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-02 08:17:33.851272: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f775c72a5a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-02 08:17:33.851345: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1660, Compute Capability 7.5
(near-identical device-initialization log lines for the other two TensorFlow model instances omitted)
I0902 08:17:33.856482 1 model_repository_manager.cc:925] successfully loaded 'simple_string' version 1
I0902 08:17:33.862965 1 model_repository_manager.cc:925] successfully loaded 'simple' version 1
I0902 08:17:33.912278 1 model_repository_manager.cc:925] successfully loaded 'inception_graphdef' version 1
I0902 08:17:34.585287 1 model_repository_manager.cc:925] successfully loaded 'densenet_onnx' version 1
I0902 08:17:34.589316 1 model_repository_manager.cc:925] successfully loaded 'resnet50_netdef' version 1
I0902 08:17:34.590043 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
I0902 08:17:34.590212 1 http_server.cc:2679] Started HTTPService at 0.0.0.0:8000
I0902 08:17:34.631697 1 http_server.cc:2698] Started Metrics Service at 0.0.0.0:8002
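Port 8000 serves HTTP/REST inference, 8001 serves GRPC, and 8002 serves Prometheus-format metrics, where each metric line has the shape `name{labels} value`. A small sketch of pulling a value out of one such line (the sample text and helper name below are my own, for illustration only):

```python
import re

# Parse a single Prometheus-style metric line of the form: name{labels} value
# Returns (metric_name, value), or None if the line doesn't match.
def parse_metric(line):
    m = re.match(
        r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+([0-9.eE+-]+)$',
        line.strip(),
    )
    if m is None:
        return None
    return m.group(1), float(m.group(2))

# Hypothetical sample resembling the GPU metrics Triton exposes on :8002
sample = 'nv_gpu_utilization{gpu_uuid="GPU-xyz"} 0.5'
print(parse_metric(sample))  # ('nv_gpu_utilization', 0.5)
```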
Verify that the Triton server started correctly:
The health endpoint returns HTTP 200 if Triton is ready, and a non-200 status if it is not.
root@nvidia:~# curl -v localhost:8000/v2/health/ready
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
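The same readiness check can be done from Python with only the standard library (host and port assumed to be the defaults used above). The status-code interpretation is factored into its own helper so the logic can be checked without a running server; the function names are my own:

```python
import urllib.request
import urllib.error

# 200 means ready; any other status means not ready.
def interpret_status(code):
    return code == 200

# Returns True only if the server responds and reports ready;
# connection failures are treated as "not ready".
def triton_ready(url="http://localhost:8000/v2/health/ready"):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return interpret_status(resp.status)
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("ready" if triton_ready() else "not ready")
```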
Start the Triton client (mounting the host directory /media/zhou/data/NVIDIA/022-triton/client):
root@nvidia:~# docker run -it --rm --net=host -v /media/zhou/data/NVIDIA/022-triton/client:/workspace/client nvcr.io/nvidia/tritonserver:20.08-py3-clientsdk
root@nvidia:/workspace# ls
VERSION build builddir client images install src v2.2.0.clients.tar.gz
root@nvidia:/workspace# cd client/
root@nvidia:/workspace/client# ls
mug.jpg
root@nvidia:/workspace/client#
Run the image classification example:
root@nvidia:/workspace/client# image_client -m resnet50_netdef -s INCEPTION /workspace/client/mug.jpg
Request 0, batch size 1
Image '/workspace/client/mug.jpg':
0.723991 (504) = COFFEE MUG
root@nvidia:/workspace/client#
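Each result line printed by image_client has the shape `score (class_index) = LABEL`. If you want to post-process results in a script, a sketch of parsing that line format (the helper name is my own):

```python
import re

# Parse an image_client result line like "0.723991 (504) = COFFEE MUG"
# into (score, class_index, label); returns None on lines that don't match.
def parse_result(line):
    m = re.match(r'^\s*([0-9.]+)\s+\((\d+)\)\s+=\s+(.+)$', line)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), m.group(3).strip()

print(parse_result("0.723991 (504) = COFFEE MUG"))
# (0.723991, 504, 'COFFEE MUG')
```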
The above are my notes from getting familiar with Triton, provided for reference only. Thanks!