NVIDIA Triton: Introduction and Installation

1. Introduction

NVIDIA Triton Inference Server, formerly known as TensorRT Inference Server, is available through NVIDIA NGC or GitHub.
Triton helps developers and IT/DevOps teams easily deploy a high-performance inference server in the cloud, in an on-premises data center, or at the edge. The server exposes inference through HTTP/REST and GRPC endpoints, allowing clients to request inference on any model the server manages.
Developers and AI companies can use Triton to deploy models from different framework backends, such as TensorFlow, TensorRT, PyTorch, and ONNX Runtime.

2. Prebuilt Docker Containers

Reference 1: TLT (Transfer Learning Toolkit) installation guide:

Software Requirements

  • Ubuntu 18.04 LTS
  • NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
  • docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
  • Nvidia docker installed, instructions at https://github.com/NVIDIA/nvidia-docker
  • NVIDIA GPU driver v410.xx or above
Note: DeepStream 5.0 - NVIDIA SDK for IVA inference https://developer.nvidia.com/deepstream-sdk is recommended.

Installation Prerequisites

  • Install Docker. See: https://www.docker.com/.
  • NVIDIA GPU driver v410.xx or above. Download from https://www.nvidia.com/Download/index.aspx?lang=en-us.
  • Install NVIDIA Docker from: https://github.com/NVIDIA/nvidia-docker.

Get an NGC API key

  • NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
    1. Go to NGC and click the Transfer Learning Toolkit container in the Catalog tab. The message "Sign in to access the PULL feature of this repository" is displayed.
    2. Enter your email address and click Next or click Create an Account.
    3. Choose your organization when prompted for Organization/Team.
    4. Click Sign In.
    5. Select the Containers tab on the left navigation pane and click the Transfer Learning Toolkit tile.

Download the docker container

  • Execute docker login nvcr.io from the command line and enter these login credentials:
    • Username: $oauthtoken
    • Password: YOUR_NGC_API_KEY
  • Execute docker pull nvcr.io/nvidia/tlt-streamanalytics:

Reference 2: Notes on installing docker-ce on Ubuntu 18.04

3. Installing the Triton Server Image

Reference: Quick Start guide.
Run the prerequisites (the downloaded models will be used for verification later):

# Clone the source
git clone https://github.com/triton-inference-server/server
cd server
# Check out the matching release branch
git checkout r20.08
# Download the example models
cd docs/examples
./fetch_models.sh

Pull the Triton container:

docker pull nvcr.io/nvidia/tritonserver:20.08-py3

4. Installing the Triton Client Image

Log in at https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver,
then copy and run the latest pull command:

docker pull nvcr.io/nvidia/tritonserver:20.08-py3-clientsdk

5. Example Verification

Mount the external directory + specify the model repository + run the Triton server.
Mounted external directory (where the example models were downloaded earlier): /media/zhou/data/NVIDIA/022-triton/server/docs/examples/model_repository
Model repository flag: --model-repository=/models
Use one GPU: --gpus=1
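Triton discovers models in the mounted repository by directory convention: each model lives at `<repository>/<model_name>/<version>/`, with a `config.pbtxt` alongside the version directories. A minimal Python sketch that builds such a skeleton (the helper name and the empty config stub are illustrative assumptions, not part of Triton itself):

```python
from pathlib import Path
import tempfile

def make_model_repository(root: str, model: str, version: int = 1) -> Path:
    """Create the <root>/<model>/<version>/ skeleton Triton expects,
    plus an empty config.pbtxt stub to be filled in by hand."""
    model_dir = Path(root) / model
    (model_dir / str(version)).mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").touch()
    return model_dir

# Example: build a throwaway repository in a temporary directory.
repo = tempfile.mkdtemp()
model_dir = make_model_repository(repo, "densenet_onnx")
print(sorted(p.name for p in model_dir.iterdir()))  # ['1', 'config.pbtxt']
```

The actual model file (e.g. `model.onnx`) goes inside the version directory, which is what the `fetch_models.sh` script above populates for the example models.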

root@nvidia:~# docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /media/zhou/data/NVIDIA/022-triton/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.08 (build 15533555)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0902 08:17:32.409691 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I0902 08:17:32.415133 1 metrics.cc:193]   GPU 0: GeForce GTX 1660
I0902 08:17:32.415289 1 server.cc:119] Initializing Triton Inference Server
I0902 08:17:32.535673 1 model_repository_manager.cc:737] loading: densenet_onnx:1
I0902 08:17:32.535973 1 model_repository_manager.cc:737] loading: simple:1
I0902 08:17:32.536104 1 model_repository_manager.cc:737] loading: resnet50_netdef:1
I0902 08:17:32.536233 1 model_repository_manager.cc:737] loading: simple_string:1
I0902 08:17:32.536419 1 model_repository_manager.cc:737] loading: inception_graphdef:1
I0902 08:17:32.549990 1 onnx_backend.cc:198] Creating instance densenet_onnx_0_gpu0 on GPU 0 (7.5) using model.onnx
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
I0902 08:17:32.629375 1 netdef_backend.cc:201] Creating instance resnet50_netdef_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
2020-09-02 08:17:32.699612: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
I0902 08:17:32.726612 1 tensorflow.cc:2075] TRITONBACKEND_Initialize: tensorflow
I0902 08:17:32.726632 1 tensorflow.cc:2088] Triton TRITONBACKEND API version: 0.2
I0902 08:17:32.726635 1 tensorflow.cc:2094] 'tensorflow' TRITONBACKEND API version: 0.2
I0902 08:17:32.726638 1 tensorflow.cc:2115] backend configuration:
{}
I0902 08:17:32.727943 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: simple (version 1)
I0902 08:17:32.727957 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: simple_string (version 1)
I0902 08:17:32.728247 1 tensorflow.cc:2173] TRITONBACKEND_ModelInitialize: inception_graphdef (version 1)
I0902 08:17:32.728692 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: simple (device 0)
I0902 08:17:32.728699 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: inception_graphdef (device 0)
I0902 08:17:32.728711 1 tensorflow.cc:2220] TRITONBACKEND_ModelInstanceInitialize: simple_string (device 0)
2020-09-02 08:17:33.844618: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-09-02 08:17:33.845071: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f775c073b10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-02 08:17:33.845084: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-02 08:17:33.845220: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-02 08:17:33.845298: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.845519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-09-02 08:17:33.845531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-09-02 08:17:33.845546: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-09-02 08:17:33.845558: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-02 08:17:33.845569: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-02 08:17:33.847635: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-02 08:17:33.847691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-09-02 08:17:33.847705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-02 08:17:33.847778: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-02 08:17:33.848347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-02 08:17:33.848355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181]      0 
2020-09-02 08:17:33.848358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0:   N 
2020-09-02 08:17:33.848416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.848952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4529 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-02 08:17:33.851272: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f775c72a5a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-02 08:17:33.851345: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660, Compute Capability 7.5
2020-09-02 08:17:33.851573: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.851865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-09-02 08:17:33.851919: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-09-02 08:17:33.851926: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-09-02 08:17:33.851933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-02 08:17:33.851937: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-02 08:17:33.851960: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-02 08:17:33.851966: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-09-02 08:17:33.851972: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-02 08:17:33.852019: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.852215: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.852377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-02 08:17:33.852400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-02 08:17:33.852415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181]      0 
2020-09-02 08:17:33.852421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0:   N 
2020-09-02 08:17:33.852727: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.852953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.853145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4529 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-09-02 08:17:33.854507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.854801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: GeForce GTX 1660 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-09-02 08:17:33.854847: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-09-02 08:17:33.854855: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-09-02 08:17:33.854864: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-02 08:17:33.854870: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-02 08:17:33.854889: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-02 08:17:33.854897: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-09-02 08:17:33.854903: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-02 08:17:33.854945: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.855146: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.855314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-02 08:17:33.855347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-02 08:17:33.855356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181]      0 
2020-09-02 08:17:33.855360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0:   N 
2020-09-02 08:17:33.855418: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.855619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:985] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-02 08:17:33.855809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4529 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
I0902 08:17:33.856482 1 model_repository_manager.cc:925] successfully loaded 'simple_string' version 1
I0902 08:17:33.862965 1 model_repository_manager.cc:925] successfully loaded 'simple' version 1
I0902 08:17:33.912278 1 model_repository_manager.cc:925] successfully loaded 'inception_graphdef' version 1
I0902 08:17:34.585287 1 model_repository_manager.cc:925] successfully loaded 'densenet_onnx' version 1
I0902 08:17:34.589316 1 model_repository_manager.cc:925] successfully loaded 'resnet50_netdef' version 1
I0902 08:17:34.590043 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
I0902 08:17:34.590212 1 http_server.cc:2679] Started HTTPService at 0.0.0.0:8000
I0902 08:17:34.631697 1 http_server.cc:2698] Started Metrics Service at 0.0.0.0:8002

Verify that the Triton server started correctly:
the HTTP request returns status 200 if Triton is ready, and a non-200 status if it is not.

root@nvidia:~# curl -v localhost:8000/v2/health/ready
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact
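The same readiness check can be scripted instead of run by hand. A minimal Python sketch using only the standard library (the default HTTP port 8000 and the function name are assumptions for illustration):

```python
from urllib.request import urlopen
from urllib.error import URLError

def triton_ready(base_url: str = "http://localhost:8000") -> bool:
    """Return True only if /v2/health/ready answers with HTTP 200."""
    try:
        with urlopen(f"{base_url}/v2/health/ready", timeout=2) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-200 HTTPError: not ready.
        return False
```

This is handy in deployment scripts that must wait for the server to finish loading the model repository before sending inference requests.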

Start the Triton client (mounting the external directory /media/zhou/data/NVIDIA/022-triton/client):

root@nvidia:~# docker run -it --rm --net=host -v /media/zhou/data/NVIDIA/022-triton/client:/workspace/client nvcr.io/nvidia/tritonserver:20.08-py3-clientsdk
root@nvidia:/workspace# ls
VERSION  build  builddir  client  images  install  src  v2.2.0.clients.tar.gz
root@nvidia:/workspace# cd client/
root@nvidia:/workspace/client# ls
mug.jpg
root@nvidia:/workspace/client# 

Run the classification example:

root@nvidia:/workspace/client# image_client -m resnet50_netdef -s INCEPTION /workspace/client/mug.jpg
Request 0, batch size 1
Image '/workspace/client/mug.jpg':
    0.723991 (504) = COFFEE MUG
root@nvidia:/workspace/client# 
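The `-s INCEPTION` flag tells the client to apply Inception-style preprocessing before sending the image. As a rough illustration (this is the common convention for that scaling, not necessarily the exact arithmetic `image_client` performs), it maps uint8 pixel values in [0, 255] to float32 values in [-1, 1]:

```python
import numpy as np

def inception_scale(pixels: np.ndarray) -> np.ndarray:
    # Map uint8 pixel values in [0, 255] to float32 in [-1, 1].
    return pixels.astype(np.float32) / 127.5 - 1.0

scaled = inception_scale(np.array([0, 255], dtype=np.uint8))
print(scaled)  # [-1.  1.]
```

The model then sees inputs in the range it was trained on; sending raw uint8 pixels to a model expecting this scaling typically degrades the classification scores.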

The above are notes taken while getting familiar with Triton, for reference only. Thanks!
