Table of Contents
1. Background
2. TorchServe environment setup
2.1 JDK setup
2.2 Python environment setup
2.3 Starting the service
2.3.1 Registering a model
2.3.2 Listing models
2.3.3 Calling the inference API
3. Advanced features
3.1 Multi-version model management
3.2 Registering a new model
Due to a change in our technical direction, we need to replace our existing model inference service, TensorFlow Serving. Preliminary research identified two candidate replacement frameworks: TorchServe and Triton. This article covers only TorchServe: how to deploy it and how to use its basic features.
Basic runtime environment:
torch 1.12.1
torch-model-archiver 0.6.0
torch-workflow-archiver 0.2.4
torchserve 0.6.0
torchvision 0.13.1
jdk>=11
Download the JDK
There are plenty of guides online for this step, so it is not repeated here.
Extract the archive and configure the environment variables:
export JAVA_HOME=/usr/local/jdk11.0.10
export PATH=${PATH}:${JAVA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Verify the installation:
$ java -version
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
The official documentation is quite detailed and worth consulting:
serve/README.md at master · pytorch/serve · GitHub
To set up the environment yourself:
conda create -n ts python=3.9
pip install torchserve torch-model-archiver torch-workflow-archiver
The pip packages that model inference itself depends on can be installed later as needed.
After installation, start the service to see it in action:
torchserve --start --model-store model_save
Since no model has been registered yet, the model-listing API returns an empty list:
$ curl http://127.0.0.1:8081/models
{
"models": []
}
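The management API's JSON responses can also be consumed programmatically. A minimal sketch (the helper names are my own; `fetch_models` assumes a TorchServe instance running on the default management port 8081):

```python
import json
import urllib.request

MANAGEMENT = "http://127.0.0.1:8081"

def list_model_names(payload: dict) -> list:
    """Extract registered model names from a GET /models response body."""
    return [m["modelName"] for m in payload.get("models", [])]

def fetch_models() -> dict:
    """Call the management API; requires a running TorchServe instance."""
    with urllib.request.urlopen(f"{MANAGEMENT}/models") as resp:
        return json.load(resp)
```

On the empty response above, `list_model_names` returns an empty list; after registration it returns the registered names.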
Registering a model requires packaging it first: the model definition file and its weights are archived into a single .mar file.
wget https://download.pytorch.org/models/resnet18-f37072fd.pth
torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json
mkdir model_store
mv resnet-18.mar model_store/
torchserve --stop   # stop the earlier instance before restarting with the model
torchserve --start --model-store model_store --models resnet-18=resnet-18.mar
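All commands in this article rely on TorchServe's default ports: 8080 for inference and 8081 for the management API. If these need to change, the addresses can be set in a config.properties file passed via --ts-config; a sketch of the relevant keys (values shown mirror the defaults and this article's directory layout):

```properties
# config.properties (optional)
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=model_store
```

It would then be started with `torchserve --start --ts-config config.properties`.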
$ curl http://127.0.0.1:8081/models
{
"models": [
{
"modelName": "resnet-18",
"modelUrl": "resnet-18.mar"
}
]
}
$ curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
{
"tabby": 0.40966343879699707,
"tiger_cat": 0.3467043936252594,
"Egyptian_cat": 0.13002890348434448,
"lynx": 0.023919543251395226,
"bucket": 0.011532200500369072
}
Basic functionality verified.
If multiple versions of the same model are registered, the version can be added to the URL to select one:
/predictions/{model_name}/{version}
For example, the resnet-18 model registered in 2.3.1 with version=1.0 can also be accessed with the version number in the URL:
curl http://127.0.0.1:8080/predictions/resnet-18/1.0 -T ./examples/image_classifier/kitten.jpg
{
"tabby": 0.40966343879699707,
"tiger_cat": 0.3467043936252594,
"Egyptian_cat": 0.13002890348434448,
"lynx": 0.023919543251395226,
"bucket": 0.011532200500369072
}
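When scripting against the inference API, the versioned URL scheme and the response format above can be wrapped in small helpers (a sketch; the function names are illustrative, and the default inference port 8080 is assumed):

```python
from typing import Dict, Optional

def prediction_url(model: str, version: Optional[str] = None,
                   host: str = "http://127.0.0.1:8080") -> str:
    """Build an inference URL; without a version, TorchServe serves the default version."""
    url = f"{host}/predictions/{model}"
    return f"{url}/{version}" if version else url

def top_label(scores: Dict[str, float]) -> str:
    """Class name with the highest score in a prediction response."""
    return max(scores, key=scores.get)
```

Applied to the response above, `top_label` returns "tabby".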
Updating a model
First package the new model as in 2.3.1, taking care to bump the version number:
$ curl -X POST "http://localhost:8081/models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar"
2022-10-27T14:58:01,223 [DEBUG] epollEventLoopGroup-3-12 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model resnet-18
2022-10-27T14:58:01,223 [INFO ] epollEventLoopGroup-3-12 org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2022-10-27T14:58:01,224 [INFO ] epollEventLoopGroup-3-12 ACCESS_LOG - /127.0.0.1:38340 "POST /models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar HTTP/1.1" 200 1832
2022-10-27T14:58:01,224 [INFO ] epollEventLoopGroup-3-12 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:cb,timestamp:1666849981
{
"status": "Model \"resnet-18\" Version: 2.0 registered with 0 initial workers. Use scale workers API to add workers for the model."
}
Test the new version:
$ curl http://127.0.0.1:8080/predictions/resnet-18/2.0 -T ./examples/image_classifier/kitten.jpg
{
"code": 503,
"type": "ServiceUnavailableException",
"message": "Model \"resnet-18\" Version 2.0\" has no worker to serve inference request. Please use scale workers API to add workers."
}
As the error message suggests, add workers for the new model version:
$ curl -v -X PUT "http://localhost:8081/models/resnet-18/2.0?min_worker=1"
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> PUT /models/resnet-18/2.0?min_worker=1 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.60.0
> Accept: */*
>
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: d2989b0f-f154-4056-9c58-d5f44d558cce
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
<
{
"status": "Processing worker updates..."
}
* Connection #0 to host localhost left intact
$ curl http://localhost:8081/models/resnet-18/2.0
[
{
"modelName": "resnet-18",
"modelVersion": "2.0",
"modelUrl": "/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar",
"runtime": "python",
"minWorkers": 1,
"maxWorkers": 1,
"batchSize": 1,
"maxBatchDelay": 100,
"loadedAtStartup": true,
"workers": [
{
"id": "9003",
"startTime": "2022-10-27T15:03:22.541Z",
"status": "UNLOADING",
"memoryUsage": 0,
"pid": 19643,
"gpu": true,
"gpuUsage": "gpuId::0 utilization.gpu [%]::2 % utilization.memory [%]::0 % memory.used [MiB]::2246 MiB"
}
]
}
]
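Note that the worker above is not yet in READY state, which is why a freshly scaled version can still briefly reject requests. Before routing traffic, it can be worth polling GET /models/{name}/{version} for worker readiness; a sketch of such a check (the helper name is my own):

```python
def has_ready_worker(detail: list) -> bool:
    """True if any worker of the model version reports READY status.

    `detail` is the JSON body of GET /models/{name}/{version},
    which TorchServe returns as a one-element list."""
    return any(w.get("status") == "READY"
               for entry in detail
               for w in entry.get("workers", []))
```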
$ curl http://127.0.0.1:8080/predictions/resnet-18/2.0 -T ./examples/image_classifier/kitten.jpg
{
"tabby": 0.40966343879699707,
"tiger_cat": 0.3467043936252594,
"Egyptian_cat": 0.13002890348434448,
"lynx": 0.023919543251395226,
"bucket": 0.011532200500369072
}
In real-world deployments, a single serving instance typically hosts multiple models, so updating one model must not interrupt online inference for the others.
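One way to avoid the register-then-scale gap entirely is to request workers at registration time: TorchServe's POST /models endpoint accepts initial_workers and synchronous query parameters. A sketch of building such a request URL (the .mar path and worker count are illustrative):

```python
from urllib.parse import urlencode

def register_url(mar_path: str, initial_workers: int = 1,
                 host: str = "http://127.0.0.1:8081") -> str:
    """Registration URL that also asks TorchServe to spin up workers,
    so the new model is servable as soon as the call returns."""
    params = urlencode({"url": mar_path,
                        "initial_workers": initial_workers,
                        "synchronous": "true"})
    return f"{host}/models?{params}"
```

A `curl -X POST` on this URL registers the model and, with synchronous=true, returns only once the workers are up.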
Register an additional model, a text-to-speech (TTS) synthesizer:
$ cd ./examples/text_to_speech_synthesizer/
$ ./create_mar.sh
$ mv waveglow_synthesizer.mar model_store/
$ curl -X POST "http://localhost:8081/models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/waveglow_synthesizer.mar"
$ curl http://localhost:8081/models/waveglow_synthesizer/1.0
[
{
"modelName": "waveglow_synthesizer",
"modelVersion": "1.0",
"modelUrl": "/home/ubuntu/newspace/pytorchserve/deploy/model_save/waveglow_synthesizer.mar",
"runtime": "python",
"minWorkers": 0,
"maxWorkers": 0,
"batchSize": 1,
"maxBatchDelay": 100,
"loadedAtStartup": false,
"workers": []
}
]
$ curl -v -X PUT "http://localhost:8081/models/waveglow_synthesizer/1.0?min_worker=1"
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> PUT /models/waveglow_synthesizer/1.0?min_worker=1 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.60.0
> Accept: */*
>
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 08ad7b99-86b5-4144-aff3-92583ed40aca
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
<
{
"status": "Processing worker updates..."
}
* Connection #0 to host localhost left intact
3.3 Model testing
$ curl http://127.0.0.1:8080/predictions/waveglow_synthesizer -T sample_text.txt -o audio.wav
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 25 0 0 0 25 0 0 --:--:-- 0:04:11 --:--:-- 0
100 183k 100 183k 0 25 420 0 0:07:26 0:07:25 0:00:01 47879
The generated audio file audio.wav is written to the current directory.