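MindIE Service is an inference serving framework for general model scenarios. Built on an open, extensible serving platform architecture, it provides inference serving capabilities, integrates with the interfaces of mainstream industry inference frameworks, and meets the high-performance inference needs of large language models.

MindIE Service consists of MindIE Service Tools, MindIE Client, MindIE MS (MindIE Management Service), and MindIE Server. On one hand, it connects to the Ascend inference acceleration engine to improve large-model performance in Ascend environments. On the other hand, by plugging into the existing mainstream inference framework ecosystem, it uses high performance and ease of use to gradually draw users toward a fully native inference serving framework.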
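MindIE Service cannot be installed on its own; install the complete MindIE package instead. For installation instructions, see the Ascend community document: MindIE Installation Guide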
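After installation completes, the following message indicates that the software was installed successfully:
xxx install success
Here, xxx is the actual name of the installed software package.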
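MindIE Service provides interfaces compatible with third-party frameworks such as TGI 0.9.4, vLLM 0.2.6, OpenAI, and Triton, along with its own native inference interface and service health-check interfaces. For the full list, see: Interface List
Examples of the main interfaces follow: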
Request body (TGI-compatible format):
{
"inputs": "My name is Olivier and I",
"parameters": {
"decoder_input_details": true,
"details": true,
"do_sample": true,
"max_new_tokens": 20,
"repetition_penalty": 1.03,
"return_full_text": false,
"seed": null,
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"truncate": null,
"typical_p": 0.5,
"watermark": false,
"stop": null,
"adapter_id": "None"
}
}
Response body:
{
"details": {
"finish_reason": "length",
"generated_tokens": 1,
"prefill": [{
"id": 0,
"logprob":null,
"special": null,
"text": "test"
}],
"prompt_tokens": 74,
"seed": 42,
"tokens": [{
"id": 0,
"logprob": null,
"special": null,
"text": "test"
}]
},
"generated_text": "am a Frenchman living in the UK. I have been working as an IT consultant for "
}
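
As a rough illustration of how a client might assemble the request above and read the result, the following Python sketch builds a payload with the same field names and pulls `generated_text` and `finish_reason` out of a response shaped like the sample. The helper names `build_tgi_payload` and `read_generated_text` are made up for this example, and no endpoint is contacted because the request path is not given here.

```python
import json


def build_tgi_payload(prompt, max_new_tokens=20, temperature=0.5):
    """Build a request body using field names taken from the sample above."""
    return {
        "inputs": prompt,
        "parameters": {
            "details": True,
            "do_sample": True,
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": 10,
            "top_p": 0.95,
        },
    }


def read_generated_text(response_body):
    """Parse a JSON response string and return (text, finish_reason)."""
    data = json.loads(response_body)
    details = data.get("details") or {}
    return data["generated_text"], details.get("finish_reason")


# Offline demonstration with a trimmed copy of the sample response.
sample_response = json.dumps({
    "details": {"finish_reason": "length", "generated_tokens": 1},
    "generated_text": "am a Frenchman living in the UK.",
})
payload = build_tgi_payload("My name is Olivier and I")
text, reason = read_generated_text(sample_response)
print(json.dumps(payload, indent=2))
print(text, reason)
```

Only a subset of the sampling parameters is set here; the remaining fields from the sample can be added to the `parameters` dictionary in the same way.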
Request body (vLLM-compatible format):
{
"prompt": "My name is Olivier and I",
"max_tokens": 20,
"repetition_penalty": 1.03,
"presence_penalty": 1.2,
"frequency_penalty": 1.2,
"temperature": 0.5,
"top_p": 0.95,
"top_k": 10,
"seed": null,
"stream": false,
"stop": null,
"stop_token_ids": null,
"model": "None",
"include_stop_str_in_output": false,
"skip_special_tokens": true,
"ignore_eos": false
}
Response body:
{"text":["My name is Olivier and I am a Frenchman living in the UK. I am a keen photographer and"]}
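
In the sample above, each entry in `text` starts with the original prompt. If only the newly generated part is wanted, a client could strip that prefix. The sketch below shows one way to do so; `split_completion` is a hypothetical helper, and it assumes the prompt is echoed verbatim at the start of the returned text, as it is in this sample.

```python
def split_completion(prompt, returned_text):
    """Return only the generated part when the text starts with the prompt."""
    if returned_text.startswith(prompt):
        return returned_text[len(prompt):]
    # Fall back to the full text if the prompt is not echoed.
    return returned_text


prompt = "My name is Olivier and I"
response = {
    "text": [
        "My name is Olivier and I am a Frenchman living in the UK. "
        "I am a keen photographer and"
    ]
}
completions = [split_completion(prompt, t) for t in response["text"]]
print(completions)
```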
Request body (OpenAI-compatible format):
{
"model": "gpt-3.5-turbo",
"messages": [{
"role": "user",
"content": "You are a helpful assistant."
}],
"stream": false,
"presence_penalty": 1.03,
"frequency_penalty": 1.0,
"repetition_penalty": 1.0,
"temperature": 0.5,
"top_p": 0.95,
"top_k": 0,
"seed": null,
"stop": ["stop1", "stop2"],
"stop_token_ids": [2, 13],
"include_stop_str_in_output": false,
"skip_special_tokens": true,
"ignore_eos": false,
"max_tokens": 20
}
Response body:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 12,
"total_tokens": 21
}
}
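
To show how the fields of this response relate to each other, here is a small Python sketch that extracts the first assistant message and checks that `total_tokens` equals `prompt_tokens` plus `completion_tokens`, as it does in the sample (9 + 12 = 21). The function name `summarize_chat_completion` is an assumption made for this example.

```python
def summarize_chat_completion(response):
    """Return the first assistant message and a usage consistency flag."""
    choice = response["choices"][0]
    usage = response["usage"]
    consistent = (
        usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"]
    )
    return {
        # The sample content starts with two newlines, so strip whitespace.
        "content": choice["message"]["content"].strip(),
        "finish_reason": choice["finish_reason"],
        "usage_consistent": consistent,
    }


sample = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "\n\nHello there, how may I assist you today?",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
summary = summarize_chat_completion(sample)
print(summary)
```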
Request body (Triton-compatible format):
{
"id":"a123",
"text_input": "My name is Olivier and I",
"parameters": {
"details": true,
"do_sample": true,
"max_new_tokens":20,
"repetition_penalty": 1.1,
"seed": 123,
"temperature": 1,
"top_k": 10,
"top_p": 0.99,
"batch_size":100,
"typical_p": 0.5,
"watermark": false,
"perf_stat": false,
"priority": 5,
"timeout": 10
}
}
Response body:
{
"id": "a123",
"model_name": "llama_65b",
"model_version": null,
"text_output": "am living in South of France.\nI have been addicted to Jurassic Park since very young. I played some video game versions but especially the great first pinball model from William which reminds me a lot of JPOG1 by song (deluxe). Unfortunately, it stopped working and has been unprofitable for a long time before being exchanged for another game. Fortunately there was the computer version. Nevertheless, it came out only on PC in 2003 when mine was too weak... It's just been a couple of months that the game came out on Mac (a whole 15 years late) with the Version 0.91JAMS ! I know this may be a little antique with the realistic animations and versions today, but the memories are very deep-seated . So thank you all rebuilders for keeping alive wonderful games like this one.\nSince then, I try to keep me updated about this game and test if possible later Alpha. Thank you so much for your work!",
"details": {
"finish_reason": "eos_token",
"generated_tokens": 221,
"first_token_cost": null,
"decode_cost": null
}
}
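
The response in this sample carries the same `id` ("a123") as the request. Assuming that this holds in general, a client that sends several requests could use the field to match responses to prompts. The sketch below illustrates the idea; `pair_by_id` and the second request `b456` are hypothetical.

```python
def pair_by_id(sent, received):
    """Attach each response to the request that carries the same "id"."""
    by_id = {resp["id"]: resp for resp in received}
    paired = []
    for req in sent:
        resp = by_id.get(req["id"])
        paired.append({
            "id": req["id"],
            "text_input": req["text_input"],
            # None marks a request whose response has not arrived.
            "text_output": resp["text_output"] if resp else None,
        })
    return paired


sent = [
    {"id": "a123", "text_input": "My name is Olivier and I"},
    {"id": "b456", "text_input": "Hello"},  # hypothetical second request
]
received = [
    {"id": "a123", "model_name": "llama_65b",
     "text_output": "am living in South of France."},
]
paired = pair_by_id(sent, received)
for row in paired:
    print(row)
```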
Request body (MindIE native format):
{
"inputs": "My name is Olivier and I",
"stream": false,
"parameters": {
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"max_new_tokens": 20,
"do_sample": true,
"seed": null,
"repetition_penalty": 1.03,
"details": true,
"typical_p": 0.5,
"watermark": false,
"priority": 5,
"timeout": 10
}
}
Response body:
{
"generated_text": "am a french native speaker. I am looking for a job in the hospitality industry. I",
"details": {
"finish_reason": "length",
"generated_tokens": 20,
"seed": 846930886
}
}
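
In this sample the request sets `max_new_tokens` to 20, and the response reports `finish_reason` "length" with `generated_tokens` 20, which suggests that generation stopped at the token limit rather than at a natural end. A client could detect this case as sketched below; `stopped_at_length_limit` is a name invented for this example, and the interpretation of "length" is inferred from the sample rather than from a specification.

```python
def stopped_at_length_limit(request, response):
    """Return True when generation appears to have hit max_new_tokens."""
    details = response.get("details") or {}
    limit = request["parameters"]["max_new_tokens"]
    return (
        details.get("finish_reason") == "length"
        and details.get("generated_tokens") == limit
    )


request = {
    "inputs": "My name is Olivier and I",
    "stream": False,
    "parameters": {"max_new_tokens": 20},
}
response = {
    "generated_text": "am a french native speaker.",
    "details": {
        "finish_reason": "length",
        "generated_tokens": 20,
        "seed": 846930886,
    },
}
truncated = stopped_at_length_limit(request, response)
print(truncated)
```

When the check returns True, a caller might raise `max_new_tokens` and resend the request if a complete answer is needed.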
Use the MindIE Benchmark tool to run accuracy tests. For more details about MindIE Benchmark, see: link
Sample command:
benchmark \
--DatasetPath "/{dataset path}/GSM8K" \
--DatasetType "gsm8k" \
--ModelName "baichuan2_13b" \
--ModelPath "/{model path}/baichuan2-13b" \
--TestType client \
--Http https://{ipAddress}:{port} \
--ManagementHttp https://{managementIpAddress}:{managementPort} \
--MaxOutputLen 512 \
--TestAccuracy True
The `--TestAccuracy True` parameter is the switch that enables accuracy testing.
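
Since the accuracy and performance samples in this document differ only in the `--TestAccuracy True` flag, a script that drives both kinds of run could build the argument list from one function. The following Python sketch does that with the flags shown in the samples. The function name `build_benchmark_args`, the paths, and the addresses are placeholders chosen for illustration, and the command is only printed, not executed.

```python
import shlex


def build_benchmark_args(dataset_path, model_path, http, management_http,
                         test_accuracy=False, max_output_len=512):
    """Assemble the benchmark command line from the flags in the samples."""
    args = [
        "benchmark",
        "--DatasetPath", dataset_path,
        "--DatasetType", "gsm8k",
        "--ModelName", "baichuan2_13b",
        "--ModelPath", model_path,
        "--TestType", "client",
        "--Http", http,
        "--ManagementHttp", management_http,
        "--MaxOutputLen", str(max_output_len),
    ]
    if test_accuracy:
        # Accuracy runs add this switch; performance runs omit it.
        args += ["--TestAccuracy", "True"]
    return args


# Placeholder paths and addresses; replace with real values.
args = build_benchmark_args(
    "/data/GSM8K",
    "/models/baichuan2-13b",
    "https://127.0.0.1:1025",
    "https://127.0.0.1:1026",
    test_accuracy=True,
)
print(shlex.join(args))
```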
Use the MindIE Benchmark tool to run performance tests. For more details about MindIE Benchmark, see: link
Sample command:
benchmark \
--DatasetPath "/{dataset path}/GSM8K" \
--DatasetType "gsm8k" \
--ModelName "baichuan2_13b" \
--ModelPath "/{model path}/baichuan2-13b" \
--TestType client \
--Http https://{ipAddress}:{port} \
--ManagementHttp https://{managementIpAddress}:{managementPort} \
--MaxOutputLen 512
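The performance test results include latency metrics, as well as throughput-related metrics such as lpct (latency per complete token, the average per-token latency during the prefill stage) and Throughput.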
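
One way to read the lpct and throughput metrics is sketched below in Python. It assumes that lpct is the prefill latency divided by the number of prompt tokens, and that throughput is the number of generated tokens per second. Both formulas are assumptions based on the metric descriptions here, the function names are hypothetical, and the benchmark tool's exact definitions may differ.

```python
def lpct_ms(prefill_latency_ms, prompt_tokens):
    """Average prefill latency per prompt token, in milliseconds (assumed)."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    return prefill_latency_ms / prompt_tokens


def throughput_tokens_per_s(generated_tokens, elapsed_s):
    """Generated tokens per second over the measured interval (assumed)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return generated_tokens / elapsed_s


# Illustrative numbers only, not measured results.
example_lpct = lpct_ms(148.0, 74)
example_throughput = throughput_tokens_per_s(200, 4.0)
print(example_lpct, example_throughput)
```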
## Stopping the Service
- Method 1: Stop the process with the kill command
```shell
# List all processes related to mindieservice
ps -ef | grep mindieservice
# Stop the process with the kill command
kill {mindieservice_daemon main process ID}
pkill -9 mindieservice