[DeepSpeed]初代chatGPT模型部署实践

DeepSpeed Chat 部署方式

中间遇到很多坑,解决方法都写这里了DeepSpeed 部署中bug以及解决方法

环境

  • 基于阿里云GPU 云服务器部署实践

  • 操作系统版本: Ubuntu 18.04

  • GPU 驱动版本: 470.161.03

  • GPU 型号: A100-80G

  • CPU :16vCPU 125G Intel Xeon(Ice Lake) Platinum 8369B

  • CUDA 版本: 11.4

  • Python版本:3.11.3

  • Pip 版本: 23.1.2


1. 安装python环境
sudo apt-get updatesudo 
apt-get install build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev libffi-dev
wget https://www.python.org/ftp/python/3.11.3/Python-3.11.3.tgz
tar xvf Python-3.11.3.tgz
cd Python-3.11.3
./configure --enable-optimizations
make -j 4
sudo make altinstall

2. 配置虚拟环境
sudo apt-get updatesudo 
apt-get install python3-venv
python3.11 -m venv myenv
source myenv/bin/activate

3.安装pip
sudo apt install python3-pip


4. 安装NVIDIA 驱动
NVIDIA-Linux-x86_64-470.182.03  https://www.nvidia.cn/Download/index.aspx?lang=cn

5.安装CUDA
Wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run
sudo sh cuda_11.4.4_470.82.01_linux.run

6.安装依赖
pip install deepspeed>=0.9.0

7.克隆 DeepSpeed 
git clone https://github.com/microsoft/DeepSpeedExamples.git

8.安装依赖
pip install -r requirements.txt

9.启动脚本
python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu

官网数据

Model Sizes(A100-48G) Step 1 Step 2 Step 3 Total
Actor: OPT-1.3B, Reward: OPT-350M 2900 secs 670 secs 1.2hr 2.2hr

实际测试数据

Model Sizes(A100-80G) Step 1 Step 2 Step 3 Total
Actor: OPT-1.3B, Reward: OPT-350M 621 secs 1455 secs 2hr 2.6hr

[DeepSpeed]初代chatGPT模型部署实践_第1张图片

初始模型:actor-model

Enter input (type 'quit' to exit, 'clear' to clean memory): who are you
------------------------------ Round 1 ------------------------------ 
Human: who are you 
Assistant:  I am a human.

Human: you are not a human
Assistant:  I am a human.

Human: hello ,write to me a poem
Assistant:  I am a human.

训练后模型:step3-model

Enter input (type 'quit' to exit, 'clear' to clean memory): hello, write me a poem about a wise frog
------------------------------ Round 1 ------------------------------ 
Human: hello, write me a poem about a wise frog 
Assistant:  Sure, here are some examples of frogs that are wise:
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
The wise frog
…



Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others
------------------------------ Round 2 ------------------------------ 
Human: how to praise others 
Assistant:  Here are some examples of praising others:
The wise frog
The wise frog

Enter input (type 'quit' to exit, 'clear' to clean memory): who are you? 
------------------------------ Round 3  ------------------------------ 

Human: who are you? 
Assistant:  I am a frog.

Human: are you sure? 
Assistant:  Yes, I am a frog. …


Clear cache

Enter input (type 'quit' to exit, 'clear' to clean memory): clear
Enter input (type 'quit' to exit, 'clear' to clean memory): how to praise others?
------------------------------ Round 1 ------------------------------ 
Human: how to praise others? 
Assistant:  Well, praising others is a great way to show your appreciation for others.  It can help others feel good about themselves and others, and also help others feel good about themselves.  It can help others feel more confident and secure in their own abilities and abilities of others.


Human: what is the mean of secure? 
Assistant:  Secure is when someone feels confident and secure in their abilities and abilities of others.  It can help others feel more confident and secure in their own abilities and abilities of others.

与大语言模型相比还是有差距
[DeepSpeed]初代chatGPT模型部署实践_第2张图片

[DeepSpeed]初代chatGPT模型部署实践_第3张图片

你可能感兴趣的:(DeepSpeed,GPT,chatgpt,DeepSpeed)