2023-4-28 更新:
github有兄弟合并+量化了7B、13B的权重,Chinese-Alpaca项目部署体验更简单:
GitHub - ymcui/Chinese-LLaMA-Alpaca: 中文LLaMA&Alpaca大语言模型+本地CPU/GPU部署 (Chinese LLaMA & Alpaca LLMs)
github地址:
GitHub - facebookresearch/llama: Inference code for LLaMA models
在 LLaMA 发布三天后,初创公司 Nebuly AI 开源了 RLHF 版 LLaMA(ChatLLaMA)的训练方法。它的训练过程类似 ChatGPT,该项目允许基于预训练的 LLaMA 模型构建 ChatGPT 形式的服务。目前知乎上已经有兄弟中文训练起来了,有按照 bert4torch 方式能比较方便快捷的进行尝试。
首先进行安装:
cd llama
pip install -r requirements.txt
pip install -e .
然后到github地址上找到申请入口,填写邮箱之后等待邮件:
下载模型,执行 download.sh ,修改需要下载的模型尺寸 MODEL_SIZE,只保留7B,同时替换下载链接 PRESIGNED_URL:
PRESIGNED_URL="" # 需要从邮箱中查找,
MODEL_SIZE="7B" # 只下载最小
TARGET_FOLDER="./" # 下载目录
查看下载脚本原来实际上也就是下载的下面几个文件:
wget ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"
wget ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"
wget ${PRESIGNED_URL/'*'/"${i}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${i}/consolidated.${s}.pth"
wget ${PRESIGNED_URL/'*'/"${i}/params.json"} -O ${TARGET_FOLDER}"/${i}/params.json"
wget ${PRESIGNED_URL/'*'/"${i}/checklist.chk"} -O ${TARGET_FOLDER}"/${i}/checklist.chk"
然后下载模型:
download.sh
如果下载失败可以参考国内教程也就是通过pyllama进行下载:
如何评价 LLaMA 模型泄露? - 知乎
ChatGPT平替模型:LLaMA(附下载地址,平民玩家和伸手党的福音!) - 知乎
推理,替换两个文件路径即可:
torchrun --nproc_per_node 1 --nnodes 1 example.py --ckpt_dir ./weight/7B --tokenizer_path ./weight/tokenizer.model
官方源码推理使用的 torchrun 针对并行推理。下面演示使用 bert4torch 对 llama-7b 进行推理(需要权重model转为bin),首先下载最新版bert4torch:
pip install git+https://www.github.com/Tongjilibo/bert4torch.git
教程如下:
#! -*- coding: utf-8 -*-
# 基本测试:llama的7b模型的测试, fp32精度的单卡占用约27g,fp16的显存占用约14g
# 使用前需要安装最新版本的bert4torch并进行权重转换 https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py
# 0. Install lastest bert4torch: `pip install git+https://www.github.com/Tongjilibo/bert4torch.git` or git clone
# 1. Download weights:[Github](https://github.com/facebookresearch/llama) | [huggingface](https://huggingface.co/decapoda-research/llama-7b-hf) | [torrent](https://pan.baidu.com/s/1yBaYZK5LHIbJyCCbtFLW3A?pwd=phhd),本人实现是基于第三种
# 2. Convert weights:https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py
# 3. Inference script:https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py
# 4. VRAM request in single gpu:fp32 27G, fp16 14g
import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import SpTokenizer
from bert4torch.snippets import AutoRegressiveDecoder
config_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin'
spm_path = 'F:/Projects/pretrain_ckpt/llama/tokenizer.model'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = SpTokenizer(spm_path, token_start='', token_end=None, keep_accents=True)
model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='llama').half().to(device) # 建立模型,加载权重
class ArticleCompletion(AutoRegressiveDecoder):
@AutoRegressiveDecoder.wraps(default_rtype='logits')
def predict(self, inputs, output_ids, states):
token_ids = torch.cat([inputs[0], output_ids], 1)
logits = model.predict([token_ids])
return logits[:, -1, :]
def generate(self, text, n=1, topp=0.95):
token_ids, _ = tokenizer.encode(text)
results = self.random_sample([token_ids], n, topp=topp) # 基于随机采样
return [text + tokenizer.decode(ids.cpu().numpy()) for ids in results]
article_completion = ArticleCompletion(
start_id=None,
end_id=2, # 标记
maxlen=256,
minlen=20,
device=device
)
for text in [u'I believe the meaning of life is ']:
print(article_completion.generate(text))
权重转换脚本:
bert4torch/convert_llama_facebook.py at master · Tongjilibo/bert4torch · GitHub
转换脚本底部有 bert4torch_config.json 文件配置:
{
"hidden_size": 4096,
"intermediate_size": 11008,
"multiple_of": 256,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"norm_eps": 1e-06,
"hidden_act": "silu",
"vocab_size": 32000,
"segment_vocab_size": 0
}
推理脚本:
bert4torch/basic_language_model_llama.py at master · Tongjilibo/bert4torch · GitHub
转换得到的bin权重:
然后推理,按照注释,修改文件路径:
config_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin'
spm_path = 'F:/Projects/pretrain_ckpt/llama/tokenizer.model'