【Meta-AI】LLaMA GPT Test

Update 2023-04-28:

Someone on GitHub has already merged and quantized the 7B and 13B weights; the Chinese-Alpaca project makes deployment much easier to try:

GitHub - ymcui/Chinese-LLaMA-Alpaca: 中文LLaMA&Alpaca大语言模型+本地CPU/GPU部署 (Chinese LLaMA & Alpaca LLMs)

[Screenshot: the Chinese-LLaMA-Alpaca repository]

The official LLaMA repository:

GitHub - facebookresearch/llama: Inference code for LLaMA models

Three days after LLaMA's release, the startup Nebuly AI open-sourced ChatLLaMA, a recipe for RLHF training of LLaMA. Its training process is similar to ChatGPT's, and the project lets you build a ChatGPT-style service on top of a pretrained LLaMA model. People on Zhihu have already gotten Chinese training runs going, and the bert4torch route below is a fairly quick and convenient way to try it yourself.

First, install:

cd llama
pip install -r requirements.txt
pip install -e .
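If the editable install succeeded, the llama package should now be importable. A quick sanity check (assuming the package name llama, as defined by the repo's setup.py):

python -c "import llama; print(llama.__file__)"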

[Screenshot: installation output]

Then find the access-request form linked from the GitHub repo, fill in your email address, and wait for the approval email:

[Screenshot: the download request form]

Download the model: edit download.sh, keep only the size you need in MODEL_SIZE (here just 7B), and replace PRESIGNED_URL with the link from the email:
PRESIGNED_URL=""    # the presigned link from the email
MODEL_SIZE="7B"     # download only the smallest model
TARGET_FOLDER="./"  # download directory

Looking at the download script, it turns out to simply fetch the following files with wget:

wget ${PRESIGNED_URL/'*'/"tokenizer.model"} -O ${TARGET_FOLDER}"/tokenizer.model"
wget ${PRESIGNED_URL/'*'/"tokenizer_checklist.chk"} -O ${TARGET_FOLDER}"/tokenizer_checklist.chk"
wget ${PRESIGNED_URL/'*'/"${i}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${i}/consolidated.${s}.pth"
wget ${PRESIGNED_URL/'*'/"${i}/params.json"} -O ${TARGET_FOLDER}"/${i}/params.json"
wget ${PRESIGNED_URL/'*'/"${i}/checklist.chk"} -O ${TARGET_FOLDER}"/${i}/checklist.chk"
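The ${PRESIGNED_URL/'*'/...} syntax is bash pattern substitution: the presigned link from the email contains a literal * placeholder, and the script swaps in each file's relative path. If wget gives you trouble, here is a minimal Python sketch of the same substitution (the file list mirrors the wget calls above for the 7B model; PRESIGNED_URL is still the link from your email):

import os
import urllib.request

PRESIGNED_URL = ""   # presigned link from the email; contains a literal '*'
TARGET_FOLDER = "./"

files = ["tokenizer.model", "tokenizer_checklist.chk",
         "7B/consolidated.00.pth", "7B/params.json", "7B/checklist.chk"]
for rel in files:
    dest = os.path.join(TARGET_FOLDER, rel)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # same substitution the bash script does with ${PRESIGNED_URL/'*'/...}
    urllib.request.urlretrieve(PRESIGNED_URL.replace("*", rel, 1), dest)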

Then download the model:

bash download.sh
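The *.chk files are md5sum manifests, so after downloading you can verify the weights with md5sum -c checklist.chk inside each model directory. The same check in Python, assuming the standard "md5  filename" line format:

import hashlib
import os

def verify(chk_path):
    """Verify each 'md5  filename' line of an md5sum-style manifest."""
    base = os.path.dirname(chk_path)
    with open(chk_path) as f:
        for line in f:
            expected, name = line.split()
            digest = hashlib.md5()
            with open(os.path.join(base, name), "rb") as g:
                for chunk in iter(lambda: g.read(1 << 20), b""):
                    digest.update(chunk)
            print(name, "OK" if digest.hexdigest() == expected else "FAILED")

verify("./7B/checklist.chk")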

If the download fails, you can follow the tutorials popular in China and download via pyllama instead:

How to evaluate the LLaMA model leak? - Zhihu

ChatGPT alternative: LLaMA (download links included; a boon for budget players and freeloaders!) - Zhihu

For inference, just replace the two file paths:

torchrun --nproc_per_node 1 --nnodes 1 example.py --ckpt_dir ./weight/7B --tokenizer_path ./weight/tokenizer.model

The official code uses torchrun because inference is model-parallel; --nproc_per_node must match the number of checkpoint shards (1 for 7B, 2 for 13B, 4 for 30B, 8 for 65B). Below we instead run llama-7b inference with bert4torch (the weights must first be converted to a single .bin file). Start by installing the latest bert4torch:

pip install git+https://www.github.com/Tongjilibo/bert4torch.git

The full test script:

#! -*- coding: utf-8 -*-
# Basic test of the llama 7B model: fp32 takes ~27 GB of VRAM on a single card, fp16 ~14 GB
# Requires the latest bert4torch plus a weight conversion first: https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py


# 0. Install the latest bert4torch: `pip install git+https://www.github.com/Tongjilibo/bert4torch.git` or git clone
# 1. Download weights: [Github](https://github.com/facebookresearch/llama) | [huggingface](https://huggingface.co/decapoda-research/llama-7b-hf) | [torrent](https://pan.baidu.com/s/1yBaYZK5LHIbJyCCbtFLW3A?pwd=phhd); this post used the third option
# 2. Convert weights: https://github.com/Tongjilibo/bert4torch/blob/master/examples/convert_script/convert_llama_facebook.py
# 3. Inference script: https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py
# 4. Single-GPU VRAM usage: fp32 ~27 GB, fp16 ~14 GB

import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import SpTokenizer
from bert4torch.snippets import AutoRegressiveDecoder

config_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin'
spm_path = 'F:/Projects/pretrain_ckpt/llama/tokenizer.model'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = SpTokenizer(spm_path, token_start='<s>', token_end=None, keep_accents=True)

model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='llama').half().to(device)  # build the model and load the weights

class ArticleCompletion(AutoRegressiveDecoder):
    @AutoRegressiveDecoder.wraps(default_rtype='logits')
    def predict(self, inputs, output_ids, states):
        token_ids = torch.cat([inputs[0], output_ids], 1)
        logits = model.predict([token_ids])
        return logits[:, -1, :]

    def generate(self, text, n=1, topp=0.95):
        token_ids, _ = tokenizer.encode(text)
        results = self.random_sample([token_ids], n, topp=topp)  # top-p random sampling
        return [text + tokenizer.decode(ids.cpu().numpy()) for ids in results]

article_completion = ArticleCompletion(
    start_id=None,
    end_id=2,  # the </s> (EOS) token
    maxlen=256,
    minlen=20,
    device=device
)

for text in [u'I believe the meaning of life is ']:
    print(article_completion.generate(text))

The weight-conversion script:

bert4torch/convert_llama_facebook.py at master · Tongjilibo/bert4torch · GitHub
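Conceptually the conversion is straightforward: load Meta's consolidated.*.pth shard (7B ships as a single shard), rename each parameter to the key layout bert4torch expects, and save everything as one .bin file. A rough sketch of the idea; the real facebook-to-bert4torch key mapping lives in convert_llama_facebook.py, and rename_key below is only a hypothetical placeholder for it:

import torch

def rename_key(fb_key):
    # Hypothetical placeholder: the actual name mapping is defined
    # in bert4torch's convert_llama_facebook.py.
    return fb_key

state = torch.load('F:/Projects/pretrain_ckpt/llama/7B/consolidated.00.pth',
                   map_location='cpu')  # 7B has a single shard
state = {rename_key(k): v for k, v in state.items()}
torch.save(state, 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin')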

At the bottom of the conversion script you'll find the matching bert4torch_config.json:

{
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "multiple_of": 256,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "norm_eps": 1e-06,
    "hidden_act": "silu",
    "vocab_size": 32000,
    "segment_vocab_size": 0
}
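Note that intermediate_size is not a free parameter: LLaMA derives its feed-forward width from hidden_size, taking 2/3 of the usual 4x expansion and rounding up to a multiple of multiple_of (this is the FeedForward sizing rule in Meta's model.py). A quick consistency check of the config values:

hidden_size, multiple_of = 4096, 256
hidden_dim = int(2 * (4 * hidden_size) / 3)  # 2/3 of the usual 4x FFN width
intermediate_size = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
print(intermediate_size)  # 11008, matching the config above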

The inference script:

bert4torch/basic_language_model_llama.py at master · Tongjilibo/bert4torch · GitHub

The converted .bin weights:

[Screenshot: the converted .bin weight files]

Then run inference, updating the three file paths as noted in the comments:

config_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/llama/7B/bert4torch_pytorch_model.bin'
spm_path = 'F:/Projects/pretrain_ckpt/llama/tokenizer.model'
