luoganttcc

Fine-tuning ChatGLM-6B with DeepSpeed/P-Tuning v2


I previously tried parameter-efficient fine-tuning of ChatGLM-6B with LoRA. This post shows how to fine-tune ChatGLM-6B with DeepSpeed and with P-Tuning v2. The accompanying code is on GitHub: llm-action.

Introduction to ChatGLM-6B

For an introduction to ChatGLM-6B, see the earlier post; it is not repeated here.

Introduction to P-Tuning v2

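P-Tuning is a parameter-efficient fine-tuning method: rather than updating the model's weights, it trains a small set of continuous prompt vectors, which can cut the number of tuned parameters to roughly 0.1% of the original. P-Tuning v2 is an upgrade of P-Tuning v1 whose main change is where the prompts are applied: v1 adds trainable prompt embeddings only at the input layer, whereas v2 adds them at every layer of the model (deep prompt tuning).

Concretely, P-Tuning v2 keeps the weights of the pretrained large language model frozen and prepends a sequence of trainable prefix vectors to each Transformer layer. Only these prefix parameters receive gradient updates during fine-tuning, so the optimizer state and the saved checkpoint cover just the prefix rather than all of the model's parameters.

In short, the core idea of P-Tuning v2 is to make fine-tuning much lighter while keeping task performance as close as possible to full fine-tuning. This lowers the GPU memory and compute needed for training and means only a small prefix has to be stored per task. The base model itself is not made smaller, so inference cost stays essentially the same.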
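In the ChatGLM-6B code base, P-Tuning v2 is enabled through the pre_seq_len option, which adds trainable key/value prefix vectors to every layer while the base weights stay frozen. To make the parameter budget concrete, here is a minimal sketch (not the repository's code) that estimates how many parameters such a prefix adds when no prefix projection is used. num_layers=28 and hidden_size=4096 are the values ChatGLM-6B reports in its model config; pre_seq_len=128 and the base model size of about 6.17 billion parameters are illustrative assumptions.

```python
def prefix_param_count(pre_seq_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters of a per-layer key/value prefix (no projection MLP)."""
    # Every prefix position stores one key and one value vector for every layer.
    return pre_seq_len * num_layers * 2 * hidden_size


if __name__ == "__main__":
    BASE_PARAMS = 6_173_286_400  # assumed size of the frozen base model
    trainable = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
    print(f"prefix parameters: {trainable:,}")
    print(f"share of base model: {trainable / BASE_PARAMS:.3%}")
```

Under these assumptions the prefix holds about 29 million parameters, well under 1% of the base model. A shorter prefix shrinks the share proportionally.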

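Environment setup

The base environment is configured as follows:

OS: Ubuntu 18.04
CPUs: Intel CPUs on a single node with 1 TB of memory; 64 physical CPUs with 16 cores each
GPUs: 8 × A800 80GB
Python: 3.10 (upgrade OpenSSL to 1.1.1t first, then build and install Python from source)
NVIDIA driver: 515.65.01 (choose the driver that matches your GPU model)
CUDA Toolkit: 11.7
NCCL: nccl_2.14.3-1+cuda11.7
cuDNN: 8.8.1.3_cuda11

Installing the NVIDIA driver, CUDA, Python, and the other tools above is not covered in detail here.

Create and activate the virtual environment chatglm-ptuningv2-venv-py310-cu117: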

cd /home/guodong.li/virtual-venv
virtualenv -p /usr/bin/python3.10 chatglm-ptuningv2-venv-py310-cu117
source /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/activate

Install PyTorch offline: download the torch and torchvision wheels that match your CUDA version, then install them.

pip install torch-1.13.1+cu117-cp310-cp310-linux_x86_64.whl
pip install torchvision-0.14.1+cu117-cp310-cp310-linux_x86_64.whl

Install the remaining dependencies:

pip install -r requirements.txt

The contents of requirements.txt are:

protobuf
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate

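Note:
The official documentation specifies transformers 4.27.1. When ChatGLM loads the model, it calls the get_class_in_module function in transformers/dynamic_module_utils.py, and that function can fail to find the module file when several processes load the model concurrently. Upgrading transformers to 4.28.0 avoids the problem.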
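Because the workaround depends on the installed transformers version, a small pre-flight check before launching a multi-process job can save a failed run. The sketch below is only an illustration: the helper names parse_version and meets_minimum are hypothetical, and treating 4.28.0 as a minimum (rather than the exact pin used in requirements.txt) is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '4.28.0' into (4, 28, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.28.0") -> bool:
    """True if the installed version is at least the minimum (assumed threshold)."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "ok" if meets_minimum(installed) else "older than 4.28.0, consider upgrading"
        print(f"transformers {installed}: {status}")
```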

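Data preparation

The following uses the ADGEN (advertisement generation) dataset as an example to walk through fine-tuning.

The ADGEN task is to generate a piece of advertising copy (summary) from the input (content). Each record has the following format: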

{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松，利落的线条可以很好的隐藏身材上的小缺点，穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳，漂亮的绳结展现出了十足的个性，配合时尚的泡泡袖型，尽显女性甜美可爱的气息。"
}
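The content field packs attribute/value pairs into one string, with # between a key and its value and * between pairs. As a minimal sketch of how such a record could be inspected, the helper below splits that string into pairs; load_records additionally assumes the data files store one JSON object per line. Both function names are hypothetical and not part of the ChatGLM-6B repository.

```python
import json


def parse_content(content: str) -> list:
    """Split 'key#value*key#value' into a list of (key, value) pairs."""
    pairs = []
    for item in content.split("*"):
        key, _, value = item.partition("#")
        pairs.append((key, value))
    return pairs


def load_records(path: str) -> list:
    """Read a file with one JSON object per line (assumed file layout)."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    content = "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳"
    for key, value in parse_content(content):
        print(key, "->", value)
```

A list of pairs is used instead of a dict because a key can repeat within one record (版型 appears twice in the sample above).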

Download the ADGEN dataset from the official site, or from the alternative download link, and extract it into the AdvertiseGen directory.

tar -zxvf AdvertiseGen.tar.gz

Check the dataset size:

> wc -l AdvertiseGen/*
   1070 AdvertiseGen/dev.json
 114599 AdvertiseGen/train.json
 115669 total
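The line count of train.json determines how long a run takes. The sketch below estimates the number of optimizer steps using the step-counting convention of the Hugging Face Trainer as I understand it (batches per epoch divided by the accumulation steps, rounded down). The batch settings are the ones used by the full fine-tuning script later in this post (8 GPUs, per-device batch size 24, gradient accumulation 2, 2 epochs); the function name is hypothetical.

```python
import math


def optimizer_steps(num_samples: int, num_gpus: int, per_device_batch: int,
                    grad_accum: int, epochs: int) -> int:
    """Estimate total optimizer steps for a data-parallel training run."""
    # Each rank sees an equal (padded) share of the dataset.
    samples_per_rank = math.ceil(num_samples / num_gpus)
    batches_per_epoch = math.ceil(samples_per_rank / per_device_batch)
    # One optimizer update every `grad_accum` batches; a trailing partial group is dropped.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


if __name__ == "__main__":
    steps = optimizer_steps(num_samples=114599, num_gpus=8, per_device_batch=24,
                            grad_accum=2, epochs=2)
    print("estimated optimizer steps:", steps)
```

With these values the estimate is 596, which agrees with the total shown by the progress bar in the training log.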

Full-parameter fine-tuning of ChatGLM-6B with DeepSpeed DP + ZeRO

We start by fine-tuning all parameters of ChatGLM-6B with DeepSpeed.
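To see why ZeRO is useful here, the following back-of-the-envelope sketch estimates static GPU memory for mixed-precision Adam training. It relies on assumptions rather than on DeepSpeed internals: the parameter count is inferred from the partition sizes printed in the run output later in this post (8 × (771473408 + 187392)), the usual accounting of 2 bytes per fp16 weight and 12 bytes per parameter of fp32 optimizer state is used, and activations, gradient buffers, and fragmentation are ignored.

```python
GIB = 1024 ** 3
NUM_PARAMS = 8 * (771_473_408 + 187_392)  # inferred from the logged partition sizes


def zero2_static_memory(num_params: int, world_size: int) -> dict:
    """Per-GPU bytes for weights and optimizer state under ZeRO stage 2 (fp16 + Adam)."""
    shard = num_params / world_size
    return {
        "fp16_weights": 2 * num_params,   # replicated on every GPU
        "fp32_master_shard": 4 * shard,   # partitioned across GPUs
        "adam_moments_shard": 8 * shard,  # fp32 first and second moments, partitioned
    }


def unsharded_training_bytes(num_params: int) -> int:
    """fp16 weights + fp16 gradients + fp32 master weights and two Adam moments."""
    return 16 * num_params


if __name__ == "__main__":
    mem = zero2_static_memory(NUM_PARAMS, world_size=8)
    before = (mem["fp16_weights"] + mem["fp32_master_shard"]) / GIB
    after = before + mem["adam_moments_shard"] / GIB
    print(f"without sharding: {unsharded_training_bytes(NUM_PARAMS) / GIB:.2f} GiB per GPU")
    print(f"ZeRO-2, before Adam moments are allocated: {before:.2f} GiB")
    print(f"ZeRO-2, after Adam moments are allocated:  {after:.2f} GiB")
```

Under these assumptions, unsharded training state alone would exceed the 80 GB of a single A800, while the two ZeRO-2 estimates come out close to the 14.37 GB and 20.12 GB that DeepSpeed reports before and after initializing the optimizer states.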

Download the source code and, to keep the code consistent with this walkthrough, check out the corresponding commit:

git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 8633db1
cd ptuning

Edit the ds_train_finetune.sh script to run full-parameter fine-tuning with DeepSpeed:

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 24 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --num_train_epochs 2 \
    --logging_steps 10 \
    --save_steps 300 \
    --learning_rate $LR \
    --fp16
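The script passes --deepspeed deepspeed.json, so that file has to exist next to main.py. As a sketch, the snippet below builds an equivalent configuration in Python and serializes it. The values are copied from the user config that DeepSpeed echoes in the run output further down; the batch size of 24 is the resolved value shown there, and the file shipped in the repository may express it differently. The names DS_CONFIG and write_config are hypothetical.

```python
import json

# Mirrors the user config echoed by DeepSpeed in the run output (assumed to be complete).
DS_CONFIG = {
    "train_micro_batch_size_per_gpu": 24,
    "zero_allow_untested_optimizer": True,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 selects dynamic loss scaling
        "initial_scale_power": 16,  # initial loss scale = 2**16 = 65536
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
    "zero_optimization": {
        "stage": 2,                 # partition optimizer states and gradients
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": True,
    },
}


def write_config(path: str) -> str:
    """Write the configuration as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(DS_CONFIG, handle, indent=2)
    return path


if __name__ == "__main__":
    # Print instead of writing, so running the sketch has no side effects.
    print(json.dumps(DS_CONFIG, indent=2))
```

Calling write_config("deepspeed.json") in the ptuning directory would produce a file the script above can pick up. The initial loss scale of 65536 matches the first OVERFLOW messages in the log, where the scale is halved step by step until training stabilizes.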

Training run output:

> sh ds_train_finetune.sh

[2023-04-14 18:01:33,206] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.

[2023-04-14 18:01:33,417] [INFO] [runner.py:540:main] cmd = /home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=44148 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file /data/nfs/llm/data/AdvertiseGen/train.json --test_file /data/nfs/llm/data/AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path /data/nfs/llm/model/chatglm-6b --output_dir /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 24 --per_device_eval_batch_size 1 --gradient_accumulation_steps 2 --predict_with_generate --num_train_epochs 2 --logging_steps 10 --save_steps 300 --learning_rate 1e-4 --fp16

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_SOCKET_IFNAME=bond0

[2023-04-14 18:01:35,945] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=1

[2023-04-14 18:01:35,945] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}

[2023-04-14 18:01:35,945] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=8, node_rank=0

[2023-04-14 18:01:35,945] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})

[2023-04-14 18:01:35,945] [INFO] [launch.py:247:main] dist_world_size=8

[2023-04-14 18:01:35,945] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

[2023-04-14 18:01:40,133] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

04/14/2023 18:01:41 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True

…

04/14/2023 18:01:41 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True

04/14/2023 18:01:41 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=deepspeed.json,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=True,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=2,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=<HUB_TOKEN>,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.0001,

length_column_name=length,

load_best_model_at_end=False,

local_rank=0,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/runs/Apr14_18-01-40_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=2.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=1,

per_device_train_batch_size=24,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=<PUSH_TO_HUB_TOKEN>,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4,

save_on_each_node=False,

save_safetensors=False,

save_steps=300,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 184.03it/s]

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,664 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

0%| | 0/2 [00:00…
… >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 240.57it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 197.48it/s]

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,678 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,678 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[WARNING|configuration_auto.py:925] 2023-04-14 18:03:01,679 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 18:03:01,685 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

04/14/2023 18:03:01 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-386448e4f2983a9a/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)

[INFO|configuration_utils.py:720] 2023-04-14 18:03:01,687 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}

0%| | 0/2 [00:00…
… >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|tokenization_auto.py:675] 2023-04-14 18:03:01,689 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 18:03:01,694 >> loading file tokenizer_config.json
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 285.37it/s]
[INFO|modeling_utils.py:2531] 2023-04-14 18:03:01,992 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-14 18:03:01,993 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 0%| | 0/8 [00:00…
… >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[WARNING|auto_factory.py:456] 2023-04-14 18:03:02,109 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00, 1.70s/it]
[INFO|modeling_utils.py:3190] 2023-04-14 18:03:15,622 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:3198] 2023-04-14 18:03:15,622 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
Loading checkpoint shards: 25%|████████████████████████████████████ | 2/8 [00:13<00:40, 6.73s/it][INFO|modeling_utils.py:2839] 2023-04-14 18:03:15,703 >> Generation config file not found, using a generation config created from the model config.
…
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:34<00:00, 4.32s/it]
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
…
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
[2023-04-14 18:06:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-04-14 18:06:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-04-14 18:06:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-04-14 18:06:30,484] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=
[2023-04-14 18:06:30,484] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2023-04-14 18:06:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:133:init] Reduce bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:134:init] Allgather bucket size 500000000
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:135:init] CPU Offload: False
[2023-04-14 18:06:30,484] [INFO] [stage_1_and_2.py:136:init] Round robin gradient partitioning: False
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
Building extension module utils…
Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
ninja: no work to do.
Loading extension module utils…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Time to load utils op: 0.10171675682067871 seconds
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Emitting ninja build file /home/guodong.li/.cache/torch_extensions/py310_cu117/utils/build.ninja…
Building extension module utils…
Allowing ninja to set a default number of workers… (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils…
Time to load utils op: 0.18768668174743652 seconds
…
Loading extension module utils…
Time to load utils op: 0.3021426200866699 seconds
Rank: 2 partition count [8, 8] and sizes[(771473408, False), (187392, False)]
…
Rank: 4 partition count [8, 8] and sizes[(771473408, False), (187392, False)]
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
No modifications detected for re-loaded extension module utils, skipping build step…
Loading extension module utils…
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
Time to load utils op: 0.0005774497985839844 seconds
…
No modifications detected for re-loaded extension module utils, skipping build step…
Loading extension module utils…
Time to load utils op: 0.0011382102966308594 seconds
[2023-04-14 18:06:48,321] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-14 18:06:48,321] [INFO] [utils.py:786:see_memory_usage] MA 14.37 GB Max_MA 14.37 GB CA 14.39 GB Max_CA 14 GB
[2023-04-14 18:06:48,322] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.56 GB, percent = 5.0%
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False…
…
04/14/2023 18:06:48 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False…
[2023-04-14 18:06:48,431] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-14 18:06:48,434] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 25.87 GB CA 25.9 GB Max_CA 26 GB
[2023-04-14 18:06:48,435] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 50.84 GB, percent = 5.0%
[2023-04-14 18:06:48,435] [INFO] [stage_1_and_2.py:489:init] optimizer state initialized
[2023-04-14 18:06:48,512] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-14 18:06:48,513] [INFO] [utils.py:786:see_memory_usage] MA 20.12 GB Max_MA 20.12 GB CA 25.9 GB Max_CA 26 GB
[2023-04-14 18:06:48,513] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 51.29 GB, percent = 5.1%
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2023-04-14 18:06:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0001, 0.0001], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:06:48,515] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] aio_config … {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] amp_params … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] autotuning_config … {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] bfloat16_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] comms_config …
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] communication_data_type … None
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] compression_config … {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_enabled_legacy … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] curriculum_params_legacy … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_config … {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] data_efficiency_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dataloader_drop_last … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] disable_allgather … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dump_state … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] dynamic_loss_scale_args … {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_enabled … False
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-14 18:06:48,516] [INFO] [config.py:957:print] eigenvalue_layer_name … bert.encoder.layer
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_layer_num … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_max_iter … 100
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_stability … 1e-06
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_tol … 0.01
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] eigenvalue_verbose … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] elasticity_enabled … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] flops_profiler_config … {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_auto_cast … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_enabled … True
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] global_rank … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] grad_accum_dtype … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_accumulation_steps … 1
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_clipping … 0.0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] gradient_predivide_factor … 1.0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] hybrid_engine … enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] initial_dynamic_scale … 65536
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] load_universal_checkpoint … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] loss_scale … 0
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] memory_breakdown … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] monitor_config … tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] nebula_config … {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_legacy_fusion … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_name … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] optimizer_params … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pipeline … {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_enabled … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] pld_params … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] prescale_gradients … False
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_name … None
[2023-04-14 18:06:48,517] [INFO] [config.py:957:print] scheduler_params … None
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_attention … None
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] sparse_gradients_enabled … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] steps_per_print … 10
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_batch_size … 192
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 24
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] use_node_local_storage … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] wall_clock_breakdown … False
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] world_size … 8
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_allow_untested_optimizer True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_config … stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_enabled … True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer … True
[2023-04-14 18:06:48,518] [INFO] [config.py:957:print] zero_optimization_stage … 2
[2023-04-14 18:06:48,518] [INFO] [config.py:943:print_user_config] json = {
    "train_micro_batch_size_per_gpu": 24,
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5.000000e+08,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5.000000e+08,
        "contiguous_gradients": true
    }
}
Using /home/guodong.li/.cache/torch_extensions/py310_cu117 as PyTorch extensions root…
No modifications detected for re-loaded extension module utils, skipping build step…
Loading extension module utils…
Time to load utils op: 0.00031948089599609375 seconds
0%| | 0/596 [00:00…
… use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False…
[2023-04-14 18:06:53,718] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-14 18:06:55,883] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
0%|▎ | 1/596 [00:07<1:13:02, 7.37s/it][2023-04-14 18:06:57,948] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
[2023-04-14 18:07:00,007] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
0%|▌ | 2/596 [00:11<54:01, 5.46s/it][2023-04-14 18:07:06,332] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
1%|▊ | 3/596 [00:17<57:51, 5.85s/it][2023-04-14 18:07:08,383] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048
1%|█▏ | 4/596 [00:24<59:20, 6.01s/it][2023-04-14 18:07:18,876] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024
[2023-04-14 18:07:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=7, lr=[9.949664429530202e-05, 9.949664429530202e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:07:18,877] [INFO] [timer.py:199:stop] epoch=0/micro_step=10/global_step=10, RunningAvgSamplesPerSec=66.98818896434254, CurrSamplesPerSec=93.79590019766518, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
1%|█▍ | 5/596 [00:30<1:00:11, 6.11s/it]
…
[2023-04-14 18:47:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=12, lr=[3.02013422818792e-06, 3.02013422818792e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:47:57,392] [INFO] [timer.py:199:stop] epoch=0/micro_step=590/global_step=590, RunningAvgSamplesPerSec=45.931193758598916, CurrSamplesPerSec=45.63412532914195, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
50%|███████████████████████████████████████████████████████████████████████████████████▊ | 299/596 [41:42<41:37, 8.41s/it][2023-04-14 18:48:37,273] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=12, lr=[1.3422818791946309e-06, 1.3422818791946309e-06], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:48:39,453] [INFO] [timer.py:199:stop] epoch=0/micro_step=600/global_step=600, RunningAvgSamplesPerSec=45.92850276413307, CurrSamplesPerSec=45.66031263997641, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3487, 'learning_rate': 1.3422818791946309e-06, 'epoch': 1.01}
50%|████████████████████████████████████████████████████████████████████████████████████ | 300/596 [41:50<41:30, 8.41s/it]Saving the whole model
[INFO|configuration_utils.py:457] 2023-04-14 18:48:39,458 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/config.json
[INFO|configuration_utils.py:362] 2023-04-14 18:48:39,459 >> Configuration saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/generation_config.json
[INFO|modeling_utils.py:1855] 2023-04-14 18:49:03,951 >> The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/pytorch_model.bin.index.json.
[INFO|tokenization_utils_base.py:2171] 2023-04-14 18:49:03,953 >> tokenizer config file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-04-14 18:49:03,953 >> Special tokens file saved in /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/special_tokens_map.json
[2023-04-14 18:49:03,983] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
[2023-04-14 18:49:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt
[2023-04-14 18:49:03,988] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt…
[2023-04-14 18:49:15,934] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/mp_rank_00_model_states.pt.
[2023-04-14 18:49:15,937] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt…
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-04-14 18:49:28,049] [INFO] [engine.py:3125:_save_zero_checkpoint] zero checkpoint saved /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4/checkpoint-300/global_step600/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-04-14 18:49:28,049] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
51%|████████████████████████████████████████████████████████████████████████████████████▏ | 304/596 [43:14<1:05:51, 13.53s/it][2023-04-14 18:50:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:11,316] [INFO] [timer.py:199:stop] epoch=0/micro_step=610/global_step=610, RunningAvgSamplesPerSec=45.926876625767875, CurrSamplesPerSec=45.66709917655267, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
52%|██████████████████████████████████████████████████████████████████████████████████████▌ | 309/596 [43:56<44:16, 9.26s/it][2023-04-14 18:50:51,114] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 18:50:53,302] [INFO] [timer.py:199:stop] epoch=0/micro_step=620/global_step=620, RunningAvgSamplesPerSec=45.92462533252217, CurrSamplesPerSec=45.55552426651123, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3202, 'learning_rate': 0.0, 'epoch': 1.04}
…
99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 589/596 [1:23:07<00:58, 8.41s/it][2023-04-14 19:30:02,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=1180, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:04,820] [INFO] [timer.py:199:stop] epoch=0/micro_step=1180/global_step=1180, RunningAvgSamplesPerSec=45.85904109663022, CurrSamplesPerSec=45.73521852038509, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'loss': 13.3537, 'learning_rate': 0.0, 'epoch': 1.98}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 594/596 [1:23:49<00:16, 8.41s/it][2023-04-14 19:30:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=1190, skipped=12, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-04-14 19:30:47,022] [INFO] [timer.py:199:stop] epoch=0/micro_step=1190/global_step=1190, RunningAvgSamplesPerSec=45.856487437478386, CurrSamplesPerSec=45.579988341622055, MemAllocated=21.59GB, MaxMemAllocated=28.8GB
{'train_runtime': 5046.8863, 'train_samples_per_second': 45.414, 'train_steps_per_second': 0.118, 'train_loss': 13.905431555421561, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 596/596 [1:24:06<00:00, 8.47s/it]
***** train metrics *****
epoch = 2.0
train_loss = 13.9054
train_runtime = 1:24:06.88
train_samples = 114599
train_samples_per_second = 45.414
train_steps_per_second = 0.118
[2023-04-14 19:30:58,560] [INFO] [launch.py:460:main] Process 35198 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35192 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35193 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35195 exits successfully.
[2023-04-14 19:30:58,561] [INFO] [launch.py:460:main] Process 35191 exits successfully.
[2023-04-14 19:30:59,562] [INFO] [launch.py:460:main] Process 35194 exits successfully.
[2023-04-14 19:30:59,563] [INFO] [launch.py:460:main] Process 35197 exits successfully.
[2023-04-14 19:31:00,564] [INFO] [launch.py:460:main] Process 35196 exits successfully.
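
The OVERFLOW lines near the start of the log above come from dynamic loss scaling in fp16 training. The following is a simplified sketch of that idea, not DeepSpeed's actual implementation: the class name ToyLossScaler and its method are hypothetical, and only the overflow path is modeled (growing the scale again after loss_scale_window clean steps is left out). It starts from 2**initial_scale_power with hysteresis 2, as in the logged config.

```python
class ToyLossScaler:
    """Simplified dynamic loss scaler (illustrative only, not DeepSpeed's code)."""

    def __init__(self, initial_scale_power=16, hysteresis=2, min_loss_scale=1):
        self.scale = 2 ** initial_scale_power
        self.hysteresis = hysteresis
        self.min_loss_scale = min_loss_scale

    def on_overflow(self):
        """Handle one overflowed step: the step is skipped, the scale may shrink."""
        if self.hysteresis > 1:
            # Tolerate the overflow once: spend hysteresis, keep the scale.
            self.hysteresis -= 1
        else:
            # Halve the scale, but never go below the configured minimum.
            self.scale = max(self.scale // 2, self.min_loss_scale)
        return self.scale


scaler = ToyLossScaler()
# Seven overflowed (skipped) steps, as reported by "skipped=7" at step=10.
history = [scaler.on_overflow() for _ in range(7)]
print(history)
```

The printed sequence follows the same path as the log: the first overflow only consumes hysteresis, and each later overflow halves the scale from 65536 down to 1024.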

GPU memory usage:

Fri Apr 14 18:27:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  36539MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80G...  Off  | 00000000:35:00.0 Off |                    0 |
| N/A   61C    P0    96W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800 80G...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   63C    P0    93W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800 80G...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   65C    P0   102W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800 80G...  Off  | 00000000:9B:00.0 Off |                    0 |
| N/A   64C    P0   108W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800 80G...  Off  | 00000000:9C:00.0 Off |                    0 |
| N/A   64C    P0   105W / 300W |  38395MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800 80G...  Off  | 00000000:9D:00.0 Off |                    0 |
| N/A   58C    P0    97W / 300W |  36433MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800 80G...  Off  | 00000000:9E:00.0 Off |                    0 |
| N/A   59C    P0    92W / 300W |  38347MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     35191      C   ...nv-py310-cu117/bin/python    36537MiB |
|    1   N/A  N/A     35192      C   ...nv-py310-cu117/bin/python    38393MiB |
|    2   N/A  N/A     35193      C   ...nv-py310-cu117/bin/python    38393MiB |
|    3   N/A  N/A     35194      C   ...nv-py310-cu117/bin/python    38345MiB |
|    4   N/A  N/A     35195      C   ...nv-py310-cu117/bin/python    38345MiB |
|    5   N/A  N/A     35196      C   ...nv-py310-cu117/bin/python    38393MiB |
|    6   N/A  N/A     35197      C   ...nv-py310-cu117/bin/python    36431MiB |
|    7   N/A  N/A     35198      C   ...nv-py310-cu117/bin/python    38345MiB |
+-----------------------------------------------------------------------------+

Output files:

tree /home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
/home/guodong.li/output/adgen-chatglm-6b-ft-1e-4
├── all_results.json
├── checkpoint-300
│   ├── config.json
│   ├── configuration_chatglm.py
│   ├── generation_config.json
│   ├── global_step600
│   │   ├── mp_rank_00_model_states.pt
│   │   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_1_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_2_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_3_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_4_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_5_mp_rank_00_optim_states.pt
│   │   ├── zero_pp_rank_6_mp_rank_00_optim_states.pt
│   │   └── zero_pp_rank_7_mp_rank_00_optim_states.pt
│   ├── ice_text.model
│   ├── latest
│   ├── modeling_chatglm.py
│   ├── pytorch_model-00001-of-00002.bin
│   ├── pytorch_model-00002-of-00002.bin
│   ├── pytorch_model.bin.index.json
│   ├── quantization.py
│   ├── rng_state_0.pth
│   ├── rng_state_1.pth
│   ├── rng_state_2.pth
│   ├── rng_state_3.pth
│   ├── rng_state_4.pth
│   ├── rng_state_5.pth
│   ├── rng_state_6.pth
│   ├── rng_state_7.pth
│   ├── special_tokens_map.json
│   ├── tokenization_chatglm.py
│   ├── tokenizer_config.json
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── trainer_state.json
└── train_results.json

2 directories, 36 files

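After training finished, the final model weights were not saved; only the checkpoints written during training were kept. To save the final model, add a call to trainer.save_model() in the training code.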
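
A minimal sketch of that change, assuming trainer is the Hugging Face Trainer (or Seq2SeqTrainer) instance that the training script builds. The helper name train_and_save is hypothetical; train() and save_model() are existing Trainer methods.

```python
def train_and_save(trainer, resume_from_checkpoint=None):
    """Run training, then write the final weights to the trainer's output_dir."""
    train_result = trainer.train(resume_from_checkpoint=resume_from_checkpoint)
    # Without this call, only the periodic checkpoint-* directories are written.
    trainer.save_model()
    return train_result
```

In the training script, this would replace the bare trainer.train() call.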

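Full fine-tuning with DeepSpeed needs a lot of GPU memory and trains slowly. So next we try the P-Tuning v2 approach provided by the official repository for parameter-efficient fine-tuning.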

Parameter-efficient fine-tuning of ChatGLM-6B with P-Tuning v2

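Here ChatGLM-6B is fine-tuned with P-Tuning v2. This reduces the number of parameters that need to be tuned to about 0.1% of the original. Combined with model quantization, gradient checkpointing and similar techniques, it can run with as little as 7 GB of GPU memory.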
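
As a rough check of the parameter-count claim, the prefix parameters can be counted from the model config. This sketch assumes that, with prefix_projection set to false, the prefix encoder is a single embedding of shape (pre_seq_len, num_layers * hidden_size * 2); the only newly initialized weight reported in the logs is transformer.prefix_encoder.embedding.weight. It also takes roughly 6.2 billion as the total size of ChatGLM-6B. Both are assumptions, not figures from this article, and the function name is hypothetical.

```python
def prefix_param_count(pre_seq_len, num_layers, hidden_size):
    """Trainable parameters of a prefix embedding that stores one key vector and
    one value vector per layer for each prefix position (no projection MLP)."""
    return pre_seq_len * num_layers * hidden_size * 2


TOTAL_PARAMS = 6.2e9  # assumed approximate size of ChatGLM-6B

# num_layers and hidden_size as printed in the ChatGLMConfig log; PRE_SEQ_LEN=128.
trainable = prefix_param_count(pre_seq_len=128, num_layers=28, hidden_size=4096)
fraction = trainable / TOTAL_PARAMS
print(f"trainable prefix parameters: {trainable:,}")
print(f"share of all parameters:     {fraction:.2%}")
```

Under these assumptions the share stays below one percent with PRE_SEQ_LEN=128, and a shorter prefix shrinks it proportionally, so the 0.1% figure is best read as an order of magnitude.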

First, edit the train.sh script, mainly the train_file, validation_file, model_name_or_path and output_dir parameters:

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

  0%|                  | 0/3000 [00:00
…

{'loss': 4.2962, 'learning_rate': 0.0196, 'epoch': 0.01}
{'loss': 4.3112, 'learning_rate': 0.019533333333333333, 'epoch': 0.01}

2%|███▊             | 70/3000 [03:20<2:17:06,  2.81s/it]

GPU memory usage:

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   300W / 300W |   6291MiB / 81920MiB |     74%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

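GPU memory usage is indeed low, but even with P-Tuning v2 for parameter-efficient fine-tuning, training is still slow.

So we edit train.sh to increase the batch size and try again.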

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

sh train.sh

04/14/2023 19:46:38 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: Fals

04/14/2023 19:46:38 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

bf16=False,

bf16_full_eval=False,

data_seed=None,

dataloader_drop_last=False,

dataloader_num_workers=0,

dataloader_pin_memory=True,

ddp_bucket_cap_mb=None,

ddp_find_unused_parameters=None,

ddp_timeout=1800,

debug=[],

deepspeed=None,

disable_tqdm=False,

do_eval=False,

do_predict=False,

do_train=True,

eval_accumulation_steps=None,

eval_delay=0,

eval_steps=None,

evaluation_strategy=no,

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

generation_max_length=None,

generation_num_beams=None,

gradient_accumulation_steps=16,

gradient_checkpointing=False,

greater_is_better=None,

group_by_length=False,

half_precision_backend=auto,

hub_model_id=None,

hub_private_repo=False,

hub_strategy=every_save,

hub_token=,

ignore_data_skip=False,

include_inputs_for_metrics=False,

jit_mode_eval=False,

label_names=None,

label_smoothing_factor=0.0,

learning_rate=0.02,

length_column_name=length,

load_best_model_at_end=False,

local_rank=-1,

log_level=passive,

log_level_replica=warning,

log_on_each_node=True,

logging_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr14_19-46-38_ai-app-2-46,

logging_first_step=False,

logging_nan_inf_filter=True,

logging_steps=10,

logging_strategy=steps,

lr_scheduler_type=linear,

max_grad_norm=1.0,

max_steps=-1,

metric_for_best_model=None,

mp_parameters=,

no_cuda=False,

num_train_epochs=1.0,

optim=adamw_hf,

optim_args=None,

output_dir=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

overwrite_output_dir=True,

past_index=-1,

per_device_eval_batch_size=8,

per_device_train_batch_size=128,

predict_with_generate=True,

prediction_loss_only=False,

push_to_hub=False,

push_to_hub_model_id=None,

push_to_hub_organization=None,

push_to_hub_token=,

ray_scope=last,

remove_unused_columns=True,

report_to=[],

resume_from_checkpoint=None,

run_name=/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2,

save_on_each_node=False,

save_safetensors=False,

save_steps=100,

save_strategy=steps,

save_total_limit=None,

seed=42,

sharded_ddp=[],

skip_memory_metrics=True,

sortish_sampler=False,

tf32=None,

torch_compile=False,

torch_compile_backend=None,

torch_compile_mode=None,

torchdynamo=None,

tpu_metrics_debug=False,

tpu_num_cores=None,

use_ipex=False,

use_legacy_prediction_loop=False,

use_mps_device=False,

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

04/14/2023 19:47:58 - WARNING - datasets.builder - Found cached dataset json (/home/guodong.li/.cache/huggingface/datasets/json/default-1cf934bed8e233e6e)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,671 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-14 19:47:58,671 >> Explicitly passing a revision is encouraged when loading a configuratio a newer revision.

[INFO|configuration_utils.py:666] 2023-04-14 19:47:58,679 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-14 19:47:58,681 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}

[WARNING|tokenization_auto.py:675] 2023-04-14 19:47:58,683 >> Explicitly passing a revision is encouraged when loading a model with curevision.
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-14 19:47:58,692 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-14 19:47:59,089 >> Explicitly passing a revision is encouraged when loading a model with custom ion.
[INFO|modeling_utils.py:2531] 2023-04-14 19:47:59,115 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.jso
[INFO|configuration_utils.py:575] 2023-04-14 19:47:59,117 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████
[INFO|modeling_utils.py:3190] 2023-04-14 19:48:08,508 >> All model checkpoint weights were used when initializing ChatGLMForConditionalG

[WARNING|modeling_utils.py:3192] 2023-04-14 19:48:08,508 >> Some weights of ChatGLMForConditionalGeneration were not initialized from thtialized: [‘transformer.prefix_encoder.embedding.weight’]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-14 19:48:08,548 >> Generation config file not found, using a generation config created from the mo
Quantized to 4 bit
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 15388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65564219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 6 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 741-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自
/home/guodong.li/virtual-venv/chatglm-ptuningv2-venv-py310-cu117/lib/python3.10/site-packages/transformers/optimization.py:391: FutureWain a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warn
warnings.warn(
0%| 04/14/2023 19:51:19 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - use_cache=True is incompatible with gradient checkp
{'loss': 6.0246, 'learning_rate': 0.016428571428571428, 'epoch': 0.18}
{'loss': 7.8721, 'learning_rate': 0.012857142857142859, 'epoch': 0.36}
{'loss': 8.2653, 'learning_rate': 0.009285714285714286, 'epoch': 0.54}
{'loss': 8.6636, 'learning_rate': 0.005714285714285714, 'epoch': 0.71}
{'loss': 8.5985, 'learning_rate': 0.002142857142857143, 'epoch': 0.89}
{'train_runtime': 4868.4062, 'train_samples_per_second': 23.539, 'train_steps_per_second': 0.012, 'train_loss': 7.956800188337054, 'epoc
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
***** train metrics *****
epoch = 1.0
train_loss = 7.9568
train_runtime = 1:21:08.40
train_samples = 114599
train_samples_per_second = 23.539
train_steps_per_second = 0.012
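
The label_ids line in the log above is filled with -100 at both ends. -100 is the default ignore_index of PyTorch's cross-entropy loss, so those positions do not contribute to the loss and only the answer tokens are scored. A toy sketch of how such labels can be built, using made-up token ids and a hypothetical helper build_example (this is not the repository's preprocessing code):

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_TOKEN_ID = 3     # pad_token_id from the model config printed in the log


def build_example(prompt_ids, answer_ids, max_len):
    """Concatenate prompt and answer, right-pad to max_len, and mask everything
    except the answer so that only answer tokens are scored by the loss."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    pad_len = max_len - len(input_ids)
    if pad_len < 0:
        raise ValueError("sequence is longer than max_len")
    input_ids = input_ids + [PAD_TOKEN_ID] * pad_len
    labels = labels + [IGNORE_INDEX] * pad_len
    return input_ids, labels


# Made-up ids: three prompt tokens, two answer tokens, padded to length 8.
input_ids, labels = build_example([11, 12, 13], [21, 22], max_len=8)
print(input_ids)
print(labels)
```

The two printed lists have the same shape as the logged pair: input_ids ends in pad tokens (id 3), while labels carries -100 over both the prompt and the padding.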

GPU memory usage:

Sun Apr 16 19:53:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:34:00.0 Off |                    0 |
| N/A   71C    P0   281W / 300W |  63275MiB / 81920MiB |     92%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

Output files:

> ls -al  /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2
total 12
drwxrwxr-x 2 guodong.li guodong.li   98 Apr 14 21:12 .
drwxrwxr-x 8 guodong.li guodong.li  177 Apr 14 17:12 ..
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 all_results.json
-rw-rw-r-- 1 guodong.li guodong.li 1185 Apr 14 21:12 trainer_state.json
-rw-rw-r-- 1 guodong.li guodong.li  195 Apr 14 21:12 train_results.json
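
The listing above contains no checkpoint-* directory. A back-of-the-envelope count of optimizer steps suggests why. The sketch assumes the Trainer derives the step count per epoch as floor(number of batches / gradient_accumulation_steps) and applies a linear decay to zero without warmup (lr_scheduler_type=linear and warmup_steps=0 in the logged arguments). The function names are hypothetical.

```python
import math


def optimizer_steps(num_samples, per_device_batch, grad_accum, epochs=1, num_gpus=1):
    """Estimated number of optimizer updates for a training run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_gpus))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * steps_per_epoch)


def linear_lr(step, total_steps, base_lr):
    """Learning rate after `step` updates under linear decay with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


total = optimizer_steps(num_samples=114599, per_device_batch=128, grad_accum=16)
print(total)                       # optimizer steps in one epoch
print(linear_lr(10, total, 2e-2))  # compare with the first logged learning_rate
```

With 114599 samples this gives 56 steps, fewer than save_steps=100, so under these assumptions no intermediate checkpoint is written during the run. Lowering save_steps or calling trainer.save_model() at the end would leave weights on disk for later evaluation.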

As the GPU readout above shows, increasing the batch size raised both GPU memory usage and GPU utilization.
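
Throughput can be compared in the same spirit. The sketch below recomputes train_samples_per_second from the train metrics of the two finished runs in this article (samples × epochs / runtime) and divides by the number of GPUs each run used: 8 for the DeepSpeed full fine-tuning run and 1 for this P-Tuning v2 run. The function name is hypothetical, and the two runs differ in more than the tuning method (batch size, quantization), so this is only a coarse comparison.

```python
def samples_per_second(train_samples, epochs, runtime_s, num_gpus=1):
    """Return (overall, per-GPU) training throughput in samples per second."""
    overall = train_samples * epochs / runtime_s
    return overall, overall / num_gpus


# Numbers taken from the "train metrics" blocks of the two runs.
full_ft = samples_per_second(114599, epochs=2, runtime_s=5046.8863, num_gpus=8)
p_tuning = samples_per_second(114599, epochs=1, runtime_s=4868.4062, num_gpus=1)

print(f"full fine-tuning: {full_ft[0]:.3f} samples/s overall, {full_ft[1]:.2f} per GPU")
print(f"P-Tuning v2:      {p_tuning[0]:.3f} samples/s overall, {p_tuning[1]:.2f} per GPU")
```

The overall figures reproduce the logged train_samples_per_second values; the per-GPU column is the extra information.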

To use DeepSpeed for data parallelism, a command like the following can be used:

PRE_SEQ_LEN=128
LR=2e-2

deepspeed --include localhost:1,2,3 --master_port 29001 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file /data/nfs/llm/data/AdvertiseGen/train.json \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 128 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN

Model evaluation

Edit evaluate.sh, changing parameters such as model_name_or_path (the base model path) and ptuning_checkpoint (the path to the weights produced by P-Tuning v2 fine-tuning):

PRE_SEQ_LEN=128
# CHECKPOINT and STEP are kept from the original script; the paths below are written out in full.
CHECKPOINT=adgen-chatglm-6b-pt-128-2e-2
STEP=3000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --test_file /data/nfs/llm/data/AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /data/nfs/llm/model/chatglm-6b \
    --ptuning_checkpoint /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --output_dir /home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500 \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Run log:

sh evaluate.sh

04/16/2023 20:18:01 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False

04/16/2023 20:18:01 - INFO - main - Training/evaluation parameters Seq2SeqTrainingArguments(

_n_gpu=1,

adafactor=False,

adam_beta1=0.9,

adam_beta2=0.999,

adam_epsilon=1e-08,

auto_find_batch_size=False,

…

fp16=False,

fp16_backend=auto,

fp16_full_eval=False,

fp16_opt_level=O1,

fsdp=[],

fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},

fsdp_min_num_params=0,

fsdp_transformer_layer_cls_to_wrap=None,

full_determinism=False,

generation_config=None,

…

warmup_ratio=0.0,

warmup_steps=0,

weight_decay=0.0,

xpu_backend=None,

)

Downloading and preparing dataset json/default to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e…

Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3419.73it/s]

Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 196.48it/s]

Dataset json downloaded and prepared to /home/guodong.li/.cache/huggingface/datasets/json/default-df42438b5ccb0b44/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 326.85it/s]

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,784 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[WARNING|configuration_auto.py:925] 2023-04-16 20:19:21,785 >> Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.

[INFO|configuration_utils.py:666] 2023-04-16 20:19:21,792 >> loading configuration file /data/nfs/llm/model/chatglm-6b/config.json

[INFO|configuration_utils.py:720] 2023-04-16 20:19:21,795 >> Model config ChatGLMConfig {
  "_name_or_path": "/data/nfs/llm/model/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}

[WARNING|tokenization_auto.py:675] 2023-04-16 20:19:21,797 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-16 20:19:21,805 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-16 20:19:22,186 >> Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-16 20:19:22,222 >> loading weights file /data/nfs/llm/model/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-16 20:19:22,224 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.04s/it]
[INFO|modeling_utils.py:3190] 2023-04-16 20:19:30,912 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3192] 2023-04-16 20:19:30,912 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /data/nfs/llm/model/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-16 20:19:30,967 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 65421, 61, 75898, 32, 68554, 61, 77257, 64555, 32, 65107, 61, 66268, 32, 65347, 61, 71689, 32, 69768, 61, 85428, 32, 65173, 73942, 61, 70984, 32, 65173, 70936, 61, 64703, 65509, 130001, 130004]
inputs 类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞
label_ids [5, 71689, 66561, 67061, 77257, 70984, 6, 72194, 65173, 64290, 64622, 81549, 63823, 65173, 64290, 83343, 63832, 63912, 65209, 64703, 65509, 64051, 6, 69418, 78598, 87019, 6, 64257, 71319, 66069, 74197, 63823, 65173, 72265, 64880, 64131, 63832, 73416, 85428, 66261, 6, 65594, 87834, 6, 73412, 105145, 65388, 63823, 130001, 130004]
labels 简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。
04/16/2023 20:21:30 - INFO - __main__ - *** Predict ***
[INFO|configuration_utils.py:575] 2023-04-16 20:21:30,090 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%| | 0/1070 [00:00> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%|▎ | 2/1070 [00:02<25:39, 1.44s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:37,311 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

0%|▍ | 3/1070
…
1%|█▎ | 8/1070 [00:20<50:13, 2.84s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:55,233 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▍ | 9/1070 [00:23<50:24, 2.85s/it][INFO|configuration_utils.py:575] 2023-04-16 20:21:58,112 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▌ | 10/1070 [00:26<50:30, 2.86s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:00,990 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▋ | 11/1070 [00:29<50:37, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:03,880 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

1%|█▊ | 12/1070 [00:32<50:38, 2.87s/it][INFO|configuration_utils.py:575] 2023-04-16 20:22:06,761 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}
…
[INFO|configuration_utils.py:575] 2023-04-16 21:13:16,240 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1069/1070 [51:44<00:02, 2.92s/it][INFO|configuration_utils.py:575] 2023-04-16 21:13:19,107 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:47<00:00, 2.90s/it]Building prefix dict from the default dictionary …
04/16/2023 21:13:22 - DEBUG - jieba - Building prefix dict from the default dictionary …
Dumping model to file cache /tmp/jieba.cache
04/16/2023 21:13:22 - DEBUG - jieba - Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.634 seconds.
04/16/2023 21:13:22 - DEBUG - jieba - Loading model cost 0.634 seconds.
Prefix dict has been built successfully.
04/16/2023 21:13:22 - DEBUG - jieba - Prefix dict has been built successfully.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1070/1070 [51:53<00:00, 2.91s/it]
***** predict metrics *****
predict_bleu-4 = 0.7846
predict_rouge-1 = 8.8941
predict_rouge-2 = 1.3703
predict_rouge-l = 16.4982
predict_runtime = 0:51:57.77
predict_samples = 1070
predict_samples_per_second = 0.343
predict_steps_per_second = 0.343
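
As a quick sanity check on the metrics above, the reported throughput can be reproduced from the runtime and the sample count. This is only a sketch: the helper name `parse_runtime` is ours and is not part of the evaluation script, and it assumes nothing beyond the `H:MM:SS.ss` runtime format shown in the log.

```python
def parse_runtime(text: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


runtime_s = parse_runtime("0:51:57.77")  # predict_runtime from the log
samples = 1070                           # predict_samples from the log
samples_per_second = samples / runtime_s

print(f"{runtime_s:.2f} s, {samples_per_second:.3f} samples/s")
```

The steps-per-second and samples-per-second figures coincide because the progress bar counts 1070 steps for 1070 samples, i.e. one sample per prediction step.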

Model inference

Create a new file, inference.py:

import os

import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/data/nfs/llm/model/chatglm-6b"
CHECKPOINT_PATH = "/home/guodong.li/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-500"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Load the base model with the prefix encoder enabled.
# pre_seq_len should match the value used during training.
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

# Load the P-Tuning v2 checkpoint and keep only the prefix-encoder weights,
# stripping the "transformer.prefix_encoder." prefix from their keys.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}

for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize the base model to 4 bit; keep the prefix encoder in float32
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

# Send a first greeting, then chat interactively until an empty line is entered
print("User: 你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)
print("\n------------------------------------------------\nUser:")

line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\nUser:")
    line = input()
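
The key-renaming loop in inference.py can be isolated into a small helper, which makes it easy to check what ends up in the dictionary passed to `load_state_dict`. The sketch below is illustrative only: `extract_prefix_weights` is a name of our own, and the checkpoint contents are hypothetical placeholders (plain strings instead of tensors) so that it runs without PyTorch or a real checkpoint.

```python
from typing import Any, Dict

PREFIX = "transformer.prefix_encoder."


def extract_prefix_weights(state_dict: Dict[str, Any], prefix: str = PREFIX) -> Dict[str, Any]:
    """Keep only entries under `prefix` and drop the prefix from their keys."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


# Hypothetical checkpoint contents; placeholder strings stand in for tensors.
checkpoint = {
    "transformer.prefix_encoder.embedding.weight": "prefix-tensor",
    "lm_head.weight": "unrelated-tensor",
}
print(extract_prefix_weights(checkpoint))
```

Keys outside the prefix encoder are silently dropped, so the resulting dictionary lines up with the parameter names of `model.transformer.prefix_encoder` itself.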

Run it with:

CUDA_VISIBLE_DEVICES=0 python3 inference.py

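Conclusion

In this article we used DeepSpeed (data parallelism plus ZeRO) for full-parameter fine-tuning of ChatGLM-6B. When GPU resources are limited, P-Tuning v2 can be used instead for parameter-efficient fine-tuning.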
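
To get a rough sense of how small the P-Tuning v2 footprint is, the prefix size can be estimated from the values in the logged ChatGLMConfig and the `pre_seq_len=128` used in inference.py. This is a back-of-the-envelope sketch under two assumptions that do not come from this article: that the prefix encoder is a single embedding table of shape (pre_seq_len, num_layers * hidden_size * 2) when `prefix_projection` is false, and that the full model has roughly 6.2 billion parameters.

```python
# Values from the logged ChatGLMConfig and from inference.py.
num_layers = 28
hidden_size = 4096
pre_seq_len = 128

# Assumption: one embedding row per prefix position, holding a key and a value
# vector for every layer (prefix_projection = false).
prefix_params = pre_seq_len * num_layers * hidden_size * 2
prefix_mib_fp32 = prefix_params * 4 / 2**20  # 4 bytes per float32 weight

total_params = 6.2e9  # assumption: approximate size of ChatGLM-6B
share = prefix_params / total_params

print(f"{prefix_params:,} prefix parameters")
print(f"{prefix_mib_fp32:.0f} MiB in float32")
print(f"{share:.2%} of the full model")
```

Under these assumptions the trainable prefix comes to about 29 million parameters, well under 1% of the model.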

References:

P-Tuning v2
