The Magic of Large Models

Weight decay in neural networks

The ultimate purpose of weight decay is to prevent overfitting. In the loss function, weight decay is the coefficient placed in front of the regularization term. Since the regularization term generally measures model complexity, weight decay controls how much model complexity contributes to the loss: if weight decay is large, a complex model also incurs a large loss.
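Concretely, with an L2 penalty the regularized objective can be written as L_{total} = L_{data} + \lambda \ast \sum_{i} w_{i}^{2}, where \lambda is the weight decay coefficient: the larger \lambda, the more heavily large weights (and hence more complex models) are penalized.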

Momentum is a commonly used technique for accelerating gradient descent. Plain SGD updates the parameters as x \leftarrow x - \alpha \ast dx, i.e., x moves along the negative gradient direction. SGD with a momentum term is instead written in the following form:

v \leftarrow \beta \ast v - \alpha \ast dx
x \leftarrow x + v

Here \beta is the momentum coefficient. Intuitively, if the previous momentum (i.e., v) points in the same direction as the current negative gradient, this step becomes larger, which speeds up convergence.
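A minimal sketch of this update rule in plain NumPy (the values of \beta, \alpha and the toy objective are only illustrative assumptions):

import numpy as np

beta, alpha = 0.9, 0.1        # momentum coefficient and learning rate (illustrative values)
x = np.array([1.0, -2.0])     # parameters
v = np.zeros_like(x)          # accumulated momentum

def grad(x):                  # gradient of the toy objective f(x) = 0.5 * ||x||^2
    return x

for _ in range(10):
    v = beta * v - alpha * grad(x)   # fold the current negative gradient into the momentum
    x = x + v                        # step along the accumulated direction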

Template construction

{"placeholder": } — seems to be used to hold the input sentence (e.g., text_a / text_b).
{"meta": } — seems to be used to hold specific metadata, e.g., an entity, a title, etc.
{"soft": , "duplicate": } — soft tokens, i.e., parameters to be optimized. If a word is given, they are presumably initialized from that token's embedding (that is my understanding); otherwise they are initialized randomly. The duplicate parameter sets the number of soft tokens, e.g., 50.
{"mask"} — marks the position of the output. A combined example of these keys is shown after the documentation link below.
Official documentation: https://thunlp.github.io/OpenPrompt/notes/template.html?highlight=duplicate
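A small illustrative template that combines these keys (the "title" meta field and the wording are made-up examples, not from the notes):

example_template = '{"placeholder":"text_a"} The topic of {"meta":"title"} is {"soft":"about", "duplicate":10} {"mask"}.'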

Template parameters

To list the template parameters: [n for n, p in prompt_model.template.named_parameters()]
(here n is the parameter name and p is the parameter tensor)
To list the LM's parameters: [n for n, p in prompt_model.plm.named_parameters()]
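A slightly more informative sketch, which also prints shapes and whether each parameter is trainable (assuming prompt_model has already been constructed as in the training section below):

for n, p in prompt_model.template.named_parameters():
    print(n, tuple(p.shape), p.requires_grad)   # name, shape, and whether it will be updated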

Inspecting the wrapped output of a template

# If you want to define, e.g., 10000 soft tokens, use the "duplicate" key
template_text = '{"placeholder":"text_a"} {"soft": "question", "duplicate": 50} {"placeholder":"text_b"} {"soft": "yes", "duplicate": 16} {"soft": "no", "duplicate": 16} {"soft": "maybe", "duplicate": 16} {"mask"}.'
mytemplate = MixedTemplate(model=plm, tokenizer=tokenizer, text=template_text)

# To better understand how the template wraps an example, we visualize one instance.
wrapped_example = mytemplate.wrap_one_example(dataset['train'][0])
wrapped_example

Here, dataset['train'][0] has the following format:

{
  "guid": 0,
  "label": 0,
  "meta": {},
  "text_a": "It was a complex language. Not written down but handed down. One might say it was peeled down.",
  "text_b": "the language was peeled down",
  "tgt_text": null
}
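Such a record corresponds to an OpenPrompt InputExample; the one above could be built roughly like this:

from openprompt.data_utils import InputExample

example = InputExample(
    guid=0,
    text_a="It was a complex language. Not written down but handed down. One might say it was peeled down.",
    text_b="the language was peeled down",
    label=0,
)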

The full training process

1. Load the data and the LM
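The plm, tokenizer, WrapperClass and wrapped_t5tokenizer used below can be obtained via OpenPrompt's load_plm; a sketch assuming a T5 backbone (the exact checkpoint name is an assumption):

from openprompt.plms import load_plm

# Load the pre-trained LM, its tokenizer, the model config, and the matching tokenizer-wrapper class.
plm, tokenizer, model_config, WrapperClass = load_plm("t5", "t5-base")

# Wrap the tokenizer so it can tokenize template-wrapped examples (lengths mirror the DataLoader settings below).
wrapped_t5tokenizer = WrapperClass(max_seq_length=256, decoder_max_length=3,
                                   tokenizer=tokenizer, truncate_method="head")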

The data is loaded into a dictionary:

model_inputs = {}
for split in ['train', 'validation', 'test']:
    model_inputs[split] = []
    for sample in dataset[split]:
        tokenized_example = wrapped_t5tokenizer.tokenize_one_example(mytemplate.wrap_one_example(sample), teacher_forcing=False)
        model_inputs[split].append(tokenized_example)


from openprompt import PromptDataLoader

train_dataloader = PromptDataLoader(dataset=dataset["train"], template=mytemplate, tokenizer=tokenizer,
    tokenizer_wrapper_class=WrapperClass, max_seq_length=256, decoder_max_length=3,
    batch_size=4,shuffle=True, teacher_forcing=False, predict_eos_token=False,
    truncate_method="head")
# The progress output "tokenizing: 250it [00:00, 624.06it/s]" reflects the number of training examples

2. Define the template

3. Training

Decide which parameters to update, e.g., which part of the LM's parameters and which parts of the template model's parameters (a frozen-PLM variant is sketched after the model construction below).

import torch
from openprompt import PromptForClassification

use_cuda = torch.cuda.is_available()
print("GPU enabled? {}".format(use_cuda))
prompt_model = PromptForClassification(plm=plm, template=mytemplate, verbalizer=myverbalizer, freeze_plm=False)
if use_cuda:
    prompt_model = prompt_model.cuda()
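If, as mentioned in step 3, only the template (e.g. its soft tokens) should be updated while the LM stays frozen, freeze_plm=True can be passed instead; a rough sketch (the variable names here are assumptions):

# Alternative: freeze the PLM and collect only the template's trainable parameters.
frozen_prompt_model = PromptForClassification(plm=plm, template=mytemplate, verbalizer=myverbalizer, freeze_plm=True)
template_params = [p for p in frozen_prompt_model.template.parameters() if p.requires_grad]

These parameters could then be handed to the optimizer below in place of the full grouped parameters.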


from transformers import  AdamW, get_linear_schedule_with_warmup
loss_func = torch.nn.CrossEntropyLoss()
no_decay = ['bias', 'LayerNorm.weight']
# it's good practice to apply no weight decay to bias and LayerNorm parameters
optimizer_grouped_parameters = [
    {'params': [p for n, p in prompt_model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},  # weight_decay: the weight decay term, which helps prevent overfitting
    {'params': [p for n, p in prompt_model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

optimizer = AdamW(optimizer_grouped_parameters, lr=1e-4)

for epoch in range(5):
    tot_loss = 0
    for step, inputs in enumerate(train_dataloader):
        if use_cuda:
            inputs = inputs.cuda()
        logits = prompt_model(inputs)
        labels = inputs['label']
        loss = loss_func(logits, labels)
        loss.backward()
        tot_loss += loss.item()
        optimizer.step()
        optimizer.zero_grad()
        if step % 100 == 1:
            print("Epoch {}, average loss: {}".format(epoch, tot_loss/(step+1)), flush=True)

4. Testing
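A minimal evaluation sketch in the same style as the training loop (validation_dataloader is an assumption: a PromptDataLoader built like train_dataloader, but over dataset["validation"] and with shuffle=False):

prompt_model.eval()
allpreds, alllabels = [], []
with torch.no_grad():
    for inputs in validation_dataloader:   # assumed PromptDataLoader over dataset["validation"]
        if use_cuda:
            inputs = inputs.cuda()
        logits = prompt_model(inputs)
        alllabels.extend(inputs['label'].cpu().tolist())
        allpreds.extend(torch.argmax(logits, dim=-1).cpu().tolist())

acc = sum(int(p == l) for p, l in zip(allpreds, alllabels)) / len(allpreds)
print("validation accuracy:", acc)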

Verbalizer construction

Manual construction with ManualVerbalizer: the label words are ordinary words, e.g., [["great", "wonderful"], ["bad"]] or {"World": "politics", "Tech": "technology"} (see the sketch below).
SoftVerbalizer: uses trainable (soft) label representations instead of fixed label words.
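A minimal sketch of constructing a verbalizer for a 2-class task (the class count and label words are illustrative):

from openprompt.prompts import ManualVerbalizer, SoftVerbalizer

# Manual verbalizer: each class is mapped to one or more fixed label words.
myverbalizer = ManualVerbalizer(tokenizer, num_classes=2,
                                label_words=[["great", "wonderful"], ["bad"]])

# Soft verbalizer: the label representations are trainable rather than fixed words.
# mysoftverbalizer = SoftVerbalizer(tokenizer, plm, num_classes=2)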

Verbalizer parameters
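By analogy with the template, the verbalizer's parameters can be listed in the same way (assuming prompt_model was built as above):

[n for n, p in prompt_model.verbalizer.named_parameters()]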

OpenPrompt
Prompt demo link: https://colab.research.google.com/drive/10syott1zXaQkjnlxOiSXKDFGy68SWR0y?usp=sharing#scrollTo=MHZc0szQ8tkY

OpenDelta
Delta demo link:
https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing
