PyTorch BERT pretraining (using the transformers library)

Table of Contents

  • 1 - Masked language model, sentence prediction, and question answering tasks
      • 1. Masked language model - Chinese
  • [2 - Three classes: BertTokenizer, BertModel, BertForMaskedLM](https://blog.csdn.net/ccbrid/article/details/88732857)
  • 3. Explanation of BERT's output in the source code




BERT explained in detail (traditional Chinese, nicely formatted)
A collection of BERT resources - theory
Learn the PyTorch version of BERT in one article - English example




1 - Masked language model, sentence prediction, and question answering tasks

1. Masked language model - Chinese

import torch
import numpy as np
from transformers import BertTokenizer, BertForMaskedLM

model_name = 'bert-base-chinese'       # name of the pretrained model to download

# Task 1: masked language model
# During pretraining BERT marks the start and end of sentences with [CLS] and [SEP]
samples = ['[CLS] 中国的首都是哪里? [SEP] 北京是 [MASK] 国的首都。 [SEP]']  # sentence to feed into the model

tokenizer = BertTokenizer.from_pretrained(model_name)
tokenized_text = [tokenizer.tokenize(i) for i in samples]                 # split each sentence into tokens, i.e. individual characters and special symbols
input_ids = [tokenizer.convert_tokens_to_ids(i) for i in tokenized_text]  # convert each token to its vocabulary index
input_ids = torch.LongTensor(input_ids)

# Load the pretrained model
model = BertForMaskedLM.from_pretrained(model_name, cache_dir="E:/transformer_file/")
model.eval()

outputs = model(input_ids)
#---------------------------------------------------------------------------------
prediction_scores = outputs[0]                      # prediction_scores.shape = torch.Size([1, 21, 21128])
                                                    # outputs[0] of BertForMaskedLM is the prediction score over the whole vocabulary at every position
sample = prediction_scores[0].detach().numpy()      # sample.shape = (21, 21128)

pred = np.argmax(sample, axis=1)                    # 21 is the sequence length; pred holds the most likely token index at each position
print(tokenizer.convert_ids_to_tokens(pred)[14])    # the [MASK] token sits at position 14
#------------------------------------------------------------------------------
masked_idx = 14                                     # position of [MASK] in the sequence
pred_score = outputs[0][0][masked_idx]              # BERT's prediction score for every word in the vocabulary at the masked position
pred_score = pred_score.to("cpu").numpy()           # convert from tensor to numpy
#------------------------------------------------------------------------------------------------
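For comparison, here is a minimal sketch of the same masked-LM example written against the current transformers API; it finds the [MASK] position programmatically instead of hard-coding index 14. The exact output type (tuple vs. a ModelOutput with .logits) depends on the installed version, so treat this as an illustrative variant, not a drop-in replacement for the code above.

import torch
from transformers import BertTokenizer, BertForMaskedLM

model_name = 'bert-base-chinese'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)
model.eval()

# The tokenizer adds [CLS] and [SEP] automatically here
inputs = tokenizer('中国的首都是哪里? 北京是[MASK]国的首都。', return_tensors='pt')

# Locate [MASK] instead of hard-coding position 14
masked_idx = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits                   # shape: (1, seq_len, vocab_size)

pred_id = logits[0, masked_idx].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens(pred_id.tolist()))   # expected: ['中']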

2 - Three classes: BertTokenizer, BertModel, BertForMaskedLM

The PyTorch version of BERT is used as follows:
(1) First prepare a tokenized input with BertTokenizer

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
 
# Load the pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
 
# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
 
# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
 
# Convert tokens to their vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define the sentence A / B indices (segment ids)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
 
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

(2) Use BertModel to get the hidden states

# Load the pre-trained model (weights)
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
 
# GPU & put everything on cuda
tokens_tensor = tokens_tensor.to('cuda')
segments_tensors = segments_tensors.to('cuda')
model.to('cuda')
 
# Get the hidden states of every layer
with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, segments_tensors)
# bert-base-uncased has 12 layers, so there are 12 sets of hidden states
assert len(encoded_layers) == 12
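Each element of encoded_layers is a tensor of shape (batch_size, sequence_length, hidden_size). Continuing with the variables above, a small sketch of pulling out the last layer and the vector of a single token (the 768 assumes bert-base):

# One tensor per Transformer layer: (batch_size, sequence_length, hidden_size)
last_layer = encoded_layers[-1]            # hidden states of the final (12th) layer
print(last_layer.shape)                    # torch.Size([1, 14, 768]) for the input above

# Contextual vector of the token at masked_index (currently '[MASK]')
token_vec = last_layer[0, masked_index]    # shape: (768,)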

(3) Use BertForMaskedLM

# Load the pre-trained model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
 
# cuda
tokens_tensor = tokens_tensor.to('cuda')
segments_tensors = segments_tensors.to('cuda')
model.to('cuda')
 
# Predict all tokens
with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)
 
# confirm we were able to predict 'henson'
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'henson'
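A quick extension of the snippet above, using the same variables: instead of only the argmax, list the top-k candidate tokens at the masked position (k=5 is an arbitrary choice for illustration).

# Look at the k most likely tokens at the masked position
k = 5
topk_scores, topk_ids = torch.topk(predictions[0, masked_index], k)
topk_tokens = tokenizer.convert_ids_to_tokens(topk_ids.tolist())
for token, score in zip(topk_tokens, topk_scores.tolist()):
    print(token, score)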

3. Explanation of BERT's output in the source code

  Return:
      :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
      last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
          Sequence of hidden-states at the output of the last layer of the model.
      pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
          Last layer hidden-state of the first token of the sequence (classification token)
          further processed by a Linear layer and a Tanh activation function. The Linear
          layer weights are trained from the next sentence prediction (classification)
          objective during pre-training.

          This output is usually *not* a good summary
          of the semantic content of the input, you're often better with averaging or pooling
          the sequence of hidden-states for the whole input sequence.
      hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
          Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
          of shape :obj:`(batch_size, sequence_length, hidden_size)`.

          Hidden-states of the model at the output of each layer plus the initial embedding outputs.
      attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
          Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
          :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

          Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
          heads.
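Following the docstring's advice to average the hidden states rather than rely on pooler_output, here is a minimal mean-pooling sketch with the current transformers BertModel (the attention-mask weighting assumes a padded batch; the sentences are just examples):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

batch = tokenizer(['who was jim henson ?', 'jim henson was a puppeteer'],
                  padding=True, return_tensors='pt')

with torch.no_grad():
    out = model(**batch)

# Average last_hidden_state over real (non-padding) tokens only
mask = batch['attention_mask'].unsqueeze(-1).float()              # (batch, seq_len, 1)
sent_vec = (out.last_hidden_state * mask).sum(1) / mask.sum(1)    # (batch, hidden_size)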

Parameters used when running BERT

BERT source code

BertForMaskedLM outputs, for the masked position, a score for every word in the vocabulary, i.e. how likely each word is to appear there.
