BERT task-specific heads:
A model for token-level classification, usable for named-entity recognition (NER), part-of-speech (POS) tagging, and similar tasks. For a given input sequence, the model produces one label per token.
The output has shape [batch_size, sequence_length, num_labels], where num_labels is the number of possible labels.
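A minimal shape check, assuming the generic bert-base-cased checkpoint (its token-classification head is randomly initialized, so the predictions are meaningless here; only the output shape matters):

import torch
from transformers import AutoTokenizer, BertForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased")  # head is untrained, a warning is expected

inputs = tokenizer("HuggingFace is based in Paris", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# One row of scores per token: (batch_size, sequence_length, num_labels);
# num_labels defaults to 2 for this generic checkpoint.
print(logits.shape)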
class transformers.BertForTokenClassification(config)
Inherits from: BertPreTrainedModel, torch.nn.Module
Parameters: config (BertConfig), the model configuration class containing all of the model's parameters.
Adds a token-classification head (a linear layer, usable e.g. for NER) on top of the base model.
forward method:
Parameters:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)): indices of the input sequence tokens (an indices list); they can be obtained with an AutoTokenizer.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional): masks out part of the input tokens so that attention is not computed over them, e.g. padding token indices. Values are in [0, 1]: 0 means the token is masked, 1 means it is not.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional): for sentence-pair tasks, indicates whether a token belongs to the first or the second sentence. Values are in [0, 1].
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional): position indices of the input tokens (a positional indices list), in the range [0, config.max_position_embeddings - 1], used to inject positional information.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional): masks heads of the (multi-head) self-attention modules. Values are in [0, 1]: 0 means the corresponding head is masked, 1 means it is not.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional): pass this if you want to feed embedding vectors into the model directly and control the vectors associated with input_ids yourself, so that the model's internal embedding layer is not used to look up input_ids.
output_attentions (bool, optional): whether the model should return all attention scores.
output_hidden_states (bool, optional): whether the model should return the hidden states of all layers.
return_dict (bool, optional): whether to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, sequence_length), optional): labels used to compute the loss, with values in [0, config.num_labels - 1].
A tokenizer produces input_ids, attention_mask, and token_type_ids directly; see the sketch below.
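A minimal sketch of producing these inputs and passing them explicitly (assuming the generic bert-base-cased tokenizer; any BERT checkpoint works the same way):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# Encoding a sentence pair fills in input_ids, token_type_ids and attention_mask in one call.
enc = tokenizer("How old are you?", "I am six years old.", padding="max_length", max_length=16, return_tensors="pt")
print(enc["input_ids"].shape)    # torch.Size([1, 16])
print(enc["token_type_ids"][0])  # 0 for first-sentence tokens, 1 for second-sentence tokens
print(enc["attention_mask"][0])  # 1 for real tokens, 0 for padding
# These map one-to-one onto the forward() arguments above, e.g.:
# model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"], token_type_ids=enc["token_type_ids"])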
Returns:
transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)
If return_dict is False (or return_dict is not given and self.config.use_return_dict in the configuration is False), the output is a tuple:
with the labels argument, the tuple contains loss (the computed loss value), then logits (the output of the classification head, of shape (batch_size, sequence_length, num_labels)), then the other BERT outputs;
without labels, the tuple contains only logits and the other BERT outputs.
If return_dict is True (or return_dict is not given and self.config.use_return_dict in the configuration is True), the output is a TokenClassifierOutput object with the following attributes:
loss: the computed loss value, present only if the labels argument was provided.
logits: the output of the classification head, of shape (batch_size, sequence_length, num_labels).
hidden_states: BERT's hidden states (returned when output_hidden_states is set).
attentions: BERT's attention weights (returned when output_attentions is set).
Code implementation:
@add_start_docstrings(
    """
    Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
    Named-Entity-Recognition (NER) tasks.
    """,
    BERT_START_DOCSTRING,
)
class BertForTokenClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels  # number of labels
        self.bert = BertModel(config, add_pooling_layer=False)  # pretrained BERT encoder
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)  # classification head on top of the pretrained BERT

        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_TOKEN_CLASSIFICATION,
        output_type=TokenClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
        expected_output=_TOKEN_CLASS_EXPECTED_OUTPUT,
        expected_loss=_TOKEN_CLASS_EXPECTED_LOSS,
    )
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )  # run the pretrained BERT encoder on the input sequence to get per-token hidden vectors

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)  # apply the final classification head

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
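The loss above is an ordinary cross-entropy computed over every token position: the (batch_size, sequence_length, num_labels) logits are flattened to (batch_size * sequence_length, num_labels) and compared against the flattened labels. A minimal, self-contained sketch of that step, using dummy tensors and made-up sizes:

import torch
from torch.nn import CrossEntropyLoss

batch_size, seq_len, num_labels = 2, 5, 3  # made-up sizes for illustration
logits = torch.randn(batch_size, seq_len, num_labels)
labels = torch.randint(0, num_labels, (batch_size, seq_len))

loss_fct = CrossEntropyLoss()
# Same flattening as in forward(): one classification problem per token position.
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))
print(loss)

Positions labeled -100 are ignored by CrossEntropyLoss (its default ignore_index), which is the usual way padding and extra sub-word positions are skipped during fine-tuning.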
Usage example:
from transformers import AutoTokenizer, BertForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
model = BertForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

inputs = tokenizer(
    "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
)

with torch.no_grad():
    logits = model(**inputs).logits  # the per-token class scores are read from the logits field of the output

predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means that
# there might be more predicted token classes than words.
# Multiple token classes might account for the same word.
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
# predicted_tokens_classes = ['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC']
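Building on the example above, supplying labels makes the same model return a loss, and return_dict switches between the TokenClassifierOutput object and the plain tuple described earlier. A minimal sketch, reusing the predictions as stand-in labels just to exercise the loss computation:

labels = predicted_token_class_ids
outputs = model(**inputs, labels=labels)  # TokenClassifierOutput by default
print(outputs.loss, outputs.logits.shape)

tuple_outputs = model(**inputs, labels=labels, return_dict=False)
loss, logits = tuple_outputs[0], tuple_outputs[1]  # plain tuple: (loss, logits, ...)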
A model for classification at the level of a whole sentence or passage, usable for sentiment analysis, text classification, etc. For a given input, the model produces a single classification label for the entire sequence.
The output has shape [batch_size, num_labels], where num_labels is the number of possible classes.
class transformers.BertForSequenceClassification(config)
Inherits from: BertPreTrainedModel, torch.nn.Module
Parameters: config (BertConfig), the model configuration class containing all of the model's parameters.
forward method: takes the same arguments as BertForTokenClassification, except that labels has shape (batch_size,), one label per sequence.
Differences from BertForTokenClassification: the classification head is applied to the pooled [CLS] representation of the whole sequence rather than to every token's hidden state, so the logits have shape (batch_size, num_labels); see the simplified sketch below.
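A simplified sketch of this head, paraphrasing rather than quoting the transformers implementation (the real class also handles regression and multi-label classification through config.problem_type; only the single-label cross-entropy case is shown here):

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from transformers import BertModel, BertPreTrainedModel
from transformers.modeling_outputs import SequenceClassifierOutput

class SimplifiedBertForSequenceClassification(BertPreTrainedModel):
    """Simplified sketch; not the actual transformers source."""
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = BertModel(config)  # keeps the pooling layer, unlike the token-classification model
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, labels=None, **kwargs):
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, **kwargs)
        pooled_output = outputs[1]  # [CLS]-based pooled output, shape (batch_size, hidden_size)
        logits = self.classifier(self.dropout(pooled_output))  # shape (batch_size, num_labels): one prediction per sequence
        loss = None
        if labels is not None:
            loss = CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        return SequenceClassifierOutput(loss=loss, logits=logits)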
Usage example:
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-yelp-polarity")
model = BertForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
predicted_class_label = model.config.id2label[predicted_class_id]
# predicted_class_label = LABEL_1
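As in the token-classification example, supplying labels also returns a loss. A minimal continuation of the example above (the target class 1 is an arbitrary choice for this two-label checkpoint):

labels = torch.tensor([1])  # shape (batch_size,), one class index per sequence
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits shape (1, 2) for this two-label checkpoint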