[Paper Reading Notes 27] biaffine4NER: Applying a Biaffine Classifier to NER

Title

Named Entity Recognition as Dependency Parsing

Yu, J., Bohnet, B., & Poesio, M. (2020). Named Entity Recognition as Dependency Parsing. ArXiv, abs/2005.07150.
Code: https://github.com/juntaoy/biaffine-ner

Authors

Juntao Yu
Queen Mary University of London, UK

Bernd Bohnet
Google Research Netherlands

Massimo Poesio
Queen Mary University of London, UK

Abstract

This is another model proposed to handle nested NER.
Basic idea: following graph-based dependency parsing, the sentence is treated as a graph that a biaffine model operates on. Concretely, the biaffine model scores every pair of start and end tokens in the sentence, i.e. every candidate span, and entities are then extracted from these scores.

Model

Main idea: entity extraction is treated as the problem of identifying start and end indices and assigning a type to the span they delimit. A biaffine model on top of a multi-layer neural network outputs a score for every span; the candidate spans are then ranked by these scores, and the top-ranked spans that satisfy the flat or nested NER constraints are returned. The network structure is shown in the figure below:

[Figure: network architecture]

Input

The input concatenates word embeddings and character embeddings;

Word encoders: BERT_Large and fastText;

The input dataset is preprocessed into the following form:

{"doc_key": "batch_01", 
"ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]], 
[[3, 3, "PER"], [10, 14, "ORG"], [20, 20, "GPE"], [20, 25, "GPE"], [22, 22, "GPE"]], 
[]], 
"sentences": [["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing", "yesterday", "afternoon", "."], 
["This", "morning", ",", "Anwar", "attended", "the", "foundation", "laying", "ceremony", "of", "the", "Minhang", "China-Malaysia", "joint-venture", "enterprise", ",", "and", "after", "that", "toured", "Pudong", "'s", "Jingqiao", "export", "processing", "district", "."], 
["(", "End", ")"]]}

The data contains three kinds of information: doc_key, ners, and sentences, i.e. the document id, the entities, and the sentences. Each entity is expressed as a (start, end, type) triple, and each sentence as an array of tokens.
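For reference, the dataset is stored as one JSON object per line (jsonlines). A minimal sketch of reading it, with the file name assumed purely for illustration:

import json

# Each line of train.english.jsonlines (file name assumed) is one document.
with open("train.english.jsonlines") as f:
    for line in f:
        example = json.loads(line)
        doc_key = example["doc_key"]  # document id, e.g. "batch_01"
        for sent, ners in zip(example["sentences"], example["ners"]):
            for start, end, label in ners:
                # start/end are inclusive word indices within the sentence
                print(doc_key, label, sent[start:end + 1])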

Next, let's look at the data format the model actually consumes:

[Figure: tensorized input format]

Now let's see how the text is assembled into tensors:

def tensorize_example(self, example, is_training):
  ners = example["ners"]
  sentences = example["sentences"]

  max_sentence_length = max(len(s) for s in sentences)
  max_word_length = max(max(max(len(w) for w in s) for s in sentences), max(self.config["filter_widths"]))
  text_len = np.array([len(s) for s in sentences])
  tokens = [[""] * max_sentence_length for _ in sentences]
  # char_index: [num_sentences, max_sentence_length, max_word_length], character ids of every word
  char_index = np.zeros([len(sentences), max_sentence_length, max_word_length])
  # context_word_emb: pre-trained (fastText) word vectors, left as zeros for unknown words
  context_word_emb = np.zeros([len(sentences), max_sentence_length, self.context_embeddings_size])
  lemmas = []
  if "lemmas" in example:
    lemmas = example["lemmas"]
  for i, sentence in enumerate(sentences):
    for j, word in enumerate(sentence):
      tokens[i][j] = word
      if self.context_embeddings.is_in_embeddings(word):
        context_word_emb[i, j] = self.context_embeddings[word]
      elif lemmas and self.context_embeddings.is_in_embeddings(lemmas[i][j]):
        context_word_emb[i,j] = self.context_embeddings[lemmas[i][j]]
      char_index[i, j, :len(word)] = [self.char_dict[c] for c in word]

  tokens = np.array(tokens)

  doc_key = example["doc_key"]

  lm_emb = self.load_lm_embeddings(doc_key)

  gold_labels = []
  if is_training:
    for sid, sent in enumerate(sentences):
      # Map each gold (start, end) pair to its label id; 0 means "not an entity".
      ner = {(s,e):self.ner_maps[t] for s,e,t in ners[sid]}
      # Enumerate every possible span (s, e) with s <= e and look up its gold label.
      for s in xrange(len(sent)):
        for e in xrange(s,len(sent)):
          gold_labels.append(ner.get((s,e),0))
  gold_labels = np.array(gold_labels)

  example_tensors = (tokens, context_word_emb,lm_emb, char_index, text_len, is_training, gold_labels)

  return example_tensors

Here we can see char_index, a [num_sentences, max_sentence_length, max_word_length] tensor of character indices; it is mainly used to embed the characters of each word, and those character embeddings are processed with a CNN.
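The repo builds the character representation with a CNN over char_index: 1-D convolutions with several filter widths, max-pooled over the characters of each word. A rough TensorFlow 1.x sketch of this idea, not the repo's exact code, with the embedding size, filter widths and filter count assumed:

import tensorflow as tf

def char_cnn(char_index, char_vocab_size, char_emb_size=8,
             filter_widths=(3, 4, 5), filter_size=50):
  # char_index: [num_sentences, max_sentence_length, max_word_length]
  char_emb = tf.get_variable("char_emb", [char_vocab_size, char_emb_size])
  x = tf.nn.embedding_lookup(char_emb, tf.to_int32(char_index))
  num_sent, max_sent_len, max_word_len = tf.shape(x)[0], tf.shape(x)[1], tf.shape(x)[2]
  # Fold the sentence and word dimensions so that every word is one character sequence.
  x = tf.reshape(x, [num_sent * max_sent_len, max_word_len, char_emb_size])
  pooled = []
  for w in filter_widths:
    conv = tf.layers.conv1d(x, filters=filter_size, kernel_size=w,
                            activation=tf.nn.relu, name="char_conv_%d" % w)
    pooled.append(tf.reduce_max(conv, axis=1))  # max-pool over character positions
  out = tf.concat(pooled, axis=1)               # [num_sent * max_sent_len, len(widths) * filter_size]
  return tf.reshape(out, [num_sent, max_sent_len, len(filter_widths) * filter_size])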

Next, what information the concatenated input contains:

context_emb_list = []
context_emb_list.append(context_word_emb)     # fastText word embeddings
context_emb_list.append(aggregated_char_emb)  # CNN character embeddings
context_emb_list.append(aggregated_lm_emb)    # BERT (language model) embeddings
context_emb = tf.concat(context_emb_list, 2)
context_emb = tf.nn.dropout(context_emb, self.lexical_dropout)

Hidden layers: BiLSTM

FFNNs

Two separate feed-forward fully-connected networks are applied to the BiLSTM output.

[Figure: start/end FFNN equations]

The projection function is:

def projection(inputs, output_size, initializer=None):
  return ffnn(inputs, 0, -1, output_size, dropout=None, output_weights_initializer=initializer)

From the code we can see that the BiLSTM output is fed into two separate FFNNs, one for span starts and one for span ends.
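A rough sketch of that step (my own illustration rather than the repo code; the FFNN size is assumed):

import tensorflow as tf

def start_end_representations(lstm_outputs, ffnn_size=150):
  # lstm_outputs: [num_sentences, max_sentence_length, 2 * lstm_size]
  # Two separate projections: one view of each token as a possible span start,
  # another view of it as a possible span end.
  with tf.variable_scope("start_ffnn"):
    starts_emb = tf.layers.dense(lstm_outputs, ffnn_size, activation=tf.nn.relu)
  with tf.variable_scope("end_ffnn"):
    ends_emb = tf.layers.dense(lstm_outputs, ffnn_size, activation=tf.nn.relu)
  return starts_emb, ends_emb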

Classifier: biaffine classifier.

About the biaffine classifier:

[Figure: biaffine classifier]

The score tensor r has size l × l × c, where l is the sentence length and c is the number of NER categories plus 1 (for the non-entity class); r covers every possible span.

The following formula assigns a category label to each candidate span:
[Formula: span label assignment]
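Since the formula screenshots do not reproduce here, this is a reconstruction of the biaffine score and the label assignment based on my reading of the paper and of Dozat & Manning [1] (the notation may differ slightly from the original):

$$ r_m(i) = h_s(i)^{\top}\, \mathbf{U}_m\, h_e(i) + W_m \,\big(h_s(i) \oplus h_e(i)\big) + b_m $$

$$ y'(i) = \arg\max_m \; r_m(i) $$

Here h_s(i) and h_e(i) are the FFNN outputs for the start and end tokens of span i, ⊕ denotes concatenation, and the arg max picks the best of the c categories (including the non-entity class).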

Spans whose predicted label is not the non-entity class are then ranked by their scores.
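The selection step keeps the highest-ranked spans that do not violate the flat or nested NER constraints. A minimal sketch of this greedy decoding, again my own illustration rather than the repo code:

def decode(span_scores, allow_nested=True):
  # span_scores: list of (score, start, end, label) tuples for spans whose
  # predicted label is not the non-entity class; higher score = more confident.
  selected = []
  for cand in sorted(span_scores, reverse=True):  # best-scoring spans first
    score, start, end, label = cand
    clashes = False
    for _, s, e, _ in selected:
      if allow_nested:
        # Nested NER: only boundary-crossing (partial) overlaps are forbidden.
        if start < s <= end < e or s < start <= e < end:
          clashes = True
          break
      else:
        # Flat NER: any overlap with an already selected span is forbidden.
        if start <= e and s <= end:
          clashes = True
          break
    if not clashes:
      selected.append(cand)
  return selected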

The loss function is:
[Figure: loss function]

Looking at the code:

candidate_ner_scores = util.bilinear_classifier(candidate_starts_emb,candidate_ends_emb,self.dropout,output_size=self.num_types+1)#[num_sentence, max_sentence_length,max_sentence_length,types+1]
candidate_ner_scores = tf.boolean_mask(tf.reshape(candidate_ner_scores,[-1,self.num_types+1]),flattened_candidate_scores_mask)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gold_labels, logits=candidate_ner_scores)
loss = tf.reduce_sum(loss)

The start and end representations are fed together into the biaffine (bilinear) classifier, and the loss is computed with softmax cross-entropy over the spans.
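Conceptually, the bilinear classifier fills the l × l × (types + 1) score tensor roughly as follows (a numpy sketch of the idea, not the repo's util.bilinear_classifier):

import numpy as np

def biaffine_scores(starts, ends, U, W, b):
  # starts, ends: [sent_len, d] start/end representations of the tokens
  # U: [d, num_labels, d] bilinear weights; W: [num_labels, 2*d]; b: [num_labels]
  # Returns a [sent_len, sent_len, num_labels] tensor: one score per (start, end, label).
  bilinear = np.einsum('sd,dce,te->stc', starts, U, ends)
  # Concatenate every start representation with every end representation.
  pairs = np.concatenate(
      [np.repeat(starts[:, None, :], ends.shape[0], axis=1),
       np.repeat(ends[None, :, :], starts.shape[0], axis=0)], axis=-1)
  linear = pairs @ W.T
  return bilinear + linear + b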

Experiments

Datasets:

For nested NER: ACE 2004, ACE 2005, GENIA;

For flat NER: CoNLL 2002, CoNLL 2003, OntoNotes.

Experiment 1: nested NER

[Table: nested NER results]

Experiment 2: flat NER

[Table: flat NER results]

Ablation Study

The OntoNotes dataset is used to quantify the contribution of each module of the network.

[Table: ablation results on OntoNotes]

Summary

This feels like a fairly blunt approach. The title says dependency parsing, yet the paper does not really use anything from dependency parsing beyond the biaffine scorer. The biaffine mechanism was indeed first used for parsing, but that hardly justifies putting dependency parsing in the title; it feels a bit like clickbait.

Still, the results are decent, although after reading the code I feel its implementation leaves room for improvement.

The whole model is also relatively simple; the genuinely useful part of the paper is the r scoring tensor.

Related techniques

Flat Named Entity Recognition

Nested Named Entity Recognition

References

【1】Timothy Dozat and Christopher Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of 5th International Conference on Learning Representations (ICLR).

