Named Entity Recognition as Dependency Parsing
Yu, J., Bohnet, B., & Poesio, M. (2020). Named Entity Recognition as Dependency Parsing. ArXiv, abs/2005.07150.
Code: https://github.com/juntaoy/biaffine-ner
Juntao Yu
Queen Mary University London, UK
Bernd Bohnet
Google Research Netherlands
Massimo Poesio
Queen Mary University London, UK
This model was also proposed to address nested NER.
Basic idea: following graph-based dependency parsing, the sentence is fed in as one global graph over its tokens and processed by a biaffine model. Concretely, the biaffine model scores every (start, end) token pair in the sentence (i.e., every possible span), and entities are extracted from these scores.
Main idea: treat entity extraction as the problem of identifying start and end indices and assigning a type to the span they delimit. A biaffine model on top of a multi-layer neural network outputs a score for every span; candidate spans are then ranked by these scores, and the top-ranked spans that satisfy the flat or nested NER constraints are returned (a minimal sketch of this selection step is given right below). The network structure is shown in the figure below:
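The constraint handling is only described in prose in the paper; the following is a minimal sketch (my own reconstruction, not the authors' code) of greedy top-down span selection: a candidate is kept unless it crosses the boundary of an already selected span, and for flat NER any overlap at all is disallowed.

def decode_spans(scored_spans, allow_nested=True):
    """Greedy span selection, a sketch of the paper's post-processing step.

    scored_spans: list of (score, start, end, label) tuples for spans whose
    best label is not the non-entity class (e.g. taken from the biaffine scores).
    """
    selected = []
    for score, s, e, label in sorted(scored_spans, reverse=True):
        ok = True
        for _, s2, e2, _ in selected:
            crossing = (s < s2 <= e < e2) or (s2 < s <= e2 < e)
            overlap = not (e < s2 or e2 < s)
            if crossing or (not allow_nested and overlap):
                ok = False
                break
        if ok:
            selected.append((score, s, e, label))
    return [(s, e, label) for _, s, e, label in selected]

# Example with made-up scores: the (2, 6) span crosses the selected (0, 5) span and is dropped
print(decode_spans([(5.0, 0, 5, "GPE"), (4.0, 0, 0, "GPE"), (3.0, 2, 6, "ORG")]))
# -> [(0, 5, 'GPE'), (0, 0, 'GPE')]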
Input
Concatenation of word embeddings and character embeddings;
Word embeddings: BERT_Large, fastText
The input dataset is preprocessed into the following form:
{"doc_key": "batch_01",
"ners": [[[0, 0, "PER"], [3, 3, "GPE"], [5, 5, "GPE"]],
[[3, 3, "PER"], [10, 14, "ORG"], [20, 20, "GPE"], [20, 25, "GPE"], [22, 22, "GPE"]],
[]],
"sentences": [["Anwar", "arrived", "in", "Shanghai", "from", "Nanjing", "yesterday", "afternoon", "."],
["This", "morning", ",", "Anwar", "attended", "the", "foundation", "laying", "ceremony", "of", "the", "Minhang", "China-Malaysia", "joint-venture", "enterprise", ",", "and", "after", "that", "toured", "Pudong", "'s", "Jingqiao", "export", "processing", "district", "."],
["(", "End", ")"]]}
The data has three fields: doc_key, ners, sentences — the document id, the entities, and the sentences. Each entity is a (start, end, type) triple using sentence-local token indices, and each sentence is an array of tokens.
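For reference, each line of such a file is one JSON document; a minimal sketch for reading it and printing the gold entities (the file name train.english.jsonlines is only an example):

import json

with open("train.english.jsonlines") as f:
    for line in f:
        example = json.loads(line)
        for sent, ners in zip(example["sentences"], example["ners"]):
            for start, end, label in ners:
                # indices are inclusive and sentence-local
                print(label, sent[start:end + 1])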
Next, look at the data format the model actually needs and how the text above is assembled into tensors:
def tensorize_example(self, example, is_training):
    ners = example["ners"]
    sentences = example["sentences"]

    max_sentence_length = max(len(s) for s in sentences)
    max_word_length = max(max(max(len(w) for w in s) for s in sentences),
                          max(self.config["filter_widths"]))
    text_len = np.array([len(s) for s in sentences])  # actual length of each sentence

    tokens = [[""] * max_sentence_length for _ in sentences]
    # character indices per word: [num_sentences, max_sentence_length, max_word_length]
    char_index = np.zeros([len(sentences), max_sentence_length, max_word_length])
    # pretrained (fastText) word embeddings: [num_sentences, max_sentence_length, emb_size]
    context_word_emb = np.zeros([len(sentences), max_sentence_length, self.context_embeddings_size])
    lemmas = []
    if "lemmas" in example:
        lemmas = example["lemmas"]
    for i, sentence in enumerate(sentences):
        for j, word in enumerate(sentence):
            tokens[i][j] = word
            if self.context_embeddings.is_in_embeddings(word):
                context_word_emb[i, j] = self.context_embeddings[word]
            elif lemmas and self.context_embeddings.is_in_embeddings(lemmas[i][j]):
                # fall back to the lemma when the surface form is out of vocabulary
                context_word_emb[i, j] = self.context_embeddings[lemmas[i][j]]
            char_index[i, j, :len(word)] = [self.char_dict[c] for c in word]

    tokens = np.array(tokens)
    doc_key = example["doc_key"]
    lm_emb = self.load_lm_embeddings(doc_key)  # precomputed contextual (BERT) embeddings for this document

    gold_labels = []
    if is_training:
        for sid, sent in enumerate(sentences):
            # map each gold (start, end) span to its label id; 0 means "not an entity"
            ner = {(s, e): self.ner_maps[t] for s, e, t in ners[sid]}
            # enumerate every span (s, e) with s <= e, in the same order the
            # scoring code flattens them, so each candidate span gets a gold label
            for s in xrange(len(sent)):  # xrange: the repo targets Python 2
                for e in xrange(s, len(sent)):
                    gold_labels.append(ner.get((s, e), 0))
    gold_labels = np.array(gold_labels)

    example_tensors = (tokens, context_word_emb, lm_emb, char_index, text_len,
                       is_training, gold_labels)
    return example_tensors
Here we can see char_index, a tensor of character indices with shape [num_sentences, max_sentence_length, max_word_length]; it is used to embed the characters of each word, which are then aggregated with a CNN (a sketch of that character encoder follows below).
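The repo aggregates the character embeddings with a TensorFlow CNN helper; the following is only a rough NumPy sketch of that idea (function and parameter names are mine, not the repo's): embed each character, run 1-D convolutions with several filter widths over the character sequence of a word, and max-pool over positions to get one vector per word.

import numpy as np

def char_cnn_word_vector(char_ids, char_emb, filters_by_width):
    """One word -> one aggregated character vector (a sketch, not the repo's util.cnn).

    char_ids: list of character indices for one word
    char_emb: [num_chars, char_dim] character embedding table
    filters_by_width: {width: kernel of shape [width, char_dim, num_filters]}
    """
    x = char_emb[char_ids]                       # [word_len, char_dim]
    pooled = []
    for width, kernel in filters_by_width.items():
        outs = []
        for i in range(len(char_ids) - width + 1):
            window = x[i:i + width]              # [width, char_dim]
            outs.append(np.einsum("wc,wcf->f", window, kernel))
        if outs:
            pooled.append(np.max(outs, axis=0))  # max-pool over positions
        else:
            pooled.append(np.zeros(kernel.shape[-1]))
    return np.concatenate(pooled)                # one word's aggregated_char_emb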
Next, look at what information the concatenated input contains:
context_emb_list = []
context_emb_list.append(context_word_emb)      # fastText word embeddings
context_emb_list.append(aggregated_char_emb)   # CNN-aggregated character embeddings
context_emb_list.append(aggregated_lm_emb)     # contextual (BERT) embeddings
context_emb = tf.concat(context_emb_list, 2)   # concatenate along the feature dimension
context_emb = tf.nn.dropout(context_emb, self.lexical_dropout)
Hidden layer: BiLSTM
FFNNs
Two separate feed-forward networks are applied to the BiLSTM output, one producing start representations and one producing end representations.
The projection function is:
def projection(inputs, output_size, initializer=None):
    return ffnn(inputs, 0, -1, output_size, dropout=None, output_weights_initializer=initializer)
From the code we can see that the BiLSTM output is fed into two separate FFNNs, as sketched below.
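Concretely, something along these lines (a hedged sketch in the repo's style; the variable-scope names, the "ffnn_size" config key and the context_outputs variable are assumptions, not verified against the repo):

# context_outputs: BiLSTM outputs, [num_sentences, max_sentence_length, 2 * lstm_size]
with tf.variable_scope("candidate_starts"):
    candidate_starts_emb = projection(context_outputs, self.config["ffnn_size"])
with tf.variable_scope("candidate_ends"):
    candidate_ends_emb = projection(context_outputs, self.config["ffnn_size"])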
Classifier: biaffine classifier.
About the biaffine classifier【1】:
The score tensor r has size l * l * c, where l is the sentence length and c is the number of NER categories + 1 (the extra class is "non-entity"); r covers every possible span.
Candidate spans whose predicted label is an entity type are then ranked by their scores.
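In the paper, the score of a span with start i and end j under class c is h_s(i)^T U_c h_e(j) + w_c^T [h_s(i); h_e(j)] + b_c. A rough NumPy sketch of this biaffine scoring for one sentence (all names here are mine, not the repo's):

import numpy as np

def biaffine_scores(h_start, h_end, U, W, b):
    """Score every (start, end, class) triple for one sentence.

    h_start, h_end: [l, d]  per-token start/end representations from the two FFNNs
    U: [d, c, d]   bilinear weights, c = number of NER classes + 1
    W: [c, 2 * d]  weights applied to the concatenated start/end vectors
    b: [c]         bias
    returns: [l, l, c] score tensor r
    """
    bilinear = np.einsum("id,dcf,jf->ijc", h_start, U, h_end)
    concat = np.concatenate(
        [np.repeat(h_start[:, None, :], h_end.shape[0], axis=1),
         np.repeat(h_end[None, :, :], h_start.shape[0], axis=0)],
        axis=-1)                                  # [l, l, 2d]
    return bilinear + concat @ W.T + b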
In the code:
candidate_ner_scores = util.bilinear_classifier(
    candidate_starts_emb, candidate_ends_emb, self.dropout,
    output_size=self.num_types + 1)  # [num_sentence, max_sentence_length, max_sentence_length, types+1]
candidate_ner_scores = tf.boolean_mask(
    tf.reshape(candidate_ner_scores, [-1, self.num_types + 1]),
    flattened_candidate_scores_mask)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gold_labels, logits=candidate_ner_scores)
loss = tf.reduce_sum(loss)
The start and end representations are fed together into the bilinear (biaffine) classifier; the boolean mask keeps only the valid spans (start ≤ end within the actual sentence length), and the loss is the softmax cross-entropy against the gold span labels.
Datasets:
Nested NER: ACE 2004, ACE 2005, GENIA;
Flat NER: CoNLL 2002, CoNLL 2003, OntoNotes;
The OntoNotes dataset is used for the ablation study that quantifies the contribution of each component of the network.
This feels like a fairly blunt approach. The title says dependency parsing, but the paper does not actually use dependency parsing; it only borrows the biaffine mechanism. Biaffine attention has indeed been used in parsing, but that alone hardly justifies the title — it reads like clickbait.
Still, the results are decent; after reading the code, though, there seems to be room for improvement in the implementation.
The model as a whole is also fairly simple; the main useful contribution of the paper is the r scoring tensor.
[Result tables for Flat Named Entity Recognition and Nested Named Entity Recognition omitted.]
【1】Timothy Dozat and Christopher Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of 5th International Conference on Learning Representations (ICLR).
comply: v. to obey, to abide by