Title: Neural Architectures for Nested NER through Linearization
Abstract: The paper proposes two neural architectures and a BILOU encoding scheme. The first architecture builds on the standard LSTM-CRF model and turns nested NER into a multi-label tagging task by combining all of a token's labels via a Cartesian product; the second treats nested NER as a seq2seq task, with the tokens as input and the label sequence as output. Adding pretrained contextual embeddings (ELMo, BERT, Flair) brings further gains.
Advantages of the first method: simplicity, since mature LSTM-CRF pipelines already exist. Disadvantage: the Cartesian product produces a very large set of entity classes.
Related Work
2009: Finkel and Manning model the nested structure as a syntactic constituency tree.
2018: Ju et al. run a stacked LSTM-CRF NE recognizer as long as at least one nested entity is predicted, from innermost to outermost entities.
2018: Wang and Lu build a hypergraph to capture all possible entity mentions in a sentence.
2018: Katiyar and Cardie model nested entities as a directed hypergraph similar to Lu and Roth (2015), using RNNs to model the edge probabilities.
2017: Liu and Zhang use a sequence-to-sequence architecture.
Datasets
Group 1 (nested NER corpora): English ACE-2004, English ACE-2005, English GENIA, and CNEC (Czech Named Entity Corpus 1.0).
Note: ACE-2004 and ACE-2005 are not freely available.
Group 2 (flat NER corpora): CoNLL-2003 English and German, and CoNLL-2002 Dutch and Spanish.
Method
Label encoding: BILOU encoding with B- (beginning), I- (inside), U- (unit-length entity), L- (last) and O (outside) labels (Ratinov and Roth, 2009).
Tokens are mapped to multi-labels according to two rules:
(1) entity mentions starting earlier have priority over mentions starting later;
(2) among mentions with the same beginning, longer entity mentions have priority over shorter ones.
The multiple labels of a token are concatenated in order from highest to lowest priority, as illustrated in the sketch below.
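A minimal Python sketch of this linearization (not the authors' code; the "|" separator and the (start, end, type) span format are assumptions for illustration):

def bilou_label(pos, start, end, etype):
    # BILOU label of one mention (start..end inclusive) at token position pos.
    if start == end:
        return "U-" + etype
    if pos == start:
        return "B-" + etype
    if pos == end:
        return "L-" + etype
    return "I-" + etype

def linearize(tokens, mentions):
    # Rule 1: earlier start first; rule 2: for equal starts, longer first.
    mentions = sorted(mentions, key=lambda m: (m[0], -(m[1] - m[0])))
    labels = []
    for i in range(len(tokens)):
        parts = [bilou_label(i, s, e, t) for (s, e, t) in mentions if s <= i <= e]
        labels.append("|".join(parts) if parts else "O")
    return labels

# Hypothetical nested example: "Czech Republic" as LOC with nested "Czech" as GPE.
print(linearize(["in", "the", "Czech", "Republic"],
                [(2, 3, "LOC"), (2, 2, "GPE")]))
# -> ['O', 'O', 'B-LOC|U-GPE', 'L-LOC']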
Models
LSTM-CRF: a bidirectional LSTM encoder with a CRF decoder; each concatenated multi-label is treated as one atomic class, so nested NER reduces to an ordinary tagging task over a larger tag set.
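A minimal sketch of this architecture in PyTorch with the third-party pytorch-crf package (an illustrative assumption; the authors' implementation is in TensorFlow):

import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=256, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)

    def decode(self, tokens, mask):
        # Viterbi decoding: best tag sequence per sentence.
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return self.crf.decode(emissions, mask=mask)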
Sequence-to-sequence (seq2seq): a bidirectional LSTM encoder and an LSTM decoder. The tokens form the input sequence, and the decoder emits the corresponding labels one by one until it signals the end. Hard attention ties each label to the word being labeled, and labels are predicted in order from highest to lowest priority.
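A simplified sketch of the hard-attention decoding loop (hypothetical names; the real decoder also feeds the previously predicted label back as input, omitted here for brevity):

import torch

def greedy_decode(decoder_lstm, classifier, encoder_states, eow_id, max_labels=8):
    # encoder_states: (seq_len, dim) bidirectional-LSTM outputs for one sentence.
    hidden = None
    output = []
    for i in range(encoder_states.size(0)):   # hard attention walks tokens left to right
        labels_here = []
        for _ in range(max_labels):           # emit labels for the attended token
            inp = encoder_states[i].view(1, 1, -1)
            out, hidden = decoder_lstm(inp, hidden)
            label = classifier(out.view(-1)).argmax().item()
            if label == eow_id:               # end-of-word moves attention onward
                break
            labels_here.append(label)
        output.append(labels_here)
    return output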
Experiments
Optimizer: the lazy variant of the Adam optimizer (Kingma and Ba, 2014), with β1 = 0.9 and β2 = 0.98.
Mini-batch size: 8.
Regularization: dropout rate 0.5; word dropout replaces 20% of words.
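A small sketch of this configuration (TensorFlow Addons' LazyAdam stands in for the lazy Adam variant; the word-dropout helper is a hypothetical illustration):

import numpy as np
import tensorflow_addons as tfa

# Initial learning rate 1e-3 per the --epochs=10:1e-3,8:1e-4 schedule below.
optimizer = tfa.optimizers.LazyAdam(learning_rate=1e-3, beta_1=0.9, beta_2=0.98)

def word_dropout(token_ids, unk_id, rate=0.2, rng=None):
    # Randomly replace `rate` of the input words by the unknown token.
    rng = rng or np.random.default_rng()
    token_ids = np.asarray(token_ids)
    return np.where(rng.random(token_ids.shape) < rate, unk_id, token_ids)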
Word embeddings (word-level and character-level):
Pretrained word embeddings: word2vec (300 dimensions) for English, FastText for the other languages.
End-to-end trained word embeddings: the input forms and lemmas are embedded (256 dimensions) and POS tags are one-hot encoded.
Character-level embeddings.
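The exact architecture of the character-level embeddings is not spelled out above; a common choice, sketched below under that assumption, runs a bidirectional GRU over the characters of each word and concatenates the two final states into one word vector:

import torch
import torch.nn as nn

class CharEmbedder(nn.Module):
    def __init__(self, n_chars, char_dim=64, out_dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.gru = nn.GRU(char_dim, out_dim // 2, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                 # char_ids: (n_words, max_word_len)
        _, h = self.gru(self.emb(char_ids))      # h: (2, n_words, out_dim // 2)
        return torch.cat([h[0], h[1]], dim=-1)   # (n_words, out_dim) word vectors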
Pretrained contextual embeddings:
ELMo, BERT, Flair.
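These contextual embeddings are precomputed offline and passed to tagger.py as text files (see the command below), where they act as frozen features. A loader sketch; the file format (one whitespace-separated vector per token, blank line between sentences) is an assumption:

import numpy as np

def load_contextual_embeddings(path):
    # One vector per line, blank line separates sentences (assumed format).
    sentences, current = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                current.append([float(x) for x in line.split()])
            elif current:
                sentences.append(np.array(current, dtype=np.float32))
                current = []
    if current:
        sentences.append(np.array(current, dtype=np.float32))
    return sentences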
Code Details
Command to run the model:
./tagger.py --corpus=CoNLL_en \
  --train_data=conll_en/train_dev_bilou.conll \
  --test_data=conll_en/test_bilou.conll \
  --decoding=seq2seq \
  --epochs=10:1e-3,8:1e-4 \
  --form_wes_model=word_embeddings/conll_en_form.txt \
  --lemma_wes_model=word_embeddings/conll_en_lemma.txt \
  --bert_embeddings_train=bert_embeddings/conll_en_train_dev_bert_large_embeddings.txt \
  --bert_embeddings_test=bert_embeddings/conll_en_test_bert_large_embeddings.txt \
  --flair_train=flair_embeddings/conll_en_train_dev.txt \
  --flair_test=flair_embeddings/conll_en_test.txt \
  --elmo_train=elmo_embeddings/conll_en_train_dev.txt \
  --elmo_test=elmo_embeddings/conll_en_test.txt \
  --name=seq2seq+ELMo+BERT+Flair
References
[1] Straková, Jana, Milan Straka, and Jan Hajič. “Neural architectures for nested NER through linearization.” arXiv preprint arXiv:1908.06926 (2019).
[2] GitHub: https://github.com/ufal/acl2019_nested_ner
[3] happyprince: https://blog.csdn.net/ld326/article/details/108144415