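These notes pull out the main points of the survey and list most of the papers it cites, for easy lookup in later research; see the papers themselves for details.
How the survey explains neural networks: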
The forward pass computes a weighted sum of their inputs from the previous layer and passes the result through a non-linear function. The backward pass is to compute the gradient of an objective function with respect to the weights of a multilayer stack of modules via the chain rule of derivatives.
Very concise.
Advantages: representation learning; the model can learn semantics
The key advantage of deep learning is the capability of representation learning and the semantic composition empowered by both the vector representation and neural processing. This allows a machine to be fed with raw data and to automatically discover latent representations and processing needed for classification or detection
Hand-crafted features
- D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
- M. L. Patawar and M. Potey, “Approaches to named entity recognition: a survey,” Int. J. Innov. Res. Comput. Commun. Eng., vol. 3, no. 12, pp. 12 201–12 208, 2015.
domain-specific gazetteers
- O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
- S. Sekine and C. Nobata, “Definition, dictionaries and tagger for extended named entity hierarchy.” in LREC, 2004, pp. 1977–1980.
syntactic-lexical patterns
- S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.
biomedical domain
- D. Hanisch, K. Fundel, H.-T. Mevissen, R. Zimmer, and J. Fluck, “Prominer: rule-based protein and gene entity recognition,” BMC Bioinform., vol. 6, no. 1, p. S14, 2005.
- A. P. Quimbaya, A. S. Múnera, R. A. G. Rivera, J. C. D. Rodríguez, O. M. M. Velandia, A. A. G. Peña, and C. Labbé, “Named entity recognition over electronic health records through a combined dictionary-based approach,” Procedia Comput. Sci., vol. 100, pp. 55–61, 2016.
character-level representation
Its advantage is that it naturally handles out-of-vocabulary words. Thus a character-based model is able to infer representations for unseen words and share information about morpheme-level regularities.
Hybrid Representation
- words, POS tags, chunking, and word shape features
- spelling features, context features, word embeddings, and gazetteer features.
- additional word-level features (capitalization, lexicons) and character-level features (4-dimensional vector representing the type of a character: upper case, lower case, punctuation, other)
- 5-dimensional word shape vector (e.g., all capitalized, not capitalized, first-letter capitalized or contains a capital letter)
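The 4-dimensional character-type vector mentioned in the list above could look roughly like this. It is a minimal sketch: the function names and the use of `string.punctuation` as the punctuation set are my own assumptions, not taken from any of the cited papers.

```python
import string
from typing import List

# Order of the four dimensions: upper case, lower case, punctuation, other.
CHAR_TYPES = ("upper", "lower", "punct", "other")

def char_type_vector(ch: str) -> List[int]:
    """One-hot vector for the type of a single character (hypothetical helper)."""
    if ch.isupper():
        idx = 0
    elif ch.islower():
        idx = 1
    elif ch in string.punctuation:  # assumption: ASCII punctuation only
        idx = 2
    else:
        idx = 3                     # digits, whitespace, everything else
    vec = [0, 0, 0, 0]
    vec[idx] = 1
    return vec

def word_char_features(word: str) -> List[List[int]]:
    """One character-type vector per character of the word."""
    return [char_type_vector(c) for c in word]

features = word_char_features("U.S.")
```

In a hybrid representation these vectors would be concatenated with the character embeddings before the character-level encoder.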
Word-level features
- G. Zhou and J. Su, “Named entity recognition using an hmm-based chunk tagger,” in ACL, 2002, pp. 473–480.
- W. Liao and S. Veeramachaneni, “A simple semi-supervised algorithm for named entity recognition,” in NAACL-HLT, 2009, pp. 58–65.
- A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.
document and corpus features
- Y. Ravin and N. Wacholder, Extracting names from natural-language text. IBM Research Report RC 2033, 1997.
- V. Krishnan and C. D. Manning, “An effective two-stage model for exploiting non-local dependencies in named entity recognition,” in ACL, 2006, pp. 1121–1128.
More features
- D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
- R. Sharnagat, “Named entity recognition: A literature survey,” Center For Indian Language Technology, 2014.
- D. Campos, S. Matos, and J. L. Oliveira, “Biomedical named entity recognition: a survey of machine-learning tools,” in Theory Appl. Adv. Text Min., 2012.
Unsupervised methods
- D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
- O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
- S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.
- M. Collins and Y. Singer, “Unsupervised models for named entity classification,” in EMNLP, 1999, pp. 100–110.
- D. Nadeau, P. D. Turney, and S. Matwin, “Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity,” in CSCSI, 2006, pp. 266–277.
language-model-augmented
- M. E. Peters, W. Ammar, C. Bhagavatula, and R. Power, “Semi-supervised sequence tagging with bidirectional language models,”
- M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,”
- M. Rei, “Semi-supervised multitask learning for sequence labeling,”
- L. Liu, X. Ren, J. Shang, J. Peng, and J. Han, “Efficient contextualized representation: Language model pruning for sequence labeling,”
- L. Liu, J. Shang, F. Xu, X. Ren, H. Gui, J. Peng, and J. Han, “Empower sequence labeling with task-aware neural language model,”
- C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”
NER types:
- coarse-grained NER
- fine-grained NER tasks
- X. Ling and D. S. Weld, “Fine-grained entity recognition.” in AAAI, vol. 12, 2012, pp. 94–100.
- X. Ren, W. He, M. Qu, L. Huang, H. Ji, and J. Han, “Afet: Automatic fine-grained entity typing by hierarchical partial-label embedding,” in EMNLP, 2016, pp. 1369–1378.
- A. Abhishek, A. Anand, and A. Awekar, “Fine-grained entity type classification by jointly learning representations and label embeddings,” in EACL, 2017, pp. 797–807.
- A. Lal, A. Tomer, and C. R. Chowdary, “Sane: System for fine grained named entity typing on textual data,” in WWW, 2017, pp. 227–230.
- L. d. Corro, A. Abujabal, R. Gemulla, and G. Weikum, “Finet: Context-aware fine-grained named entity typing,” in EMNLP, 2015, pp. 868–878.
Datasets
- Some datasets have several hundred labels, e.g. HYENA, Gillick
- OntoNotes CoNLL03
https://github.com/juand-r/entity-recognition-datasets
https://github.com/cambridgeltl/MTL-Bioinformatics-2016/tree/master/data
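Tools
Metrics
NER has two subtasks: boundary detection and type identification
FP: an entity returned by the model that is not in the ground truth
FN: an entity in the ground truth that the model did not return
TP: an entity returned by the model that is also in the ground truth
The F-score comes in macro- and micro-averaged forms. Macro: compute the F-score for each entity type separately, then average them (all types weigh equally). Micro: pool the TP/FP/FN counts of all types and compute one F-score from the totals (all entities weigh equally).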
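To make the TP/FP/FN definitions and the two averages concrete, here is a small exact-match sketch. Everything in it is illustrative: entities are represented as hypothetical `(start, end, type)` tuples, and the gold/predicted sets are made-up toy data, not results from any paper.

```python
from collections import defaultdict

def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(gold, pred):
    """Exact-match evaluation: an entity counts as TP only if both its
    boundaries and its type agree with a gold entity."""
    counts = defaultdict(lambda: [0, 0, 0])  # type -> [tp, fp, fn]
    for span in pred:
        if span in gold:
            counts[span[2]][0] += 1  # TP: returned and in ground truth
        else:
            counts[span[2]][1] += 1  # FP: returned but not in ground truth
    for span in gold:
        if span not in pred:
            counts[span[2]][2] += 1  # FN: in ground truth but not returned
    # Macro: average of the per-type F-scores.
    per_type_f = [prf(*c)[2] for c in counts.values()]
    macro_f = sum(per_type_f) / len(per_type_f)
    # Micro: one F-score from the pooled counts.
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro_f = prf(*totals)[2]
    return macro_f, micro_f

# Toy data: the second LOC entity has a wrong right boundary.
gold = {(0, 3, "PER"), (5, 6, "LOC"), (8, 9, "LOC")}
pred = {(0, 3, "PER"), (5, 6, "LOC"), (8, 10, "LOC")}
macro_f, micro_f = evaluate(gold, pred)
```

On this toy data PER scores F = 1.0 and LOC scores F = 0.5, so the macro average is 0.75, while the pooled counts (TP = 2, FP = 1, FN = 1) give a micro F of 2/3.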
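- MUC-6: relaxed match. The type is scored ignoring boundaries (any overlap with the gold entity counts), and the boundaries are scored ignoring the type
- ACE: too complicated, rarely used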
Context Encoder Architectures
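Widely used context encoder architectures: convolutional neural networks, recurrent neural networks, recursive neural networks, and deep transformers.
1. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,”
A CNN is used to capture local features around each word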
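2. E. Strubell, P. Verga, D. Belanger, and A. McCallum, “Fast and accurate entity recognition with iterated dilated convolutions,”
Even with parallel hardware, an LSTM needs O(n) time for a sequence of length n because it processes tokens one after another. ID-CNNs have better capacity than traditional CNNs for large contexts and structured prediction, and are 14-20x faster than Bi-LSTM-CRF at test time.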
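A quick way to see why dilation helps with longer context is to count how many input tokens one output position can see. The sketch below uses the standard receptive-field arithmetic for stacked 1-D convolutions; the kernel width of 3 and the dilation schedule 1, 2, 4 are assumptions chosen for illustration, not the exact configuration of the ID-CNN paper.

```python
def receptive_field(kernel_width, dilations):
    """Number of input tokens visible to one output position after stacking
    1-D convolutions, one layer per entry in `dilations`."""
    field = 1
    for d in dilations:
        # Each layer widens the window by (kernel_width - 1) * dilation.
        field += (kernel_width - 1) * d
    return field

plain = receptive_field(3, [1, 1, 1])    # three ordinary convolutions
dilated = receptive_field(3, [1, 2, 4])  # three dilated convolutions
```

With the same depth and the same number of parameters, the dilated stack covers 15 tokens instead of 7, and every position can still be computed in parallel.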
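3. BiLSTM: used because in a unidirectional RNN the last words influence the sentence representation more than the earlier ones. P. Zhou, S. Zheng, J. Xu, Z. Qi, H. Bao, and B. Xu, “Joint extraction of multiple relations and entities by using a hybrid neural network,”
4. GRU, LSTM
5. Recursive neural networks are non-linear adaptive models that can learn deep structured information by traversing a given structure in topological order.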
- P.-H. Li, R.-P. Dong, Y.-S. Wang, J.-C. Chou, and W.-Y. Ma, “Leveraging linguistic structures for named entity recognition with bidirectional recursive neural networks,”
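- M. Rei, “Semi-supervised multitask learning for sequence labeling,” in ACL, 2017, pp. 2121–2130.
Neural language models
Forward and backward neural language models
In the multi-task learning setting, the language model and the sequence labeling model share the same character-level layer. The vectors from character-level embeddings, pre-trained word embeddings, and language model representations are concatenated and fed into the word-level LSTMs. Experimental results show that multi-task learning is an effective way to guide the language model to learn task-specific knowledge.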
Deep Transformer
Traditional embeddings and language model embeddings used together
- A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.
- Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
- C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,” in ACL, 2019, pp. 1430–1440.
- Y. Luo, F. Xiao, and H. Zhao, “Hierarchical contextualized representation for named entity recognition,” CoRR, vol. abs/1911.02257, 2019.
- Y. Liu, F. Meng, J. Zhang, J. Xu, Y. Chen, and J. Zhou, “GCDT: A global context enhanced deep transition architecture for sequence labeling,” in ACL, 2019, pp. 2431–2441.
- Y. Jiang, C. Hu, T. Xiao, C. Zhang, and J. Zhu, “Improved differentiable architecture search for language modeling and named entity recognition,”
Recasting NER as a machine reading comprehension (MRC) problem
- X. Li, J. Feng, Y. Meng, Q. Han, F. Wu, and J. Li, “A unified MRC framework for named entity recognition,” CoRR, vol. abs/1910.11476, 2019.
- X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, and J. Li, “Dice loss for data-imbalanced NLP tasks,”
Tag Decoder Architectures
four architectures of tag decoders:
MLP + softmax layer, conditional random fields (CRFs), recurrent neural networks, and pointer networks.
A. Akbik, D. Blythe, and R. Vollgraf, “Contextual string embeddings for sequence labeling,” in COLING, 2018, pp. 1638–1649.
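A pointer network first identifies a chunk (or segment) and then labels it. This is repeated until all the words in the input sequence have been processed. In Fig. 12(d) of the survey, given the start token, the segment “Michael Jeffery Jordan” is first identified and then labeled as “PERSON”. Segmentation and labeling can be done by two separate neural networks in the pointer network. Next, “Michael Jeffery Jordan” is fed into the pointer network as input. As a result, the segment “was” is identified and labeled as “O”.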
- F. Zhai, S. Potdar, B. Xiang, and B. Zhou, “Neural models for sequence chunking.” in AAAI, 2017, pp. 3365–3371.
- O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in NIPS, 2015, pp. 2692–2700.
- J. Li, A. Sun, and S. Joty, “Segbot: A generic neural text segmentation model with pointer network,” in IJCAI, 2018, pp. 4166–4172
Summary of DNN architectures
The following results show that external knowledge can improve NER performance.
- T. Liu, J. Yao, and C. Lin, “Towards improving neural named entity recognition with gazetteers,” in ACL, 2019, pp. 5301–5307.
- Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
- C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,”
- J. Zhuo, Y. Cao, J. Zhu, B. Zhang, and Z. Nie, “Segment-level sequence modeling using gated recursive semi-markov conditional random fields,”
Drawbacks:
1) acquiring external knowledge is labor-intensive (e.g., gazetteers) or computationally expensive (e.g., dependency);
2) integrating external knowledge adversely affects end-to-end learning and hurts the generality of DL-based systems.
Pre-trained transformers are more effective than LSTMs. Without pre-training and with limited data, transformers perform poorly (
Q. Guo, X. Qiu, P. Liu, Y. Shao, X. Xue, and Z. Zhang, “Star-transformer,” in NAACL-HLT, 2019, pp. 1315–1325.
H. Yan, B. Deng, X. Li, and X. Qiu, “Tener: Adapting transformer encoder for name entity recognition,” arXiv preprint arXiv:1911.04474, 2019.
)
Transformers are faster than recurrent layers when the sequence length n is smaller than the representation dimension d; per-layer complexities: self-attention O(n^2 · d) and recurrent O(n · d^2) [A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS, 2017, pp. 5998–6008.]
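The crossover at n = d can be checked with plain arithmetic. This sketch only plugs numbers into the two asymptotic expressions; constants and hardware effects are ignored, and the values n = 50, n = 2000 and d = 512 are arbitrary examples of my own.

```python
def self_attention_cost(n, d):
    """Per-layer operation count of self-attention, up to a constant: n^2 * d."""
    return n * n * d

def recurrent_cost(n, d):
    """Per-layer operation count of a recurrent layer, up to a constant: n * d^2."""
    return n * d * d

d = 512
short_sentence = (self_attention_cost(50, d), recurrent_cost(50, d))
long_document = (self_attention_cost(2000, d), recurrent_cost(2000, d))
```

For a 50-token sentence self-attention is about 10x cheaper (the ratio is n/d); for a 2000-token input the recurrent layer is cheaper per layer, although it still cannot be parallelized over positions.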
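For end users, the choice of architecture depends on the data and the domain task. If data is abundant, consider training RNN models from scratch and fine-tuning contextualized language models. If data is scarce, a transfer strategy may be the better choice. For the news domain, many pre-trained off-the-shelf models are available. For specific domains (e.g., medical and social media), fine-tuning a general-purpose contextualized language model on domain-specific data is often an effective approach.
Low-resource and cross-domain NER
- C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”
- S. J. Pan, Z. Toh, and J. Su, “Transfer joint embedding for cross-domain named entity recognition,” ACM Trans. Inf. Syst., vol. 31, no. 2, p. 7, 2013.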
- J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.
- B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022.
- Y. Cao, Z. Hu, T. Chua, Z. Liu, and H. Ji, “Low-resource name tagging learned with weakly labeled data,” in EMNLP, 2019, pp. 261–270.
- X. Huang, L. Dong, E. Boschee, and N. Peng, “Learning A unified named entity tagger from multiple partially annotated corpora for efficient adaptation,” in CoNLL, 2019, pp. 515–527.
Traditional approaches based on bootstrapping
- J. Jiang and C. Zhai, “Instance weighting for domain adaptation in nlp,” in ACL, 2007, pp. 264–271.
- D. Wu, W. S. Lee, N. Ye, and H. L. Chieu, “Domain adaptive bootstrapping for named entity recognition,” in EMNLP, 2009, pp. 1523–1532.
- A. Chaudhary, J. Xie, Z. Sheikh, G. Neubig, and J. G. Carbonell, “A little annotation does a lot of good: A study in bootstrapping low-resource named entity recognizers,” pp. 5163–5173, 2019.
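Transfer learning
If two tasks have mappable label sets, they share one CRF layer; otherwise each task learns its own CRF layer. Experiments show significant improvements on a variety of datasets under low-resource conditions. Three transfer scenarios are proposed: Z. Yang, R. Salakhutdinov, and W. W. Cohen, “Transfer learning for sequence tagging with hierarchical recurrent networks,” in ICLR, 2017.
Zhao et al. [H. Zhao, Y. Yang, Q. Zhang, and L. Si, “Improve neural entity recognition via multi-task data selection and constrained decoding,” in NAACL-HLT, vol. 2, 2018, pp. 346–351.] propose a multi-task model with domain adaptation, in which the fully connected layers are adapted to the different datasets and the CRF features are computed separately. A major merit of Zhao's model is that instances with a different distribution or misaligned annotation guidelines are filtered out during data selection.
Train on the source task, then fine-tune on target-task data: J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.
B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022. propose a fine-tuning approach with three neural adaptation layers: word adaptation layer, sentence adaptation layer, and output adaptation layer.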
tag-hierarchy model
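Proposes a tag-hierarchy model for NER with heterogeneous tag sets; during inference the hierarchy is used to map fine-grained tags onto the target tag set.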
G. Beryozkin, Y. Drori, O. Gilon, T. Hartman, and I. Szpektor, “A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy,” in ACL, 2019, pp. 140–150.
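The mapping step alone (not the paper's model) can be pictured as walking up a hierarchy until a tag from the target set is reached. The hierarchy, the tag names, and the fallback to "O" below are all hypothetical choices of mine for illustration.

```python
# Hypothetical tag hierarchy: child tag -> parent tag (None marks a root).
PARENT = {
    "CITY": "LOCATION",
    "COUNTRY": "LOCATION",
    "POLITICIAN": "PERSON",
    "LOCATION": None,
    "PERSON": None,
}

def map_to_target(tag, target_tags, parent=PARENT):
    """Map a fine-grained tag to the closest ancestor present in the target
    tag set; fall back to "O" (assumption) when no ancestor is covered."""
    current = tag
    while current is not None:
        if current in target_tags:
            return current
        current = parent.get(current)  # move one level up the hierarchy
    return "O"

coarse = map_to_target("CITY", {"LOCATION", "PERSON"})
fine = map_to_target("CITY", {"CITY", "PERSON"})
uncovered = map_to_target("COUNTRY", {"PERSON"})
```

The same fine-grained prediction can thus be served to datasets with different tag sets without retraining for each one.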
Transfer learning in the biomedical domain, used to reduce the amount of labeled data needed
- X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz, and J. Han, “Cross-type biomedical named entity recognition with deep multi-task learning,” arXiv preprint arXiv:1801.09851, 2018.
- J. M. Giorgi and G. D. Bader, “Transfer learning for biomedical named entity recognition with neural networks,” Bioinformatics, 2018.
- Z. Wang, Y. Qu, L. Chen, J. Shen, W. Zhang, S. Zhang, Y. Gao, G. Gu, K. Chen, and Y. Yu, “Label-aware double transfer learning for cross-specialty medical named entity recognition,” in NAACL-HLT, 2018, pp. 1–15.
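Deep active learning for NER
Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, and A. Anandkumar, “Deep active learning for named entity recognition,” propose incremental training for NER with each batch of new labels.
With active learning, 99% of the performance of the best deep learning model trained on the full data is reached using only 24.9% of the training data on the English dataset and 30.1% on the Chinese dataset. Moreover, 12.0% and 16.9% of the training data are enough for the deep active learning model to outperform shallow models trained on the full training data.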
D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in SIGIR, 1994, pp. 3–12.
S. Pradhan, A. Moschitti, N. Xue, H. T. Ng, A. Björkelund, O. Uryupina, Y. Zhang, and Z. Zhong, “Towards robust linguistic analysis using ontonotes,” in CoNLL, 2013, pp. 143–152.
Deep reinforcement learning for NER
Reinforcement learning papers:
L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, pp. 237–285, 1996.
R. S. Sutton and A. G. Barto, Introduction to reinforcement learning. MIT press Cambridge, 1998, vol. 135.
S. C. Hoi, D. Sahoo, J. Lu, and P. Zhao, “Online learning: A comprehensive survey,” arXiv preprint arXiv:1802.02871, 2018.
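In reinforcement learning, the environment has three components: 1. a state transition function; 2. an observation (i.e., output) function; 3. a reward function.
The agent can likewise be modeled as a stochastic finite state machine, with inputs (observations/rewards from the environment) and outputs (actions on the environment). It has two components: (i) a state transition function, (ii) a policy/output function.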
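As a reading aid, the two components of the agent can be written down as two methods of a tiny class. This is a toy sketch only: the class name, the choice of "running sum of rewards" as the state, and the explore/exploit actions are all invented for illustration and have nothing to do with how the NER papers define their agents.

```python
import random

class ToyAgent:
    """Agent as a stochastic finite state machine (toy illustration)."""

    def __init__(self, seed=0):
        self.state = 0.0                # internal state: running sum of rewards
        self.rng = random.Random(seed)  # randomness used by the policy

    def transition(self, observation, reward):
        # (i) state transition function: new state from old state and input.
        # The observation is ignored in this toy version.
        self.state += reward

    def policy(self):
        # (ii) policy/output function: a stochastic choice of action.
        p_explore = 1.0 / (1.0 + max(self.state, 0.0))
        return "explore" if self.rng.random() < p_explore else "exploit"

    def step(self, observation, reward):
        """Consume one (observation, reward) input and emit one action."""
        self.transition(observation, reward)
        return self.policy()

agent = ToyAgent()
first_action = agent.step("token", 0.0)
```

The environment would be the mirror image: it takes the action as input and returns the next observation and reward.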
Y. Yang, W. Chen, Z. Li, Z. He, and M. Zhang, “Distantly supervised NER with partial annotation learning and reinforcement learning,” in COLING, 2018, pp. 2159–2169. Uses data generated by distant supervision to recognize new entity types in new domains.
Deep adversarial learning for NER
Adversarial learning: D. Lowd and C. Meek, “Adversarial learning,” in SIGKDD, 2005, pp. 641–647.
The goal is to make the model more robust, or to reduce its test error on clean inputs. There is a generative network and a discriminative network.
- L. Huang, H. Ji, and J. May, “Cross-lingual multi-level adversarial transfer to enhance low-resource name tagging,” in NAACL-HLT, 2019, pp. 3823–3833.
- J. Li, D. Ye, and S. Shang, “Adversarial transfer for named entity boundary detection with pointer networks,” in IJCAI, 2019, pp. 5053–5059.
- P. Cao, Y. Chen, K. Liu, J. Zhao, and S. Liu, “Adversarial transfer learning for chinese named entity recognition with self-attention mechanism,” in EMNLP, 2018, pp. 182–192.
Neural Attention for NER
A neural attention mechanism gives a neural network the ability to focus on a subset of its inputs. By applying attention, an NER model can capture the most informative elements of the input.
- M. Rei, G. K. Crichton, and S. Pyysalo, “Attending to characters in neural sequence labeling models,” in COLING, 2016, pp. 309–318.
- A. Zukov-Gregoric, Y. Bachrach, P. Minkovsky, S. Coope, and B. Maksak, “Neural named entity recognition using a self-attention mechanism,” in ICTAI, 2017, pp. 652–656.
- G. Xu, C. Wang, and X. He, “Improving clinical named entity recognition with global neural attention,” in APWeb-WAIM, 2018, pp. 264–279.
- Q. Zhang, J. Fu, X. Liu, and X. Huang, “Adaptive co-attention network for named entity recognition in tweets,” in AAAI, 2018.
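How attention "focuses on a subset of the input" is easiest to see in the generic dot-product form. The sketch below is not the mechanism of any specific paper in the list above; the vectors are made-up two-dimensional toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention for one query: score each input, normalize the
    scores, and return the weights plus the weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy input: three token vectors; the query is most similar to the second one.
keys = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
values = keys
query = [0.0, 1.0]
weights, context = attend(query, keys, values)
```

The second token receives the largest weight, so the context vector is dominated by it; that weighting is what lets the tagger concentrate on the most informative words or characters.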