[Paper Notes] A Survey on Deep Learning for Named Entity Recognition





The forward pass computes a weighted sum of the inputs from the previous layer and passes the result through a non-linear function. The backward pass computes the gradient of an objective function with respect to the weights of a multilayer stack of modules via the chain rule of derivatives.
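To make the forward and backward passes concrete, here is a minimal sketch for a single unit. The tanh non-linearity, the squared-error objective, and all numeric values are assumptions chosen for illustration; they are not taken from the survey.

```python
import math

def forward(w, b, x):
    """Forward pass: weighted sum of the inputs, then a non-linear function (tanh)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(z)

def loss_and_grads(w, b, x, target):
    """Backward pass: gradient of a squared-error objective via the chain rule."""
    y = forward(w, b, x)
    loss = 0.5 * (y - target) ** 2
    dloss_dy = y - target            # d(loss)/dy
    dy_dz = 1.0 - y ** 2             # derivative of tanh at z
    dz = dloss_dy * dy_dz            # d(loss)/dz by the chain rule
    grad_w = [dz * xi for xi in x]   # dz/dw_i = x_i
    grad_b = dz                      # dz/db = 1
    return loss, grad_w, grad_b

# Hypothetical weights, bias, input, and target.
w, b, x, target = [0.5, -0.3], 0.1, [1.0, 2.0], 0.2
loss, grad_w, grad_b = loss_and_grads(w, b, x, target)
print(loss, grad_w, grad_b)
```

In a multilayer network the same chain-rule step is repeated layer by layer, from the output back to the input.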




The key advantage of deep learning is the capability of representation learning and the semantic composition empowered by both the vector representation and neural processing. This allows a machine to be fed with raw data and to automatically discover the latent representations and processing needed for classification or detection.




  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. M. L. Patawar and M. Potey, “Approaches to named entity recognition: a survey,” Int. J. Innov. Res. Comput. Commun. Eng., vol. 3, no. 12, pp. 12201–12208, 2015.


domain-specific gazetteers

  1. O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
  2. S. Sekine and C. Nobata, “Definition, dictionaries and tagger for extended named entity hierarchy.” in LREC, 2004, pp. 1977–1980.

syntactic-lexical patterns

  1. S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.

biomedical domain

  1. D. Hanisch, K. Fundel, H.-T. Mevissen, R. Zimmer, and J. Fluck, “Prominer: rule-based protein and gene entity recognition,” BMC Bioinform., vol. 6, no. 1, p. S14, 2005.
  2. A. P. Quimbaya, A. S. Múnera, R. A. G. Rivera, J. C. D. Rodríguez, O. M. M. Velandia, A. A. G. Peña, and C. Labbé, “Named entity recognition over electronic health records through a combined dictionary-based approach,” Procedia Comput. Sci., vol. 100, pp. 55–61, 2016.




character-level representation

The advantage of character-level representation is that it naturally handles out-of-vocabulary words. Thus a character-based model is able to infer representations for unseen words and share information about morpheme-level regularities.
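The out-of-vocabulary point can be illustrated with a small sketch. It builds a word vector by averaging character n-gram vectors, which is a simplified stand-in and not a specific character encoder from the survey. The names `DIM`, `ngrams`, `ngram_vector`, and `word_vector` are hypothetical, and the hash-based vectors only imitate a learned lookup table.

```python
import zlib

DIM = 8  # hypothetical embedding size

def ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_vector(gram):
    """Deterministic pseudo-embedding; a stand-in for learned n-gram parameters."""
    return [
        ((zlib.crc32((gram + str(k)).encode("utf-8")) % 2001) - 1000) / 1000.0
        for k in range(DIM)
    ]

def word_vector(word):
    """Average the vectors of the word's character n-grams."""
    vecs = [ngram_vector(g) for g in ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# An unseen word still receives a vector, and it shares n-grams
# (morpheme-like pieces) with related words.
shared = set(ngrams("unlockable")) & set(ngrams("lockable"))
print(sorted(shared))
print(word_vector("unlockable"))
```

No vocabulary lookup is involved, so any string gets a representation, and words with common substrings reuse the same n-gram parameters.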



Hybrid Representation

  1. words, POS tags, chunking, and word shape features
  2. spelling features, context features, word embeddings, and gazetteer features.
  3. additional word-level features (capitalization, lexicons) and character-level features (4-dimensional vector representing the type of a character: upper case, lower case, punctuation, other)
  4. 5-dimensional word shape vector (e.g., all capitalized, not capitalized, first-letter capitalized or contains a capital letter)
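The 4-dimensional character-type feature in item 3 can be sketched as a one-hot vector per character. The function names and the use of `string.punctuation` as the punctuation set are assumptions for illustration.

```python
import string

CHAR_TYPES = ("upper", "lower", "punctuation", "other")

def char_type_vector(ch):
    """One-hot vector over (upper case, lower case, punctuation, other)."""
    if ch.isupper():
        idx = 0
    elif ch.islower():
        idx = 1
    elif ch in string.punctuation:
        idx = 2
    else:
        idx = 3
    vec = [0] * len(CHAR_TYPES)
    vec[idx] = 1
    return vec

def word_char_features(word):
    """Character-type vectors for every character of a word."""
    return [char_type_vector(ch) for ch in word]

print(word_char_features("U.S."))
```

In a hybrid representation, vectors like these would be concatenated with the word-level and character-level embeddings before the context encoder.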



Word-level features

  1. G. Zhou and J. Su, “Named entity recognition using an HMM-based chunk tagger,” in ACL, 2002, pp. 473–480.
  2. W. Liao and S. Veeramachaneni, “A simple semi-supervised algorithm for named entity recognition,” in NAACL-HLT, 2009, pp. 58–65.
  3. A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.


document and corpus features

  1. Y. Ravin and N. Wacholder, Extracting names from natural-language text. IBM Research Report RC 2033, 1997.
  2. V. Krishnan and C. D. Manning, “An effective two-stage model for exploiting non-local dependencies in named entity recognition,” in ACL, 2006, pp. 1121–1128.


More features

  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. R. Sharnagat, “Named entity recognition: A literature survey,” Center For Indian Language Technology, 2014.
  3. D. Campos, S. Matos, and J. L. Oliveira, “Biomedical named entity recognition: a survey of machine-learning tools,” in Theory Appl. Adv. Text Min., 2012.




  1. D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvist. Investig., vol. 30, no. 1, pp. 3–26, 2007.
  2. O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates, “Unsupervised named-entity extraction from the web: An experimental study,” Artif. Intell., vol. 165, no. 1, pp. 91–134, 2005.
  3. S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. Biomed. Inform., vol. 46, no. 6, pp. 1088–1098, 2013.
  4. M. Collins and Y. Singer, “Unsupervised models for named entity classification,” in EMNLP, 1999, pp. 100–110.
  5. D. Nadeau, P. D. Turney, and S. Matwin, “Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity,” in CSCSI, 2006, pp. 266–277.




  1. M. E. Peters, W. Ammar, C. Bhagavatula, and R. Power, “Semi-supervised sequence tagging with bidirectional language models,”
  2. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,”
  3. M. Rei, “Semi-supervised multitask learning for sequence labeling,”
  4. L. Liu, X. Ren, J. Shang, J. Peng, and J. Han, “Efficient contextualized representation: Language model pruning for sequence labeling,”
  5. L. Liu, J. Shang, F. Xu, X. Ren, H. Gui, J. Peng, and J. Han, “Empower sequence labeling with task-aware neural language model,”
  6. C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”



  1. coarse-grained NER
  2. fine-grained NER tasks
    1. X. Ling and D. S. Weld, “Fine-grained entity recognition.” in AAAI, vol. 12, 2012, pp. 94–100.
    2. X. Ren, W. He, M. Qu, L. Huang, H. Ji, and J. Han, “Afet: Automatic fine-grained entity typing by hierarchical partial-label embedding,” in EMNLP, 2016, pp. 1369–1378.
    3. A. Abhishek, A. Anand, and A. Awekar, “Fine-grained entity type classification by jointly learning representations and label embeddings,” in EACL, 2017, pp. 797–807.
    4. A. Lal, A. Tomer, and C. R. Chowdary, “Sane: System for fine grained named entity typing on textual data,” in WWW, 2017, pp. 227–230.
    5. L. d. Corro, A. Abujabal, R. Gemulla, and G. Weikum, “Finet: Context-aware fine-grained named entity typing,” in EMNLP, 2015, pp. 868–878.



  1. Some datasets have several hundred labels, e.g., HYENA and Gillick
  2. OntoNotes, CoNLL03











NER involves two subtasks: boundary detection and type identification.
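Both subtasks show up when a tagged sequence is turned into entity spans: the span limits are the boundaries and the label suffix is the type. The sketch below assumes a BIO tagging scheme, which these notes do not name, and the function name `bio_to_spans` and the example sentence are illustrative.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (start, end_exclusive, type) entity spans."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))  # boundary: [start, i), type: etype
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

tokens = ["Michael", "Jeffery", "Jordan", "was", "born", "in", "Brooklyn"]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-LOC"]
for start, end, etype in bio_to_spans(tags):
    print(" ".join(tokens[start:end]), etype)
```

An `I-` tag that does not continue an open entity of the same type is dropped in this sketch; other conventions treat it as the start of a new entity.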


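FP (false positive): an entity returned by the model that does not appear in the ground truth.


TP (true positive): an entity returned by the model that also appears in the ground truth.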
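These counts feed the usual exact-match precision, recall, and F1. The sketch below also uses false negatives (FN), i.e. ground-truth entities the model did not return. Representing an entity as a `(start, end, type)` tuple and the name `exact_match_prf` are assumptions for illustration.

```python
def exact_match_prf(predicted, gold):
    """Exact-match precision, recall, and F1 over (start, end, type) entities."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)   # returned and in the ground truth
    fp = len(pred_set - gold_set)   # returned but not in the ground truth
    fn = len(gold_set - pred_set)   # in the ground truth but not returned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 3, "PER"), (6, 7, "LOC")]
pred = [(0, 3, "PER"), (6, 7, "ORG"), (4, 5, "LOC")]
precision, recall, f1 = exact_match_prf(pred, gold)
print(precision, recall, f1)
```

Under exact match, the prediction `(6, 7, "ORG")` has the right boundary but the wrong type, so it counts as one false positive and leaves the gold `(6, 7, "LOC")` as one false negative.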





  1. MUC-6: scores type correctness while ignoring boundaries, and boundary correctness while ignoring type
  2. ACE: too complex; generally not used


Context Encoder Architectures

Widely used context encoder architectures: convolutional neural networks, recurrent neural networks, recursive neural networks, and deep transformers.

1. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,”




2. E. Strubell, P. Verga, D. Belanger, and A. McCallum, “Fast and accurate entity recognition with iterated dilated convolutions,”


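Even with parallelism, LSTMs need O(n) time for a sequence of length n because they process tokens sequentially. ID-CNNs offer better capacity for larger context and structured prediction, and they run 14 to 20 times faster than Bi-LSTM-CRF.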
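The reason dilated convolutions cover a wide context with few layers can be checked with a short calculation. For stacked 1-D convolutions with kernel width k and dilations d_1, ..., d_L, the receptive field is 1 + (k - 1)(d_1 + ... + d_L). The kernel width and dilation schedules below are illustrative assumptions, not the exact ID-CNN configuration.

```python
def receptive_field(kernel_width, dilations):
    """Input tokens visible to one output position of stacked dilated convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_width - 1) * d
    return field

plain = receptive_field(3, [1, 1, 1, 1])     # ordinary convolutions: linear growth
dilated = receptive_field(3, [1, 2, 4, 8])   # doubling dilations: exponential growth
print(plain, dilated)  # 9 31
```

All positions in a layer are computed in parallel, so the number of sequential steps depends on the number of layers and not on the sentence length.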


3. BiLSTM: used because in an RNN the last words have a larger influence on the sentence representation. P. Zhou, S. Zheng, J. Xu, Z. Qi, H. Bao, and B. Xu, “Joint extraction of multiple relations and entities by using a hybrid neural network,”




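5. Recursive neural networks are non-linear adaptive models that can learn deep structured information by traversing a given structure in topological order.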

  • P.-H. Li, R.-P. Dong, Y.-S. Wang, J.-C. Chou, and W.-Y. Ma, “Leveraging linguistic structures for named entity recognition with bidirectional recursive neural networks,”
  • M. Rei, “Semi-supervised multitask learning for sequence labeling,” in ACL, 2017, pp. 2121–2130.










Deep Transformer



Combining traditional embeddings with language model embeddings

  1. A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” in COLING, 2018, pp. 1896–1907.
  2. Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
  3. C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,” in ACL, 2019, pp. 1430–1440.
  4. Y. Luo, F. Xiao, and H. Zhao, “Hierarchical contextualized representation for named entity recognition,” CoRR, vol. abs/1911.02257, 2019.
  5. Y. Liu, F. Meng, J. Zhang, J. Xu, Y. Chen, and J. Zhou, “GCDT: A global context enhanced deep transition architecture for sequence labeling,” in ACL, 2019, pp. 2431–2441.
  6. Y. Jiang, C. Hu, T. Xiao, C. Zhang, and J. Zhu, “Improved differentiable architecture search for language modeling and named entity recognition,”



  1. X. Li, J. Feng, Y. Meng, Q. Han, F. Wu, and J. Li, “A unified MRC framework for named entity recognition,” CoRR, vol. abs/1910.11476, 2019.
  2. X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, and J. Li, “Dice loss for data-imbalanced NLP tasks,”



Tag Decoder Architectures

four architectures of tag decoders:

MLP + softmax layer, conditional random fields (CRFs), recurrent neural networks, and pointer networks.
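To show how a CRF tag decoder differs from a per-token softmax, here is a minimal Viterbi decoding sketch: it picks the tag sequence with the best total of emission and transition scores. The tag set, the score values, and the function name `viterbi` are illustrative assumptions; a trained model would supply the scores.

```python
def viterbi(emissions, transitions):
    """Best tag path for emissions[t][j] (tag j at position t)
    and transitions[i][j] (moving from tag i to tag j)."""
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptrs):  # follow back-pointers from the last position
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, max(score)

TAGS = ["O", "B-PER", "I-PER"]
emissions = [[0.1, 2.0, 0.5], [0.2, 0.3, 1.5], [2.0, 0.1, 0.4]]
transitions = [
    [0.5, 0.0, -100.0],  # from O: moving to I-PER is strongly penalized
    [0.0, -1.0, 1.0],    # from B-PER
    [0.0, -1.0, 0.5],    # from I-PER
]
path, best = viterbi(emissions, transitions)
print([TAGS[i] for i in path], best)  # ['B-PER', 'I-PER', 'O'] 6.5
```

An MLP + softmax decoder would instead take the argmax of each row of `emissions` independently, so it cannot rule out invalid sequences such as `O` followed by `I-PER`.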






A. Akbik, D. Blythe, and R. Vollgraf, “Contextual string embeddings for sequence labeling,” in COLING, 2018, pp. 1638–1649.


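The pointer network first identifies a chunk (or segment) and then labels it. This operation is repeated until all words in the input sequence are processed. In Figure 12(d), given the start token “”, the segment “Michael Jeffery Jordan” is first identified and then labeled as “PERSON”. Segmentation and labeling can be done by two separate neural networks in the pointer network. Next, “Michael Jeffery Jordan” is fed into the pointer network as input. As a result, the segment “was” is identified and labeled as “O”.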

  1. F. Zhai, S. Potdar, B. Xiang, and B. Zhou, “Neural models for sequence chunking.” in AAAI, 2017, pp. 3365–3371.
  2. O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in NIPS, 2015, pp. 2692–2700.
  3. J. Li, A. Sun, and S. Joty, “Segbot: A generic neural text segmentation model with pointer network,” in IJCAI, 2018, pp. 4166–4172.



Summary of DNN Architectures


  1. T. Liu, J. Yao, and C. Lin, “Towards improving neural named entity recognition with gazetteers,” in ACL, 2019, pp. 5301–5307.
  2. Z. Jie and W. Lu, “Dependency-guided lstm-crf for named entity recognition,” in EMNLP, 2018, pp. 3860–3870.
  3. C. Xia, C. Zhang, T. Yang, Y. Li, N. Du, X. Wu, W. Fan, F. Ma, and P. S. Yu, “Multi-grained named entity recognition,”
  4. J. Zhuo, Y. Cao, J. Zhu, B. Zhang, and Z. Nie, “Segment-level sequence modeling using gated recursive semi-markov conditional random fields,”




1) acquiring external knowledge is labor-intensive (e.g., gazetteers) or computationally expensive (e.g., dependency);

2) integrating external knowledge adversely affects end-to-end learning and hurts the generality of DL-based systems.



Q. Guo, X. Qiu, P. Liu, Y. Shao, X. Xue, and Z. Zhang, “Star-transformer,” in NAACL-HLT, 2019, pp. 1315–1325.

H. Yan, B. Deng, X. Li, and X. Qiu, “Tener: Adapting transformer encoder for named entity recognition,” arXiv preprint arXiv:1911.04474, 2019.


The Transformer is faster than a recurrent layer when the sequence length n is smaller than the embedding dimension d. Per-layer complexities: self-attention O(n^2 · d) and recurrent O(n · d^2) [A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS, 2017, pp. 5998–6008.]
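A quick arithmetic check of the two complexity terms shows where the crossover lies. The function names and the values of n and d are illustrative, and constant factors are ignored.

```python
def self_attention_ops(n, d):
    """Per-layer cost of self-attention, O(n^2 * d), ignoring constants."""
    return n * n * d

def recurrent_ops(n, d):
    """Per-layer cost of a recurrent layer, O(n * d^2), ignoring constants."""
    return n * d * d

d = 512
for n in (50, 512, 1000):
    print(n, self_attention_ops(n, d), recurrent_ops(n, d))
```

The ratio of the two terms is n / d, so self-attention is cheaper for sentence-length inputs (n = 50 against d = 512) and more expensive once n exceeds d.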




low-resource and cross-domain NER

  1. C. Jia, L. Xiao, and Y. Zhang, “Cross-domain NER using cross-domain language modeling,”
  2. S. J. Pan, Z. Toh, and J. Su, “Transfer joint embedding for cross-domain named entity recognition,” ACM Trans. Inf. Syst., vol. 31, no. 2, p. 7, 2013.
  3. J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.
  4. B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022.
  5. Y. Cao, Z. Hu, T. Chua, Z. Liu, and H. Ji, “Low-resource name tagging learned with weakly labeled data,” in EMNLP, 2019, pp. 261–270.
  6. X. Huang, L. Dong, E. Boschee, and N. Peng, “Learning A unified named entity tagger from multiple partially annotated corpora for efficient adaptation,” in CoNLL, 2019, pp. 515–527.




  1. J. Jiang and C. Zhai, “Instance weighting for domain adaptation in nlp,” in ACL, 2007, pp. 264–271.
  2. D. Wu, W. S. Lee, N. Ye, and H. L. Chieu, “Domain adaptive bootstrapping for named entity recognition,” in EMNLP, 2009, pp. 1523–1532.
  3. A. Chaudhary, J. Xie, Z. Sheikh, G. Neubig, and J. G. Carbonell, “A little annotation does a lot of good: A study in bootstrapping low-resource named entity recognizers,” pp. 5163–5173, 2019.




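If two tasks have mappable label sets, they share a single CRF layer; otherwise, each task learns a separate CRF layer. Experimental results show significant improvements on various datasets under low-resource conditions. Three transfer scenarios are proposed. Z. Yang, R. Salakhutdinov, and W. W. Cohen, “Transfer learning for sequence tagging with hierarchical recurrent networks,” in ICLR, 2017.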


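Zhao et al. [H. Zhao, Y. Yang, Q. Zhang, and L. Si, “Improve neural entity recognition via multi-task data selection and constrained decoding,” in NAACL-HLT, vol. 2, 2018, pp. 346–351.] proposed a multi-task model with domain adaptation, in which the fully connected layers are adapted to different datasets and the CRF features are computed separately. A major advantage of Zhao's model is that instances with different distributions and misaligned annotation guidelines are filtered out during data selection.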


Train on the source task, then fine-tune on the target task data: J. Y. Lee, F. Dernoncourt, and P. Szolovits, “Transfer learning for named-entity recognition with neural networks,” arXiv preprint arXiv:1705.06273, 2017.


B. Y. Lin and W. Lu, “Neural adaptation layers for cross-domain named entity recognition,” in EMNLP, 2018, pp. 2012–2022, propose a fine-tuning approach with three neural adaptation layers: a word adaptation layer, a sentence adaptation layer, and an output adaptation layer.


tag-hierarchy model


G. Beryozkin, Y. Drori, O. Gilon, T. Hartman, and I. Szpektor, “A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy,” in ACL, 2019, pp. 140–150.




  1. X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz, and J. Han, “Cross-type biomedical named entity recognition with deep multi-task learning,” arXiv preprint arXiv:1801.09851, 2018.
  2. J. M. Giorgi and G. D. Bader, “Transfer learning for biomedical named entity recognition with neural networks,” Bioinformatics, 2018.
  3. Z. Wang, Y. Qu, L. Chen, J. Shen, W. Zhang, S. Zhang, Y. Gao, G. Gu, K. Chen, and Y. Yu, “Label-aware double transfer learning for cross-specialty medical named entity recognition,” in NAACL-HLT, 2018, pp. 1–15.






Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, and A. Anandkumar, “Deep active learning for named entity recognition,” propose incremental training, in which new labels can be added with each batch.



D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in SIGIR, 1994, pp. 3–12.

S. Pradhan, A. Moschitti, N. Xue, H. T. Ng, A. Björkelund, O. Uryupina, Y. Zhang, and Z. Zhong, “Towards robust linguistic analysis using ontonotes,” in CoNLL, 2013, pp. 143–152.




L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, pp. 237–285, 1996.

R. S. Sutton and A. G. Barto, Introduction to reinforcement learning. MIT press Cambridge, 1998, vol. 135.

S. C. Hoi, D. Sahoo, J. Lu, and P. Zhao, “Online learning: A comprehensive survey,” arXiv preprint arXiv:1802.02871, 2018.



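The agent can also be modeled as a stochastic finite state machine with inputs (observations/rewards from the environment) and outputs (actions on the environment). It consists of two parts: (i) a state transition function and (ii) a policy/output function.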
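The two components can be sketched as a small class: a state transition function that updates the internal state from each observation and reward, and a stochastic policy function that maps the state to an action. Everything here (the `Agent` class, `count_rewards`, `noisy_policy`, the action names, and the 0.9 probability) is hypothetical and only illustrates the structure.

```python
import random

class Agent:
    """Finite state machine: inputs are (observation, reward), output is an action."""

    def __init__(self, transition_fn, policy_fn, initial_state, seed=0):
        self.state = initial_state
        self.transition_fn = transition_fn   # (i) state transition function
        self.policy_fn = policy_fn           # (ii) policy/output function
        self.rng = random.Random(seed)       # source of stochasticity

    def step(self, observation, reward):
        self.state = self.transition_fn(self.state, observation, reward)
        return self.policy_fn(self.state, self.rng)

def count_rewards(state, observation, reward):
    """Toy transition: the state is the running sum of rewards."""
    return state + reward

def noisy_policy(state, rng):
    """Toy stochastic policy: usually follow the preferred action, sometimes pick at random."""
    preferred = "keep" if state > 0 else "explore"
    return preferred if rng.random() < 0.9 else rng.choice(["keep", "explore"])

agent = Agent(count_rewards, noisy_policy, initial_state=0)
actions = [agent.step(obs, r) for obs, r in [("s1", 1), ("s2", -1), ("s3", 1)]]
print(agent.state, actions)
```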


Y. Yang, W. Chen, Z. Li, Z. He, and M. Zhang, “Distantly supervised NER with partial annotation learning and reinforcement learning,” in COLING, 2018, pp. 2159–2169. They use data generated by distant supervision to recognize new types of named entities in new domains.




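Adversarial learning: D. Lowd and C. Meek, “Adversarial learning,” in SIGKDD, 2005, pp. 641–647.

The goal is to make the model more robust, or to reduce its test error on clean input data. It involves a generative network and a discriminative network.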


  1. L. Huang, H. Ji, and J. May, “Cross-lingual multi-level adversarial transfer to enhance low-resource name tagging,” in NAACL-HLT, 2019, pp. 3823–3833.
  2. J. Li, D. Ye, and S. Shang, “Adversarial transfer for named entity boundary detection with pointer networks,” in IJCAI, 2019, pp. 5053–5059.
  3. P. Cao, Y. Chen, K. Liu, J. Zhao, and S. Liu, “Adversarial transfer learning for chinese named entity recognition with self-attention mechanism,” in EMNLP, 2018, pp. 182–192.



Neural Attention for NER



  1. M. Rei, G. K. Crichton, and S. Pyysalo, “Attending to characters in neural sequence labeling models,” in COLING, 2016, pp. 309–318.
  2. A. Zukov-Gregoric, Y. Bachrach, P. Minkovsky, S. Coope, and B. Maksak, “Neural named entity recognition using a self-attention mechanism,” in ICTAI, 2017, pp. 652–656.
  3. G. Xu, C. Wang, and X. He, “Improving clinical named entity recognition with global neural attention,” in APWeb-WAIM, 2018, pp. 264–279.
  4. Q. Zhang, J. Fu, X. Liu, and X. Huang, “Adaptive co-attention network for named entity recognition in tweets,” in AAAI, 2018.
