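Entity and relation extraction is a task that combines detecting entity mentions and recognizing the semantic relations between entities in unstructured text. We propose a hybrid neural network model that extracts entities and their relations without any handcrafted features. The hybrid neural network contains a novel bidirectional encoder-decoder LSTM module (BiLSTM-ED) for entity extraction and a CNN module for relation classification. The contextual information of entities obtained in BiLSTM-ED is further passed to the CNN module to improve relation classification. We conduct experiments on the public dataset ACE05 (Automatic Content Extraction program) to verify the effectiveness of our method. The proposed method achieves state-of-the-art results on the entity and relation extraction task.
Keywords: neural network, information extraction, tagging, classification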
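Entity and relation extraction detects entity mentions and recognizes their semantic relations from text. It is an important problem in knowledge extraction and plays a vital role in the automatic construction of knowledge bases. Traditional systems treat this task as a pipeline of two separate tasks, namely named entity recognition (NER) [1] and relation classification (RC) [2]. This separated framework makes the task easy to handle and each component more flexible, but it pays little attention to the relevance of the two subtasks. Joint learning frameworks are an effective way to couple NER and RC, and they also avoid the cascading of errors [3]. However, most existing joint methods are feature-based structured systems [3-7]. They need complicated feature engineering and rely heavily on supervised NLP toolkits, which may also lead to error propagation. To reduce the manual work in feature extraction, Miwa and Bansal [8] recently presented a neural-network-based method for end-to-end entity and relation extraction. However, when detecting entities, they use an NN structure to predict entity tags, which neglects the long-distance relationships between tags. Based on the above analysis, we propose a hybrid neural network model to address these problems.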
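The methods used in this paper are based on two neural network models: convolutional neural networks (CNN) and long short-term memory (LSTM). The CNN was originally invented for computer vision [38] and is commonly used to extract image features [39,40]. In recent years, CNNs have been successfully applied to different NLP tasks and have also proved effective at extracting semantic and keyword information [27,41-43]. The LSTM model is a specific kind of recurrent neural network (RNN). It replaces the hidden vector of a recurrent neural network with memory blocks equipped with gates, and it can keep long-term memory by training proper gating weights [44,45]. LSTMs have also shown strong performance on many NLP tasks, such as machine translation [46], sentence representation [47], and relation extraction [26]. In this paper, we propose a hybrid neural network that learns entities and their relations jointly. Compared with handcrafted feature-based methods, it can learn relevant features from the given sentences without complicated feature engineering. Compared with other neural-network-based methods [8], our method considers the long-distance relationships between entity tags.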
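The gating mechanism described above can be illustrated with a minimal sketch of one LSTM step on scalar values. It follows the common textbook formulation (input, forget, and output gates plus a candidate value), which is not necessarily the exact variant used in this paper. The function name `lstm_step`, the parameter layout, and the weights are hypothetical and chosen only to show how a forget gate that stays open preserves the cell state over several steps.

```python
import math
from typing import Dict, Tuple

Gate = Tuple[float, float, float]  # (weight on x, weight on h_prev, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: Dict[str, Gate]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell in the standard textbook form."""
    def pre(name: str) -> float:
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new information to write
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cand"))   # candidate value to write
    c = f * c_prev + i * g       # updated memory cell
    h = o * math.tanh(c)         # updated hidden state
    return h, c


# Hypothetical weights: forget gate biased open, input gate biased shut,
# so the cell is expected to hold on to its initial state.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "cand": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for x in [0.5, -0.3, 0.8, -0.9, 0.1]:
    h, c = lstm_step(x, h, c, params)

print(round(c, 3))  # stays close to the initial cell state of 1.0
```

With these hand-set gates the inputs barely change the memory cell. In a trained model the gate weights are learned, so the network decides for itself what to keep and what to overwrite.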
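The framework of the hybrid neural network is shown in Fig. 1. The first layer is a bidirectional LSTM encoding layer, which is shared by the named entity recognition (NER) module and the relation classification (RC) module. There are two "channels" after the encoding layer: one links to the NER module, which is an LSTM decoding layer, and the other feeds into a CNN layer to extract relations. In the following sections, we describe these components in detail.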
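The data flow of this framework can be sketched as follows. This is only a structural illustration under stated assumptions: `run_hybrid` and the three `toy_*` functions are hypothetical names, and the toy functions are trivial stand-ins for the BiLSTM encoder, the LSTM decoder, and the CNN classifier rather than trained networks. The point it shows is that the encoding is computed once and then consumed by both channels.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def run_hybrid(
    tokens: Sequence[str],
    encode: Callable[[Sequence[str]], List[Vector]],
    decode_entities: Callable[[List[Vector]], List[Span]],
    classify_relation: Callable[[List[Vector], Span, Span], str],
) -> Tuple[List[Span], Dict[Tuple[Span, Span], str]]:
    hidden = encode(tokens)             # shared encoding layer, computed once
    entities = decode_entities(hidden)  # channel 1: NER decoder
    relations: Dict[Tuple[Span, Span], str] = {}
    for i, first in enumerate(entities):  # channel 2: relation classifier
        for second in entities[i + 1:]:
            relations[(first, second)] = classify_relation(hidden, first, second)
    return entities, relations


# Toy stand-ins (hypothetical, NOT the networks from the paper).
encode_calls: List[int] = []


def toy_encode(tokens: Sequence[str]) -> List[Vector]:
    encode_calls.append(1)  # record each call to show the encoder runs once
    return [[1.0 if tok[:1].isupper() else 0.0] for tok in tokens]


def toy_decode(hidden: List[Vector]) -> List[Span]:
    # Treat every token whose single feature fires as a one-token entity.
    return [(i, i + 1) for i, vec in enumerate(hidden) if vec[0] > 0.5]


def toy_classify(hidden: List[Vector], first: Span, second: Span) -> str:
    # Reads the shared encodings of the tokens between the two entities.
    between = hidden[first[1]:second[0]]
    return "CANDIDATE" if between else "ADJACENT"


entities, relations = run_hybrid(
    "Obama visited Paris yesterday".split(), toy_encode, toy_decode, toy_classify
)
print(entities, relations, len(encode_calls))
```

The labels returned by the toy classifier carry no meaning. In the model described here, that slot is filled by the CNN module, and the entity slot by the LSTM decoding layer.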
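We first train the NER module to recognize entities and obtain their encoding information, and then train the RC module to classify relations based on the encoding information and the entity combinations. In particular, we find that if two entities are related, the distance between them is always less than about 20 words, as shown in Fig. 4. We therefore exploit this property when determining relations: if the distance between two entities is larger than L_max, we assume that no relation exists between them. According to the statistics in Fig. 4, L_max is about 20 for the ACE05 dataset.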
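A minimal sketch of this distance filter is given below. It assumes entities are given as token spans and that the distance is the number of tokens between the two spans. The text does not spell out the exact distance definition, so `token_distance` is an assumption, and `candidate_pairs` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive

L_MAX = 20  # approximate value reported for ACE05 in the text


def token_distance(a: Span, b: Span) -> int:
    """Number of tokens between two spans (assumed definition of distance)."""
    first, second = sorted([a, b])
    return max(0, second[0] - first[1])


def candidate_pairs(entities: List[Span], l_max: int = L_MAX) -> List[Tuple[Span, Span]]:
    """Keep only entity pairs close enough to be passed to the relation classifier."""
    return [
        (a, b)
        for a, b in combinations(sorted(entities), 2)
        if token_distance(a, b) <= l_max
    ]


# Three entity spans in one long sentence; only the first two are close together.
spans = [(0, 2), (5, 6), (40, 42)]
pairs = candidate_pairs(spans)
print(pairs)
```

Pairs that fail the check are never sent to the classifier and are treated as having no relation, which is how the rule in the paragraph above is applied.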
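The NER module contains a bidirectional LSTM encoding layer and an LSTM decoding layer. We use BiLSTM-ED to denote the structure of the NER module. To further illustrate the effectiveness of BiLSTM-ED on the entity extraction task, we compare it with its variants and with other effective sequence labeling models. The compared methods are: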
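In the relation classification module, we use two kinds of information: the sub-sentence between the entities and the encoding information of the entities obtained from the bidirectional LSTM layer. To illustrate the effectiveness of the information we considered,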
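Fig. 4 shows that, when the horizontal axis is the distance between two entities, the data distribution has a long tail. We therefore set a threshold L_max to filter the data: if the distance between two entities is larger than L_max, we assume that the two entities have no relation. To analyze the effect of the threshold, we use Sub-CNN to predict entity relations under different values of L_max. The results are shown in Fig. 5. The smaller L_max is, the more data are filtered out. Hence, if L_max is too small, it may filter out correct data and lower the F1 score; if L_max is too large, it cannot filter out the noisy data, which may also hurt the final result. Fig. 5 shows that the model performs well when L_max is between 10 and 25. This range also matches the statistics in Fig. 4.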
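The trade-off behind the choice of L_max can be made concrete with a small sweep. The sketch below uses invented toy data, not ACE05 statistics, and it does not compute F1. It only reports, for each threshold, what share of truly related pairs survives the filter and what share of unrelated pairs is removed. The function name `sweep` and the data are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, bool]  # (distance in tokens, whether the pair is truly related)


def sweep(pairs: List[Pair], thresholds: List[int]) -> List[Tuple[int, float, float]]:
    """For each L_max, return (L_max, share of related pairs kept,
    share of unrelated pairs filtered out)."""
    total_pos = sum(1 for _, related in pairs if related)
    total_neg = len(pairs) - total_pos
    rows = []
    for l_max in thresholds:
        kept_pos = sum(1 for d, related in pairs if related and d <= l_max)
        dropped_neg = sum(1 for d, related in pairs if not related and d > l_max)
        rows.append((l_max, kept_pos / total_pos, dropped_neg / total_neg))
    return rows


# Invented toy data: related pairs are mostly close, unrelated pairs spread out.
toy_pairs: List[Pair] = [
    (1, True), (3, True), (4, True), (8, True), (12, True), (18, True),
    (2, False), (9, False), (22, False), (30, False), (45, False), (60, False),
]

rows = sweep(toy_pairs, [5, 10, 20, 40])
for l_max, kept, dropped in rows:
    print(f"L_max={l_max:>2}  related kept={kept:.2f}  unrelated removed={dropped:.2f}")
```

On this toy data a small threshold discards half of the related pairs, while a large one lets most unrelated pairs through. That is the same pattern the paragraph above describes for values that are too small or too large.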
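Entity and relation extraction is an important problem in knowledge extraction and plays a vital role in the automatic construction of knowledge bases. In this paper, we propose a hybrid neural network model that extracts entities and their semantic relations without any handcrafted features. Compared with other neural-network-based methods, our method considers the long-distance relationships between entity tags. We conducted experiments on the public dataset ACE05 (Automatic Content Extraction program), and the results verify the effectiveness of our method. In future work, we will explore how to link the two modules more effectively within the neural network so that the model performs better. We also need to address the problem of overlooked relations and try to improve recall.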
[1] D. Nadeau, S. Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30 (1) (2007) 3–26.
[2] B. Rink, UTD: classifying semantic relations by combining lexical and semantic resources, in: Proceedings of the 5th International Workshop on Semantic Evaluation, 2010, pp. 256–259.
[3] Q. Li, H. Ji, Incremental joint extraction of entity mentions and relations, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, pp. 402–412.
[4] M. Miwa, Y. Sasaki, Modeling joint entity and relation extraction with table representation, in: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1858–1869.
[5] Y.S. Chan, D. Roth, Exploiting syntactico-semantic structures for relation extraction, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011, pp. 551–560.
[6] X. Yu, W. Lam, Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach, in: Proceedings of the 21st COLING International Conference, 2010, pp. 1399–1407.
[7] L. Li, J. Zhang, L. Jin, R. Guo, D. Huang, A distributed meta-learning system for Chinese entity relation extraction, Neurocomputing 149 (2015) 1135–1142.
[8] M. Miwa, M. Bansal, End-to-end relation extraction using LSTMs on sequences and tree structures, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
[9] C.N. dos Santos, B. Xiang, B. Zhou, Classifying relations by ranking with convolutional neural networks, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, 2015, pp. 626–634.
[10] Y. Xu, L. Mou, G. Li, Y. Chen, H. Peng, Z. Jin, Classifying relations via long short term memory networks along shortest dependency paths, in: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2015.
[11] L. Zou, R. Huang, H. Wang, J.X. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, in: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, ACM, 2014, pp. 313–324.
[12] J. Sang, C. Xu, J. Liu, User-aware image tag refinement via ternary semantic analysis, IEEE Trans. Multimed. 14 (3) (2012) 883–895.
[13] J. Sang, C. Xu, Right buddy makes the difference: an early exploration of social relation analysis in multimedia applications, in: Proceedings of the 20th ACM International Conference on Multimedia, ACM, 2012, pp. 19–28.
[14] G. Luo, X. Huang, C.-Y. Lin, Z. Nie, Joint entity recognition and disambiguation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015, pp. 879–888.
[15] J.P. Chiu, E. Nichols, Named entity recognition with bidirectional LSTM-CNNs, arXiv: 1511.08308 (2015).
[16] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv: 1508.01991 (2015).
[17] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, in: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2016.
[18] K. Xu, Y. Feng, S. Huang, D. Zhao, Semantic relation classification via convolutional neural networks with simple negative sampling, arXiv: 1506.07650 (2015).
[19] D. Zeng, K. Liu, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in: Proceedings of the 25th COLING International Conference, 2014, pp. 2335–2344.
[20] A. Passos, V. Kumar, A. McCallum, Lexicon infused phrase embeddings for named entity resolution, in: Proceedings of the International Conference on Computational Linguistics, 2014, pp. 78–86.
[21] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, J. Mach. Learn. Res. 12 (2011) 2493–2537.
[22] X. Ma, E. Hovy, End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF, arXiv: 1603.01354 (2016).
[23] N. Kambhatla, Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations, in: Proceedings of the 43rd ACL International Conference, 2004, p. 22.
[24] R. Socher, B. Huval, C.D. Manning, A.Y. Ng, Semantic compositionality through recursive matrix-vector spaces, in: Proceedings of the EMNLP International Conference, 2012, pp. 1201–1211.
[25] M. Yu, M. Gormley, M. Dredze, Factor-based compositional embedding models, in: Proceedings of the NIPS Workshop on Learning Semantics, 2014.
[26] Y. Xu, L. Mou, G. Li, Y. Chen, H. Peng, Z. Jin, Classifying relations via long short term memory networks along shortest dependency paths, in: Proceedings of EMNLP International Conference, 2015.
[27] C.N. dos Santos, B. Xiang, B. Zhou, Classifying relations by ranking with convolutional neural networks, in: Proceedings of the 53rd ACL International Conference, vol. 1, 2015, pp. 626–634.
[28] T.-V.T. Nguyen, A. Moschitti, G. Riccardi, Convolution kernels on constituent, dependency and sequential structures for relation extraction, in: Proceedings of the EMNLP International Conference, 2009, pp. 1378–1387.
[29] P. Qin, W. Xu, J. Guo, An empirical convolutional neural network approach for semantic relation classification, Neurocomputing 190 (2016) 1–9.
[30] S. Zheng, J. Xu, P. Zhou, H. Bao, Z. Qi, B. Xu, A neural network framework for relation extraction: learning entity semantic and relation pattern, Knowl. Based Syst. 114 (2016) 12–23.
[31] D. Zhang, D. Wang, Relation classification via recurrent neural network, arXiv: 1508.01006 (2015).
[32] J. Ebrahimi, D. Dou, Chain based RNN for relation classification, in: Proceedings of the NAACL International Conference, 2015, pp. 1244–1249.
[33] S. Zhang, D. Zheng, X. Hu, M. Yang, Bidirectional long short-term memory networks for relation classification, in: Proceedings of the Pacific Asia Conference on Language, Information and Computation, 2015, pp. 73–78.
[34] L. Sun, X. Han, A feature-enriched tree kernel for relation extraction, in: Proceedings of the 52nd ACL International Conference, 2014, pp. 61–67.
[35] D. Roth, W.-t. Yih, Global inference for entity and relation identification via a linear programming formulation, in: Introduction to Statistical Relational Learning, 2007, pp. 553–580.
[36] B. Yang, C. Cardie, Joint inference for fine-grained opinion extraction, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013, pp. 1640–1649.
[37] S. Singh, S. Riedel, B. Martin, J. Zheng, A. McCallum, Joint inference of entities, relations, and coreference, in: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, ACM, 2013, pp. 1–6.
[38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.
[39] J. Yu, X. Yang, F. Gao, D. Tao, Deep multimodal distance metric learning using click constraints for image ranking, IEEE Trans. Cybern. (2016), doi: 10.1109/TCYB.2016.2591583.
[40] J. Yu, B. Zhang, Z. Kuang, D. Lin, J. Fan, Image privacy protection by identifying sensitive objects via deep multi-task learning, in: Proceedings of the IEEE Transactions on Information Forensics and Security, 2016.
[41] Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the EMNLP International Conference, 2014.
[42] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of the 52nd ACL International Conference, 2014.
[43] P. Wang, B. Xu, J. Xu, G. Tian, C.-L. Liu, H. Hao, Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification, Neurocomputing 174 (2016) 806–814.
[44] X. Zhu, P. Sobihani, H. Guo, Long short-term memory over recursive structures, in: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1604–1612.
[45] A. Graves, Supervised Sequence Labelling, Springer, 2012.
[46] M.-T. Luong, I. Sutskever, Q.V. Le, O. Vinyals, W. Zaremba, Addressing the rare word problem in neural machine translation, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015, pp. 11–19.
[47] R. Kiros, Y. Zhu, R.R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, S. Fidler, Skip-thought vectors, in: Proceedings of the Advances in Neural Information Processing Systems, 2015, pp. 3276–3284.
[48] L. Ratinov, D. Roth, Design challenges and misconceptions in named entity recognition, in: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, 2009, pp. 147–155.
[49] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2014.
[50] K. Duan, S.S. Keerthi, W. Chu, S.K. Shevade, A.N. Poo, Multi-category classification by soft-max combination of binary classifiers, in: Multiple Classifier Systems, Springer, 2003, pp. 125–134.
[51] G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in: Proceedings of the ICASSP, 2013, pp. 8609–8613.
[52] T. Tieleman, G. Hinton, Lecture 6.5-rmsprop, COURSERA: Neural networks for machine learning (2012).
[53] J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in: Proceedings of the Eighteenth International Conference on Machine Learning, ICML, vol. 1, 2001, pp. 282–289.
[54] S.J. Phillips, R.P. Anderson, R.E. Schapire, Maximum entropy modeling of species geographic distributions, Ecol. Modell. 190 (3) (2006) 231–259.