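Alexander Schutz et al. define relation extraction as the automatic identification of triples, each consisting of a pair of concepts and the relation that links them.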
Example 1: Bill Gates is the CEO of Microsoft
CEO(Bill Gates, Microsoft)
Example 2: CMU is located in Pittsburgh
Located-in(CMU, Pittsburgh)
Example 3: Michael Jordan won the MVP award for the 1997/98 season
Award(Michael Jordan, 1997/98 season, MVP)
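The notation in these examples can be held in a small data structure. The sketch below is only an illustration: the class name `RelationInstance` and its fields are hypothetical and not part of any cited system. It stores a relation name with an ordered argument list, which also covers the three-argument Award example.

```python
from typing import NamedTuple, Tuple


class RelationInstance(NamedTuple):
    """A relation name plus its ordered arguments (hypothetical helper)."""
    predicate: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        # Render in the predicate(arg1, arg2, ...) notation used in the examples.
        return f"{self.predicate}({', '.join(self.args)})"


instances = [
    RelationInstance("CEO", ("Bill Gates", "Microsoft")),
    RelationInstance("Located-in", ("CMU", "Pittsburgh")),
    RelationInstance("Award", ("Michael Jordan", "1997/98 season", "MVP")),
]

for inst in instances:
    print(inst)
```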
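Relation extraction from structured and semi-structured sources generally exploits the specific structure of the web pages involved. Relation extraction from unstructured text can be divided into three categories: traditional relation extraction, open-domain relation extraction, and relation discovery.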
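In traditional relation extraction, the main task is to extract the target relation pairs given a set of entity relation categories and a corpus. Tasks of this kind come with expert-annotated, high-quality corpora and widely accepted evaluation methods. Commonly used evaluation sets include MUC, ACE, KBP, and SemEval.
In terms of method, statistical machine learning is currently the main approach: relation instances are converted into feature vectors in a high-dimensional space or represented directly as discrete structures, a classification model is trained on the annotated corpus, and the model is then used to identify relations between entities. There are three main families of methods.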
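Feature-vector-based methods [24-26]
Main problem: how to obtain effective lexical, syntactic, and semantic features and integrate them, so as to produce the local features and simple global features that describe the semantic relation between entities.
Feature selection: extract surface features and structured features from free text and its syntactic structure.
Classifiers: maximum entropy model [24] and SVM [25, 26].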
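As a rough illustration of the feature-vector idea, the sketch below turns one sentence and two entity mentions into a sparse feature dictionary that a maximum entropy or SVM classifier could consume. It covers only a few lexical (surface) features. The function name, the feature names, and the choice of the last token as the "head" word are assumptions made for this example, not the feature sets of [24-26].

```python
def extract_features(tokens, e1_span, e2_span):
    """Build sparse lexical features for an entity pair.

    Spans are half-open token index ranges (start, end), with e1 before e2.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    between = tokens[t1:s2]
    features = {
        # Simplification: treat the last token of each mention as its head.
        "e1_head=" + tokens[t1 - 1]: 1,
        "e2_head=" + tokens[t2 - 1]: 1,
        "num_between": len(between),
    }
    # Bag of words between the two mentions.
    for word in between:
        features["between=" + word.lower()] = 1
    # One word of context on each side, when it exists.
    if s1 > 0:
        features["before_e1=" + tokens[s1 - 1].lower()] = 1
    if t2 < len(tokens):
        features["after_e2=" + tokens[t2].lower()] = 1
    return features


tokens = ["Bill", "Gates", "is", "the", "CEO", "of", "Microsoft"]
features = extract_features(tokens, (0, 2), (6, 7))
print(sorted(features))
```

Syntactic and semantic features (parse paths, entity types, and so on) would be added to the same dictionary in a fuller system.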
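Kernel-based methods [27-31]
Main problem: how to effectively mine the structured information that reflects semantic relations, and how to effectively compute the similarity between pieces of structured information.
Convolution tree kernel: measures the similarity of two syntactic trees by the number of subtrees they have in common.
Standard convolution tree kernel (CTK)
When computing the similarity of two subtrees, only the subtrees themselves are considered, not their context.
Context-sensitive convolution tree kernel (CS-CTK)
When computing subtree similarity, the ancestors of the subtree are also considered, such as the parent and grandparent of the subtree's root node, and the similarities obtained with different ancestors are combined by a weighted average.
Methods in the literature: shallow tree kernel [27], dependency tree kernel [28], shortest-path dependency kernel [29], convolution tree kernel [30, 31].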
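The standard convolution tree kernel described above can be sketched with the usual recursive counting scheme over pairs of nodes. This is a minimal illustration and not the implementation of [30, 31]: the tree encoding (nested tuples with string leaves), the function names, and the `decay` parameter are assumptions of this example, and the context-sensitive variant is not covered.

```python
def is_leaf(node):
    # Leaves are plain strings (words); internal nodes are tuples (label, child, ...).
    return isinstance(node, str)


def production(node):
    # The grammar production at a node: its label plus the labels of its children.
    return (node[0], tuple((c if is_leaf(c) else c[0], is_leaf(c)) for c in node[1:]))


def internal_nodes(tree):
    if is_leaf(tree):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def common_subtrees(n1, n2, decay=1.0):
    """Weighted count of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not is_leaf(c1):  # c2 is internal too, because the productions match
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    # Sum over all pairs of internal nodes; no context outside each subtree is used.
    return sum(common_subtrees(a, b, decay)
               for a in internal_nodes(t1) for b in internal_nodes(t2))


t1 = ("NP", ("D", "the"), ("N", "dog"))
t2 = ("NP", ("D", "the"), ("N", "cat"))
print(tree_kernel(t1, t2))  # 3.0: [D the], [NP D N], [NP [D the] N]
```

With `decay` below 1, larger shared subtrees contribute less, which is a common way to keep the kernel from being dominated by a few large fragments.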
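Neural-network-based methods [32, 33]
Main problem: how to design a suitable network structure that captures more information and thus extracts relations more accurately.
Network structure: different network structures capture different information in the text.
Recursive neural network (RNN) [32]
The network is built largely according to the syntactic structure of the sentence, but it depends on complex syntactic parsing tools.
Convolutional neural network (CNN) [33]
Sentence-level information is captured through convolution operations, without the need for complex NLP tools.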
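Overall, the traditional combination of manually annotated corpora and machine learning algorithms cannot meet the needs of information extraction over open domains and open corpora, because building the corpora is too expensive. This led to open-domain relation extraction.
Open-domain extraction restricts neither the relation categories nor the target text. The difficulties lie in how to obtain training corpora, how to obtain the entity relation categories, and how to extract relations from different types of target text.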
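Bootstrapping: template generation -> instance extraction -> iterate until convergence. It suffers from semantic drift, however: the iterations introduce noisy instances and noisy templates. For example:
Capital: Rome; city template "* is a city of "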
Paris is a city of France.
Paris is a city of Romance.
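To address this problem, [34] expands several mutually exclusive categories at the same time, for example persons, locations, and organizations, where an entity can belong to only one category. [35] introduces negative instances to limit semantic drift. [36] builds the NELL system, which uses an initial ontology and a large number of web pages to extract templates.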
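The bootstrapping loop and the effect of a mutual-exclusion constraint can be illustrated on toy data. The sketch below is an assumption-laden simplification: the mentions, the template strings, and the `excluded` parameter (standing in for entities already claimed by another category) are invented for this example and do not reproduce the methods of [34-36].

```python
def bootstrap(mentions, seeds, max_rounds=10, excluded=frozenset()):
    """Alternate template generation and instance extraction until nothing changes.

    mentions: list of (entity, template) pairs observed in a corpus.
    excluded: entities that belong to another, mutually exclusive category.
    """
    instances, templates = set(seeds), set()
    for _ in range(max_rounds):
        # Template generation: every template seen with a known instance.
        new_templates = {t for e, t in mentions if e in instances}
        # Instance extraction: every entity matched by one of those templates.
        new_instances = {e for e, t in mentions
                         if t in new_templates and e not in excluded}
        if new_templates <= templates and new_instances <= instances:
            break  # converged
        templates |= new_templates
        instances |= new_instances
    return instances, templates


# Hypothetical toy corpus: the generic template "* is famous for" causes drift.
mentions = [
    ("Rome", "* is a city of"),
    ("Paris", "* is a city of"),
    ("Paris", "* is famous for"),
    ("Einstein", "* is famous for"),
    ("Einstein", "* was born in"),
    ("Mozart", "* was born in"),
]

drifted, _ = bootstrap(mentions, {"Rome"})
constrained, _ = bootstrap(mentions, {"Rome"}, excluded={"Einstein", "Mozart"})
print(sorted(drifted))      # cities and persons mixed together
print(sorted(constrained))  # persons kept out of the city set
```

Note that the noisy template is still learned in the constrained run; the constraint only stops it from pulling entities of another category into the set.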
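Open extraction extracts relations between entities by identifying the phrases that express semantic relations, and uses syntactic and statistical data to filter the extracted triples. A relation phrase should be centered on a verb and should match multiple distinct entity pairs. The advantage of open extraction is that relation categories need not be defined in advance, but the semantics are not normalized, so the same relation can appear in different surface forms.
[37] builds the TextRunner system, which extracts from web pages the triples that contain the specific predicates and arguments entered by the user. [38] uses Wikipedia data, extracting relation information from infoboxes and labeling it back onto the text of Wikipedia articles.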
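A major problem in open-domain information extraction is the lack of annotated corpora. Distant supervision addresses this by heuristically labeling training corpora with the relations in a knowledge base.
[39] proposes a simple distant supervision method. It rests on the DS assumption that every sentence containing both entities of a pair expresses the relation that the pair has in the knowledge base. Under this assumption, all such sentences are labeled and used as training data, and a maximum entropy classifier is used to build the IE system. The main problem is noisy training instances. Methods that address the noise assume that a correct training instance lies in a semantically consistent region, that is, the instances around it should all share the same label. Following this assumption, [40] proposes a method based on a generative model, and [41] proposes a method based on sparse representation.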
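The DS assumption can be made concrete with a short labeling sketch. The knowledge base entries and sentences below are toy data invented for illustration, and plain substring matching stands in for real entity linking; this is not the pipeline of [39].

```python
def distant_label(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair.

    kb maps (head, tail) entity pairs to a relation name.
    Substring matching is a simplification of entity recognition.
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


kb = {
    ("Bill Gates", "Microsoft"): "CEO",
    ("CMU", "Pittsburgh"): "Located-in",
}
sentences = [
    "Bill Gates is the CEO of Microsoft.",
    "Bill Gates gave a talk about Microsoft at CMU.",  # does not express CEO
    "CMU is located in Pittsburgh.",
]

labeled = distant_label(sentences, kb)
for row in labeled:
    print(row)
```

The second sentence receives the CEO label even though it does not state that relation, which is exactly the kind of noisy training instance the text describes.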
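[42] extracts Freebase relation categories from the New York Times, treats weakly supervised relation extraction as a multi-instance problem, and uses piecewise convolutional networks to learn features automatically. [43] performs weakly supervised relation extraction based on the discovery of fine-grained entity type features, taking fine-grained entity types into account through three fusion methods: substitution, expansion, and selection.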
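Open-domain relation discovery works on top of an existing knowledge graph, inferring and completing the relations between entities that are not yet connected. Depending on the knowledge representation, the methods fall into two broad classes: logical symbols and distributed representations.
Logical representations generally use Inductive Logic Programming and probabilistic graphical models (Probabilistic Graph Model). Their advantages are strong expressive power, human interpretability, and precise results.
However, as knowledge bases keep growing, logical representations are hard to scale efficiently to large knowledge bases such as Freebase. Logical rules are usually expressed as Horn clauses, and inference can consider only the few concepts and relations closely connected to the object; global inference is not feasible because it is NP-hard. Because so few influencing factors are taken into account, the inference is inaccurate. And because inference rests on explicit symbols, hidden inference rules are hard to learn.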
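The main tasks of relation discovery based on symbolic logic are:
Fact inference: infer implicit facts, e.g. (broccoli can prevent osteoporosis)
Inference rule learning: learn the logical inference relations between queries, e.g. (vegetables that contain elements which prevent osteoporosis can prevent osteoporosis)
[44] learns inference rules expressed as Horn clauses from co-occurrence statistics of relation paths. [45] learns relation classifiers from the link features between entities in the graph, obtaining inference rules between paths and relations.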
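Fact inference with a Horn clause over a two-step relation path can be sketched as follows. The rule format, the relation names, and the intermediate fact about calcium are assumptions made for this toy example; the sketch shows only how a given rule is applied, not how [44] or [45] learn such rules.

```python
def apply_path_rule(facts, body1, body2, head):
    """Apply the Horn clause body1(X, Z) AND body2(Z, Y) -> head(X, Y) once.

    facts is a set of (relation, subject, object) triples.
    Returns only the triples that are not already in facts.
    """
    inferred = set()
    for r1, x, z in facts:
        if r1 != body1:
            continue
        for r2, z2, y in facts:
            if r2 == body2 and z2 == z:
                triple = (head, x, y)
                if triple not in facts:
                    inferred.add(triple)
    return inferred


# Toy facts (illustrative only).
facts = {
    ("contains", "broccoli", "calcium"),
    ("prevents", "calcium", "osteoporosis"),
}

# Rule: contains(X, Z) AND prevents(Z, Y) -> prevents(X, Y)
new_facts = apply_path_rule(facts, "contains", "prevents", "prevents")
print(new_facts)
```

Each rule application joins only facts that share an entity, which reflects the local nature of Horn-clause inference noted above.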
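Distributed-representation methods encode entities and relations in a numerical space: head and tail entities are represented as vectors, relations are represented as operations between vectors, and learning relies on the similarity of triples. There are three main methods.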
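- Tensor factorization [46]
Main idea: tensor factorization encodes the knowledge graph as a three-dimensional adjacency tensor and factorizes it into the product of a core tensor and factor matrices. The values of the factorized result serve as the probability that the corresponding triple exists.
- Translation-based models [47-63]
Main idea: treat a relation as a translation from the head entity to the tail entity.
Related work: [47] first proposed this idea; [48] improves the original model by incorporating unstructured text; [49-52] extend the original model to handle one-to-many, many-to-one, and many-to-many relations; [53-55] account for the uneven distribution of relations and entity types; [56, 57] incorporate entity descriptions into the original model; [58, 59] incorporate inference rules to strengthen inference; [60, 61] take relation paths into account; [62, 63] are other kinds of extensions.
- Energy-function models based on neural networks [64, 65]
Main idea: use a neural network to define an energy function over triples that scores each triple, and learn by penalizing incorrect triples.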
[66] combines the two kinds of methods to achieve better results.
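The translation idea from the list above, where a relation moves the head entity vector towards the tail entity vector, can be sketched as a distance score with a margin-based penalty for corrupted triples. The two-dimensional vectors below are hand-picked toy values rather than learned embeddings, and the function names and margin are assumptions of this example.

```python
import math


def translation_score(head, relation, tail):
    """L2 distance between (head + relation) and tail; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive_score, negative_score, margin=1.0):
    # Penalize the model when a corrupted triple is not at least `margin` worse.
    return max(0.0, margin + positive_score - negative_score)


# Hand-picked toy embeddings (hypothetical).
cmu = [1.0, 0.0]
located_in = [0.0, 1.0]
pittsburgh = [1.0, 1.0]
elsewhere = [3.0, 1.0]  # corrupted tail entity

good = translation_score(cmu, located_in, pittsburgh)
bad = translation_score(cmu, located_in, elsewhere)
print(good, bad, margin_loss(good, bad))
```

Training would adjust the vectors to lower this loss over many correct and corrupted triples; only the scoring step is shown here.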
[24] Kambhatla N. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations[C]//Proceedings of the ACL 2004 on Interactive poster and demonstration sessions. Association for Computational Linguistics, 2004: 22.
[25] Zhao S, Grishman R. Extracting relations with integrated information using kernel methods[C]//Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2005: 419-426.
[26] Jiang J, Zhai C X. A Systematic Exploration of the Feature Space for Relation Extraction[C]//HLT-NAACL. 2007: 113-120.
[27] Zelenko D, Aone C, Richardella A. Kernel methods for relation extraction[J]. Journal of machine learning research, 2003, 3(Feb): 1083-1106.
[28] Culotta A, Sorensen J. Dependency tree kernels for relation extraction[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004: 423.
[29] Bunescu R C, Mooney R J. A shortest path dependency kernel for relation extraction[C]//Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005: 724-731.
[30] Zhang M, Zhang J, Su J. Exploring syntactic features for relation extraction using a convolution tree kernel[C]//Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. Association for Computational Linguistics, 2006: 288-295.
[31] Zhou G, Zhang M. Extracting relation information from text documents by exploring various types of knowledge[J]. Information Processing & Management, 2007, 43(4): 969-982.
[32] Socher R, Huval B, Manning C D, et al. Semantic compositionality through recursive matrix-vector spaces[C]//Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012: 1201-1211.
[33] Zeng D, Liu K, Lai S, et al. Relation Classification via Convolutional Deep Neural Network[C]//COLING. 2014: 2335-2344.
[34] McIntosh T, Curran J R. Reducing semantic drift with bagging and distributional similarity[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1. Association for Computational Linguistics, 2009: 396-404.
[35] Shi B, Zhang Z, Sun L, et al. A Probabilistic Co-Bootstrapping Method for Entity Set Expansion[C]//COLING. 2014: 2280-2290.
[36] Betteridge J, Carlson A, Hong S A, et al. Toward Never Ending Language Learning[C]//AAAI Spring Symposium: Learning by Reading and Learning to Read. 2009: 1-2.
[37] Banko M, Cafarella M J, Soderland S, et al. Open Information Extraction from the Web[C]//IJCAI. 2007, 7: 2670-2676.
[38] Wu F, Weld D S. Autonomously semantifying wikipedia[C]//Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 2007: 41-50.
[39] Mintz M, Bills S, Snow R, et al. Distant supervision for relation extraction without labeled data[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009: 1003-1011.
[40] Takamatsu S, Sato I, Nakagawa H. Reducing wrong labels in distant supervision for relation extraction[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012: 721-729.
[41] Han X, Sun L. Semantic Consistency: A Local Subspace Based Method for Distant Supervised Relation Extraction[C]//ACL (2). 2014: 718-724.
[42] Zeng D, Liu K, Chen Y, et al. Distant supervision for relation extraction via piecewise convolutional neural networks[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal. 2015: 17-21.
[43] Yang Liu, Kang Liu, Liheng Xu, Jun Zhao. Exploring Fine-grained Entity Type Constraints for Distantly Supervised Relation Extraction[C]//Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers. 2014: 2107–2116.
[44] Schoenmackers S, Etzioni O, Weld D S, et al. Learning first-order horn clauses from web text[C]//Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010: 1088-1098.
[45] Lao N, Mitchell T, Cohen W W. Random walk inference and learning in a large scale knowledge base[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011: 529-539.
[46] Nickel M, Tresp V, Kriegel H P. A three-way model for collective learning on multi-relational data[C]//Proceedings of the 28th international conference on machine learning (ICML-11). 2011: 809-816.
[47] Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in Neural Information Processing Systems. 2013: 2787-2795.
[48] Wang Z, Zhang J, Feng J, et al. Knowledge Graph and Text Jointly Embedding[C]//EMNLP. 2014: 1591-1601.
[49] Wang Z, Zhang J, Feng J, et al. Knowledge Graph Embedding by Translating on Hyperplanes[C]//AAAI. 2014: 1112-1119.
[50] Lin Y, Liu Z, Sun M, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion[C]//AAAI. 2015: 2181-2187.
[51] Ji G, He S, Xu L, et al. Knowledge graph embedding via dynamic mapping matrix[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015: 687-696.
[52] Xiao H, Huang M, Hao Y, et al. TransG: A Generative Mixture Model for Knowledge Graph Embedding[J]. arXiv preprint arXiv:1509.05488, 2015.
[53] He S, Liu K, Ji G, et al. Learning to represent knowledge graphs with gaussian embedding[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015: 623-632.
[54] Ji G, Liu K, He S, et al. Knowledge graph completion with adaptive sparse transfer matrix[C]//Proceedings of AAAI. 2016.
[55] Xiao H, Huang M, Hao Y, et al. TransA: An Adaptive Approach for Knowledge Graph Embedding[J]. arXiv preprint arXiv:1509.05490, 2015.
[56] Zhong H, Zhang J, Wang Z, et al. Aligning knowledge and text embeddings by entity descriptions[C]//Proceedings of EMNLP. 2015: 267-272.
[57] Zhang D, Yuan B, Wang D, et al. Joint semantic relevance learning with text data and graph knowledge[J]. ACL-IJCNLP 2015, 2015: 32.
[58] Wang Q, Wang B, Guo L. Knowledge base completion using embeddings and rules[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence. 2015: 1859-1865.
[59] Wei Z, Zhao J, Liu K, et al. Large-scale knowledge base completion: Inferring via grounding network sampling over selected instances[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015: 1331-1340.
[60] Lin Y, Liu Z, Luan H, et al. Modeling relation paths for representation learning of knowledge bases[J]. arXiv preprint arXiv:1506.00379, 2015.
[61] Guu K, Miller J, Liang P. Traversing knowledge graphs in vector space[J]. arXiv preprint arXiv:1506.01094, 2015.
[62] Luo Y, Wang Q, Wang B, et al. Context-Dependent Knowledge Graph Embedding[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015: 1656-1661.
[63] Guo S, Wang Q, Wang B, et al. Semantically smooth knowledge graph embedding[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015: 84-94.
[64] Socher R, Chen D, Manning C D, et al. Reasoning with neural tensor networks for knowledge base completion[C]//Advances in Neural Information Processing Systems. 2013: 926-934.
[65] Bordes A, Glorot X, Weston J, et al. A semantic matching energy function for learning with multi-relational data[J]. Machine Learning, 2014, 94(2): 233-259.