Paper walkthrough: “Relational inductive biases, deep learning, and graph networks” (repost)

Bookmarking this for reference; thanks to the author for sharing!





(More will be added later.)

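Research background

The machine learning community has three main schools of thought: symbolism, connectionism, and behaviorism (also called actionism).

Symbolism has, from its origins, focused on knowledge representation and logical reasoning. After decades of research, the school's two main results today are Bayesian causal networks and knowledge graphs.

The standard-bearer of Bayesian causal networks is Professor Judea Pearl, winner of the 2011 Turing Award. Reportedly, though, his talk at the NIPS 2017 conference drew only a small audience. In 2018 he published a new book, “The Book of Why”, which defends causal networks and criticizes deep learning for lacking a rigorous process of logical reasoning. Knowledge graphs, meanwhile, are driven mainly by search engine companies, including Google, Microsoft, and Baidu, with the goal of moving search from keyword matching to semantic matching.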

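Connectionism originated in bionics: it uses mathematical models to imitate neurons. Professor Marvin Minsky, whose early work included neural network models, received the 1969 Turing Award for his role in advancing the field of artificial intelligence. Assembling large numbers of neurons produces a deep learning model, and the standard-bearer of deep learning is Professor Geoffrey Hinton. The most criticized shortcoming of deep learning models is that they are not interpretable.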

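Behaviorism brings cybernetics into machine learning; its best-known result is reinforcement learning, whose standard-bearer is Professor Richard Sutton. In recent years, researchers at Google DeepMind combined traditional reinforcement learning with deep learning to build AlphaGo, which defeated the world's top human Go players.

This paper, published earlier by DeepMind, proposes combining traditional Bayesian causal networks and knowledge graphs with deep reinforcement learning, and surveys research progress related to this topic.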

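Main text

“Relational inductive biases, deep learning, and graph networks”
In this paper, 27 authors from DeepMind, Google Brain, MIT, and other institutions propose the “graph network”, which combines end-to-end learning with inductive reasoning and may help address deep learning's difficulty with relational reasoning.

In the paper, the authors explore how relational inductive biases can be used within deep learning architectures (such as fully connected, convolutional, and recurrent layers) to facilitate learning about entities, relations, and the rules for composing them.
They propose a new AI building block, the graph network, which generalizes and extends various earlier neural network approaches that operate on graphs. Graph networks carry a strong relational inductive bias and provide a straightforward interface for manipulating structured knowledge and producing structured behaviors.
The authors also discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning.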

Abstract

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one’s experiences – a hallmark of human intelligence from infancy – remains a formidable challenge for modern AI.

 

The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between “hand-engineering” and “end-to-end” learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias–the graph network–which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning.

Introduction

A key signature of human intelligence is the ability to make “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965), in which a small set of elements (such as words) can be productively composed in limitless ways (such as into new sentences). This reflects the principle of combinatorial generalization, that is, constructing new inferences, predictions, and behaviors from known building blocks. Here we explore how to improve modern AI’s capacity for combinatorial generalization by biasing learning towards structured representations and computations, and in particular, systems that operate on graphs.

Humans’ capacity for combinatorial generalization depends critically on our cognitive mechanisms for representing structure and reasoning about relations. We represent complex systems as compositions of entities and their interactions (Navon, 1977; McClelland and Rumelhart, 1981; Plaut et al., 1996; Marcus, 2001; Goodwin and Johnson-Laird, 2005; Kemp and Tenenbaum, 2008), such as judging whether a haphazard stack of objects is stable (Battaglia et al., 2013). We use hierarchies to abstract away from fine-grained differences, and capture more general commonalities between representations and behaviors (Botvinick, 2008; Tenenbaum et al., 2011), such as parts of an object, objects in a scene, neighborhoods in a town, and towns in a country. We solve novel problems by composing familiar skills and routines (Anderson, 1982), for example traveling to a new location by composing familiar procedures and objectives, such as “travel by airplane”, “to San Diego”, “eat at”, and “an Indian restaurant”. We draw analogies by aligning the relational structure between two domains and drawing inferences about one based on corresponding knowledge about the other (Gentner and Markman, 1997; Hummel and Holyoak, 2003).

Kenneth Craik’s “The Nature of Explanation” (1943) connects the compositional structure of the world to how our internal mental models are organized:

…[a human mental model] has a similar relation-structure to that of the process it imitates. By ‘relation-structure’ I do not mean some obscure non-physical entity which attends the model, but the fact that it is a working physical model which works in the same way as the process it parallels… physical reality is built up, apparently, from a few fundamental types of units whose properties determine many of the properties of the most complicated phenomena, and this seems to afford a sufficient explanation of the emergence of analogies between mechanisms and similarities of relation-structure among these combinations without the necessity of any theory of objective universals. (Craik, 1943, pages 51-55)

That is, the world is compositional, or at least, we understand it in compositional terms. When learning, we either fit new knowledge into our existing structured representations, or adjust the structure itself to better accommodate (and make use of) the new and the old (Tenenbaum et al., 2006; Griffiths et al., 2010; Ullman et al., 2017).

The question of how to build artificial systems which exhibit combinatorial generalization has been at the heart of AI since its origins, and was central to many structured approaches, including logic, grammars, classic planning, graphical models, causal reasoning, Bayesian nonparametrics, and probabilistic programming (Chomsky, 1957; Nilsson and Fikes, 1970; Pearl, 1986, 2009; Russell and Norvig, 2009; Hjort et al., 2010; Goodman et al., 2012; Ghahramani, 2015). Entire sub-fields have focused on explicit entity- and relation-centric learning, such as relational reinforcement learning (Dzeroski et al., 2001) and statistical relational learning (Getoor and Taskar, 2007). A key reason why structured approaches were so vital to machine learning in previous eras was, in part, because data and computing resources were expensive, and the improved sample complexity afforded by structured approaches’ strong inductive biases was very valuable.

In contrast with past approaches in AI, modern deep learning methods (LeCun et al., 2015; Schmidhuber, 2015; Goodfellow et al., 2016) often follow an “end-to-end” design philosophy which emphasizes minimal a priori representational and computational assumptions, and seeks to avoid explicit structure and “hand-engineering”. This emphasis has fit well with–and has perhaps been affirmed by–the current abundance of cheap data and cheap computing resources, which make trading off sample efficiency for more flexible learning a rational choice. The remarkable and rapid advances across many challenging domains, from image classification (Krizhevsky et al., 2012; Szegedy et al., 2017), to natural language processing (Sutskever et al., 2014; Bahdanau et al., 2015), to game play (Mnih et al., 2015; Silver et al., 2016; Moravcik et al., 2017), are a testament to this minimalist principle. A prominent example is from language translation, where sequence-to-sequence approaches (Sutskever et al., 2014; Bahdanau et al., 2015) have proven very effective without using explicit parse trees or complex relationships between linguistic entities.

Despite deep learning’s successes, however, important critiques (Marcus, 2001; Shalev-Shwartz et al., 2017; Lake et al., 2017; Lake and Baroni, 2018; Marcus, 2018a,b; Pearl, 2018; Yuille and Liu, 2018) have highlighted key challenges it faces in complex language and scene understanding, reasoning about structured data, transferring learning beyond the training conditions, and learning from small amounts of experience. These challenges demand combinatorial generalization, and so it is perhaps not surprising that an approach which eschews compositionality and explicit structure struggles to meet them.

When deep learning’s connectionist (Rumelhart et al., 1987) forebears were faced with analogous critiques from structured, symbolic positions (Fodor and Pylyshyn, 1988; Pinker and Prince, 1988), there was a constructive effort (Bobrow and Hinton, 1990; Marcus, 2001) to address the challenges directly and carefully. A variety of innovative sub-symbolic approaches for representing and reasoning about structured objects were developed in domains such as analogy-making, linguistic analysis, symbol manipulation, and other forms of relational reasoning (Smolensky, 1990; Hinton, 1990; Pollack, 1990; Elman, 1991; Plate, 1995; Eliasmith, 2013), as well as more integrative theories for how the mind works (Marcus, 2001). Such work also helped cultivate more recent deep learning advances which use distributed, vector representations to capture rich semantic content in text (Mikolov et al., 2013; Pennington et al., 2014), graphs (Narayanan et al., 2016, 2017), algebraic and logical expressions (Allamanis et al., 2017; Evans et al., 2018), and programs (Devlin et al., 2017; Chen et al., 2018b).

We suggest that a key path forward for modern AI is to commit to combinatorial generalization as a top priority, and we advocate for integrative approaches to realize this goal. Just as biology does not choose between nature versus nurture–it uses nature and nurture jointly, to build wholes which are greater than the sums of their parts–we, too, reject the notion that structure and flexibility are somehow at odds or incompatible, and embrace both with the aim of reaping their complementary strengths. In the spirit of numerous recent examples of principled hybrids of structure-based methods and deep learning (e.g., Reed and De Freitas, 2016; Garnelo et al., 2016; Ritchie et al., 2016; Wu et al., 2017; Denil et al., 2017; Hudson and Manning, 2018), we see great promise in synthesizing new techniques by drawing on the full AI toolkit and marrying the best approaches from today with those which were essential during times when data and computation were at a premium.

Recently, a class of models has arisen at the intersection of deep learning and structured approaches, which focuses on approaches for reasoning about explicitly structured data, in particular graphs (e.g. Scarselli et al., 2009b; Bronstein et al., 2017; Gilmer et al., 2017; Wang et al., 2018c; Li et al., 2018; Kipf et al., 2018; Gulcehre et al., 2018). What these approaches all have in common is a capacity for performing computation over discrete entities and the relations between them. What sets them apart from classical approaches is how the representations and structure of the entities and relations–and the corresponding computations–can be learned, relieving the burden of needing to specify them in advance. Crucially, these methods carry strong relational inductive biases, in the form of specific architectural assumptions, which guide these approaches towards learning about entities and relations (Mitchell, 1980), which we, joining many others (Spelke et al., 1992; Spelke and Kinzler, 2007; Marcus, 2001; Tenenbaum et al., 2011; Lake et al., 2017; Lake and Baroni, 2018; Marcus, 2018b), suggest are an essential ingredient for human-like intelligence.

In the remainder of the paper, we examine various deep learning methods through the lens of their relational inductive biases, showing that existing methods often carry relational assumptions which are not always explicit or immediately evident. We then present a general framework for entity- and relation-based reasoning–which we term graph networks–for unifying and extending existing methods which operate on graphs, and describe key design principles for building powerful architectures using graph networks as building blocks.

Box 1: Relational reasoning

We define structure as the product of composing a set of known building blocks. “Structured representations” capture this composition (i.e., the arrangement of the elements) and “structured computations” operate over the elements and their composition as a whole. Relational reasoning, then, involves manipulating structured representations of entities and relations, using rules for how they can be composed. We use these terms to capture notions from cognitive science, theoretical computer science, and AI, as follows:

An entity is an element with attributes, such as a physical object with a size and mass.

A relation is a property between entities. Relations between two objects might include “same size as”, “heavier than”, and “distance from”. Relations can have attributes as well. The relation “more than X times heavier than” takes an attribute, X, which determines the relative weight threshold for the relation to be true vs. false. Relations can also be sensitive to the global context. For a stone and a feather, the relation “falls with greater acceleration than” depends on whether the context is in air vs. in a vacuum. Here we focus on pairwise relations between entities.

A rule is a function (like a non-binary logical predicate) that maps entities and relations to other entities and relations, such as a scale comparison like “is entity X large?” and “is entity X heavier than entity Y?”. Here we consider rules which take one or two arguments (unary and binary), and return a unary property value.

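To make the three definitions in Box 1 concrete, here is a minimal Python sketch. The class and function names, the units, and the toy drag model in `falls_faster` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An element with attributes (here: a physical object)."""
    name: str
    size: float  # assumed unit: metres
    mass: float  # assumed unit: kilograms


def heavier_than(a: Entity, b: Entity) -> bool:
    """Pairwise relation between two entities."""
    return a.mass > b.mass


def more_than_x_times_heavier(a: Entity, b: Entity, x: float) -> bool:
    """Relation with an attribute x that sets the weight-ratio threshold."""
    return a.mass > x * b.mass


def falls_faster(a: Entity, b: Entity, in_vacuum: bool) -> bool:
    """Relation that is sensitive to global context (air vs. vacuum)."""
    if in_vacuum:
        return False  # every object has the same acceleration in a vacuum
    # Toy assumption: in air, a higher mass-to-size ratio means less relative drag.
    return a.mass / a.size > b.mass / b.size


def is_large(x: Entity, threshold: float) -> bool:
    """Unary rule: maps an entity to a property value."""
    return x.size > threshold


stone = Entity("stone", size=0.1, mass=1.0)
feather = Entity("feather", size=0.1, mass=0.005)
print(heavier_than(stone, feather))
print(falls_faster(stone, feather, in_vacuum=True))
```

The same pair of entities gives different answers for `falls_faster` depending on the context flag, which is the sense in which a relation can depend on global context.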

As an illustrative example of relational reasoning in machine learning, graphical models (Pearl, 1988; Koller and Friedman, 2009) can represent complex joint distributions by making explicit random conditional independences among random variables. Such models have been very successful because they capture the sparse structure which underlies many real-world generative processes and because they support efficient algorithms for learning and reasoning. For example, hidden Markov models constrain latent states to be conditionally independent of others given the state at the previous time step, and observations to be conditionally independent given the latent state at the current time step, which are well-matched to the relational structure of many real-world causal processes. Explicitly expressing the sparse dependencies among variables provides for various efficient inference and reasoning algorithms, such as message-passing, which apply a common information propagation procedure across localities within a graphical model, resulting in a composable, and partially parallelizable, reasoning procedure which can be applied to graphical models of different sizes and shape.

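As a small illustration of message passing in a hidden Markov model, the sketch below implements the standard forward algorithm in plain Python. The function name and the two-state parameters are made-up examples; the point is that one local update is reused at every time step, so the same code handles sequences of any length.

```python
def hmm_forward(init, trans, emit, observations):
    """Return P(observations) for a discrete HMM via forward message passing.

    init[i]     : P(z_1 = i)
    trans[i][j] : P(z_t = j | z_{t-1} = i)
    emit[i][o]  : P(x_t = o | z_t = i)
    """
    n = len(init)
    # Message at t = 1: prior times emission likelihood.
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        # The same local update is applied at every time step.
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_forward(init, trans, emit, [0, 1, 0]))
```

Because each latent state depends only on the previous one, the cost grows linearly with the sequence length instead of exponentially, which is the efficiency benefit of sparse dependencies that the paragraph describes.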


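Graph networks

The graph network (GN) framework proposed in the paper defines a class of functions for relational reasoning over graph-structured representations. The GN framework generalizes and extends various graph neural network, MPNN, and NLNN approaches (Scarselli et al., 2009a; Gilmer et al., 2017; Wang et al., 2018c), and supports building complex architectures from simple building blocks. Note that the term “neural” is deliberately left out of “graph network” to reflect that GNs can be implemented with functions other than neural networks, although the focus here is on neural network implementations.

The main unit of computation in the GN framework is the GN block, a “graph-to-graph” module that takes a graph as input, performs computations over the structure, and returns a graph as output. As described in Box 3, entities are represented by the graph's nodes, relations by the edges, and system-level properties by global attributes. The block organization of the GN framework emphasizes customizability and the synthesis of new architectures that express desired relational inductive biases. The key design principles are: flexible representations; configurable within-block structure; and composable multi-block architectures.

As a running example to make the GN formalism concrete, consider predicting the motion of a set of rubber balls in an arbitrary gravitational field which, instead of bouncing against one another, are each connected to one or more others by springs. Their structure and interactions correspond to the GN's graph representation and computations.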

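Box 3: Definition of “graph”

Here we use “graph” to mean a directed, attributed multi-graph with a global attribute. In the paper's terminology, a node is denoted v_i, an edge e_k, and the global attribute u. We also use s_k and r_k to denote the indices of the sender and receiver nodes (see below), respectively, for edge k.

To be more precise, these terms are defined as:

  • Directed: one-way edges, from a “sender” node to a “receiver” node.
  • Attribute: properties that can be encoded as a vector, a set, or even another graph.
  • Attributed: edges and vertices have attributes associated with them.
  • Global attribute: a graph-level attribute.
  • Multi-graph: there can be more than one edge between vertices, including self-edges.

Within the GN framework, a “graph” is defined as a 3-tuple G = (u, V, E).
u is a global attribute; for example, u might represent the gravitational field.

V = {v_i}, i = 1:N^v, is the set of nodes (of cardinality N^v), where each v_i is a node's attribute. For example, V might represent each ball, with attributes for position, velocity, and mass.

E = {(e_k, r_k, s_k)}, k = 1:N^e, is the set of edges (of cardinality N^e), where each e_k is the edge's attribute, r_k is the index of the receiver node, and s_k is the index of the sender node. For example, E might represent the presence of springs between different balls, and their corresponding spring constants.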
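The 3-tuple G = (u, V, E) can be written down as a small data structure. The following Python sketch is one possible encoding, not the paper's implementation; the class names, the attribute layout `[x, y, vx, vy, mass]`, and all numeric values are assumptions chosen to match the balls-and-springs example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edge:
    attr: List[float]  # e_k, e.g. [spring_constant]
    receiver: int      # r_k, index of the receiving node
    sender: int        # s_k, index of the sending node


@dataclass
class Graph:
    u: List[float]            # global attribute, e.g. a gravity vector
    nodes: List[List[float]]  # V: one attribute vector v_i per node
    edges: List[Edge]         # E: directed, attributed edges


# Three balls with attributes [x, y, vx, vy, mass] (made-up values).
balls = [
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, 0.0, 2.0],
]
# Each spring is stored as two directed edges, one per direction.
springs = [
    Edge([50.0], receiver=1, sender=0),
    Edge([50.0], receiver=0, sender=1),
    Edge([20.0], receiver=2, sender=1),
    Edge([20.0], receiver=1, sender=2),
]
g = Graph(u=[0.0, -9.81], nodes=balls, edges=springs)
print(len(g.nodes), len(g.edges))
```

Storing edges as a list with explicit sender and receiver indices allows several edges between the same pair of nodes, which is what the multi-graph property requires.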


Algorithm 1: Steps of computation in a full GN block
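The steps of a full GN block (update edges, then nodes, then the global attribute) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: all three aggregations are elementwise sums, the update functions are passed in as ordinary callables, and the toy update functions at the bottom are placeholders for learned networks, not anything taken from the paper.

```python
def vec_sum(vectors, dim):
    """Elementwise sum; returns a zero vector of length dim for an empty list."""
    total = [0.0] * dim
    for vec in vectors:
        for j, x in enumerate(vec):
            total[j] += x
    return total


def gn_block(u, nodes, edges, phi_e, phi_v, phi_u, edge_dim, node_dim):
    """One pass of a full GN block.

    u     : global attribute vector
    nodes : list of node attribute vectors v_i
    edges : list of (e_k, r_k, s_k) tuples
    phi_* : update functions for edges, nodes, and the global attribute
    edge_dim / node_dim : output sizes of phi_e / phi_v (used for empty sums)
    """
    # 1. Update every edge from its attribute, both endpoint nodes, and u.
    new_edges = [(phi_e(e, nodes[r], nodes[s], u), r, s) for e, r, s in edges]

    # 2. Per node: aggregate the updated incoming edges, then update the node.
    new_nodes = []
    for i, v in enumerate(nodes):
        incoming = [e for e, r, _ in new_edges if r == i]
        new_nodes.append(phi_v(vec_sum(incoming, edge_dim), v, u))

    # 3. Aggregate all edges and all nodes, then update the global attribute.
    e_bar = vec_sum([e for e, _, _ in new_edges], edge_dim)
    v_bar = vec_sum(new_nodes, node_dim)
    new_u = phi_u(e_bar, v_bar, u)
    return new_u, new_nodes, new_edges


# Toy update functions (hypothetical stand-ins for learned networks).
def phi_e(e, v_r, v_s, u):
    return [e[0] * (v_s[0] - v_r[0])]  # spring-like pull towards the sender


def phi_v(e_bar, v, u):
    return [v[0] + 0.1 * (e_bar[0] + u[0])]


def phi_u(e_bar, v_bar, u):
    return [u[0]]


u2, nodes2, edges2 = gn_block(
    [0.0], [[0.0], [1.0]], [([2.0], 0, 1), ([2.0], 1, 0)],
    phi_e, phi_v, phi_u, edge_dim=1, node_dim=1,
)
print(u2, nodes2, edges2)
```

The output has the same structure as the input (a global vector, a node list, an edge list), which is what makes the block “graph-to-graph” and lets blocks be chained.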



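Internal structure of a GN block

A GN block contains three “update” functions, φ, and three “aggregation” functions, ρ (Equation 1 in the paper):

$$
\begin{aligned}
e'_k &= \phi^e(e_k, v_{r_k}, v_{s_k}, u) & \bar{e}'_i &= \rho^{e \to v}(E'_i) \\
v'_i &= \phi^v(\bar{e}'_i, v_i, u) & \bar{e}' &= \rho^{e \to u}(E') \\
u' &= \phi^u(\bar{e}', \bar{v}', u) & \bar{v}' &= \rho^{v \to u}(V')
\end{aligned}
$$

where:

$$
E'_i = \{(e'_k, r_k, s_k)\}_{r_k = i,\; k = 1:N^e}, \quad
V' = \{v'_i\}_{i = 1:N^v}, \quad
E' = \bigcup_i E'_i = \{(e'_k, r_k, s_k)\}_{k = 1:N^e}
$$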

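Figure 3: Updates in a GN block. Blue indicates the element being updated, and black indicates other elements involved in the update (note that the pre-update value of the blue element is also used in the update). See Equation 1 for details on the notation.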

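Other figures and tables from the paper

Table 1: Various relational inductive biases in standard deep learning components. See Section 2 of the paper for details.

Figure 1: Reuse and sharing in common deep learning building blocks. (a) Fully connected layer, in which all weights are independent and there is no sharing. (b) Convolutional layer, in which a local kernel function is reused multiple times across the input. Shared weights are indicated by arrows of the same color. (c) Recurrent layer, in which the same function is reused across different processing steps.

Figure 2: Different graph representations. (a) A molecule, in which each atom is represented as a node and edges correspond to bonds (Duvenaud et al., 2015). (b) A mass-spring system, in which the rope is defined by a sequence of masses represented as nodes in the graph (Battaglia et al., 2016; Chang et al., 2017). (c) An n-body system, in which the bodies are nodes and the underlying graph is fully connected (Battaglia et al., 2016; Chang et al., 2017). (d) A rigid body system, in which the balls and walls are nodes, and the underlying graph defines the interactions between the balls and between the balls and the walls (Battaglia et al., 2016; Chang et al., 2017). (e) A sentence, in which the words correspond to leaves in a tree, and the other nodes and edges could be provided by a parser (Socher et al., 2013). Alternatively, a fully connected graph could be used (Vaswani et al., 2017). (f) An image, which can be decomposed into image patches corresponding to nodes in a fully connected graph (Santoro et al., 2017; Wang et al., 2018).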
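Two of the graph structures in Figure 2 differ only in which edges are present. The sketch below builds the edge index lists for a rope-like chain of masses (panel b) and for a fully connected n-body system (panel c); the function names and the `(sender, receiver)` tuple layout are illustrative assumptions.

```python
def chain_edges(n):
    """Directed edges for a rope of n masses: neighbours linked both ways."""
    edges = []
    for i in range(n - 1):
        edges.append((i, i + 1))  # (sender, receiver)
        edges.append((i + 1, i))
    return edges


def fully_connected_edges(n):
    """Directed edges between every ordered pair of distinct nodes."""
    return [(s, r) for s in range(n) for r in range(n) if s != r]


for n in (4, 8, 16):
    print(n, len(chain_edges(n)), len(fully_connected_edges(n)))
```

The chain has 2(n - 1) directed edges while the fully connected graph has n(n - 1), so the choice of graph structure directly controls how many pairwise interactions are computed.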

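Difficulties

Combining knowledge graphs with deep learning involves several major difficulties.

1. Node vectors:

A knowledge graph consists of nodes and edges. A node represents an entity, and an entity in turn has attributes and attribute values. Entities in traditional knowledge graphs are usually conceptual symbols, such as words of natural language.
An edge in a traditional knowledge graph connects two single nodes, that is, two entities. The edge expresses a relation, and the strength of the relation is expressed by a weight, which in traditional knowledge graphs is usually a constant.
To merge traditional knowledge graphs with deep learning, the first step is to make the nodes differentiable. Replacing natural-language words with numerical word vectors is an effective way to do this; the usual practice is to have a language model analyze a large amount of text and find, for each word, the word vector that best fits its contextual semantics. In a graph, however, traditional word-vector algorithms do not work very well and need to be adapted.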
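To illustrate what “making nodes differentiable” means, the sketch below replaces symbolic nodes with trainable vectors and takes one gradient step on a dot-product score. The vocabulary, the scoring function, and the learning rate are all assumptions made for illustration; this is not a word-vector algorithm from the article or the paper.

```python
import random

random.seed(0)
DIM = 4
vocab = ["stone", "feather", "mass"]  # hypothetical node symbols
# Each symbolic node becomes a small trainable vector.
emb = {w: [random.uniform(-0.1, 0.1) for _ in range(DIM)] for w in vocab}


def score(a, b):
    """Dot-product compatibility between two node vectors."""
    return sum(x * y for x, y in zip(emb[a], emb[b]))


def sgd_step(a, b, lr=0.5):
    """One gradient-ascent step on score(a, b) for both vectors.

    For s = a . b the gradients are ds/da = b and ds/db = a.
    """
    grad_a, grad_b = list(emb[b]), list(emb[a])
    emb[a] = [x + lr * g for x, g in zip(emb[a], grad_a)]
    emb[b] = [x + lr * g for x, g in zip(emb[b], grad_b)]


before = score("stone", "mass")
sgd_step("stone", "mass")
after = score("stone", "mass")
print(before < after)
```

A discrete symbol cannot be nudged by a gradient, but a vector can, which is why the substitution is a precondition for training the graph together with a neural network.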

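2. Hyper-nodes:

As noted above, an edge in a traditional knowledge graph connects two single nodes and expresses the relation between them. This assumption limits the expressive power of the graph, because in many scenarios a relation exists only between a combination of several nodes and another node or combination of nodes. We call such a combination of nodes a hyper-node.
The question is which nodes should be combined into a hyper-node. Specifying this by hand as a prior is one option. Another is to learn the composition of hyper-nodes automatically from large amounts of training data, using dropout or regularization.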

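3. Hyper-edges:

An edge in a traditional knowledge graph expresses the relation between two nodes, with the strength given by a weight that is usually a constant. In many scenarios, however, the weight is not constant: it changes as the nodes take different values, and quite possibly changes nonlinearly.
An edge expressed as a nonlinear function is called a hyper-edge.
Deep learning models can approximate nonlinear functions, so every edge in the knowledge graph can be a deep learning model whose input is a hyper-node made up of several single nodes and whose output is another hyper-node. If each deep learning model is viewed as a tree, with the input as the root and the outputs as the leaves, then seen from above the whole knowledge graph is a forest of deep learning models.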
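The difference between a constant-weight edge and a hyper-edge can be shown with a very small network. In this sketch the one-hidden-layer `tanh` model and all weights are illustrative assumptions; a real hyper-edge would be a trained deep model.

```python
import math


def constant_edge(x, weight=0.5):
    """Classical edge: a fixed weight, so the output is linear in the input."""
    return weight * x


def hyper_edge(xs, w_hidden, w_out):
    """Edge as a small one-hidden-layer network.

    xs       : values of the input hyper-node (several nodes grouped together)
    w_hidden : one weight vector per hidden unit
    w_out    : one weight vector per node of the output hyper-node
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, xs))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


# Made-up weights: two input nodes, two hidden units, one output node.
W_HIDDEN = [[1.0, -0.5], [0.3, 0.8]]
W_OUT = [[1.0, -1.0]]

print(constant_edge(1.0), constant_edge(2.0))
print(hyper_edge([1.0, 1.0], W_HIDDEN, W_OUT))
print(hyper_edge([2.0, 2.0], W_HIDDEN, W_OUT))
```

Doubling the input doubles the output of the constant edge but not of the hyper-edge, so the effective “weight” of the hyper-edge depends on the node values.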

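4. Paths:

When training a knowledge graph, including its node vectors, hyper-nodes, and hyper-edges, one training example is often a path walked through the graph. Fitting a very large number of paths yields the best-matching node vectors, hyper-nodes, and hyper-edges.
One problem with training a graph by fitting paths is the disconnect between the training process and the evaluation that follows it. As an analogy, suppose you are given outlines for several essays along with model essays, and asked to learn how to write. Fitting emphasizes imitation word by word and sentence by sentence. But whether an essay is good depends less on following each phrase closely than on how well the whole essay flows.
How can this disconnect between training and final evaluation be resolved? A promising approach is reinforcement learning. The essence of reinforcement learning is to take the final evaluation and, through backtracking and discounting, assign to every intermediate state along the path an estimate of its potential.
The difficulty with reinforcement learning is that the number of intermediate states cannot be too large; when there are too many, training fails to converge. The remedy is to use a deep learning model to estimate the potential value of every state. In other words, instead of estimating each state's value separately, one only needs to train the finite parameters of a single model.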
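The “backtracking and discounting” step can be sketched in a few lines. The function below propagates rewards backwards along one path; the function name, the reward sequence, and the discount factor are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.9):
    """Propagate rewards backwards along a path, discounting at each step.

    returns[t] = rewards[t] + gamma * returns[t + 1]
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A path whose only reward is the final evaluation.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
```

Every intermediate state receives a value even though only the end of the path is scored. In deep reinforcement learning these per-state targets would then be fitted by a single parameterized model instead of being stored in a table.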

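This paper, published earlier by DeepMind, proposes merging deep reinforcement learning with knowledge graphs and related approaches, and surveys a large body of related research. However, the paper does not state clearly which specific approach DeepMind favors.
Perhaps different application scenarios call for different approaches, and there is no universally best one.

(Partly quoted from 新智元 and 机器之心.)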


