Relational inductive biases, deep learning, and graph networks

Peter W. Battaglia1, Jessica B. Hamrick1, Victor Bapst1, Alvaro Sanchez-Gonzalez1, Vinicius Zambaldi1, Mateusz Malinowski1, Andrea Tacchetti1, David Raposo1, Adam Santoro1, Ryan Faulkner1, Caglar Gulcehre1, Francis Song1, Andrew Ballard1, Justin Gilmer2, George Dahl2, Ashish Vaswani2, Kelsey Allen3, Charles Nash4, Victoria Langston1, Chris Dyer1, Nicolas Heess1, Daan Wierstra1, Pushmeet Kohli1, Matt Botvinick1, Oriol Vinyals1, Yujia Li1, Razvan Pascanu1

1DeepMind; 2Google Brain; 3MIT; 4University of Edinburgh
Abstract

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one’s experiences, a hallmark of human intelligence from infancy, remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between “hand-engineering” and “end-to-end” learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias, the graph network, which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have also released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

1 Introduction

A key signature of human intelligence is the ability to make “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965), in which a small set of elements (such as words) can be productively composed in limitless ways (such as into new sentences). This reflects the principle of combinatorial generalization, that is, constructing new inferences, predictions, and behaviors from known building blocks. Here we explore how to improve modern AI’s capacity for combinatorial generalization by biasing learning towards structured representations and computations, and in particular, systems that operate on graphs. Humans’ capacity for combinatorial generalization depends critically on our cognitive mechanisms for representing structure and reasoning about relations. We represent complex systems as compositions of entities and their interactions (Navon, 1977; McClelland and Rumelhart, 1981; Plaut et al., 1996; Marcus, 2001; Goodwin and Johnson-Laird, 2005; Kemp and Tenenbaum, 2008), such as judging whether a haphazard stack of objects is stable (Battaglia et al., 2013). We use hierarchies to abstract away from fine-grained differences, and capture more general commonalities between representations and behaviors (Botvinick, 2008; Tenenbaum et al., 2011), such as parts of an object, objects in a scene, neighborhoods in a town, and towns in a country. We solve novel problems by composing familiar skills and routines (Anderson, 1982), for example traveling to a new location by composing familiar procedures and objectives, such as “travel by airplane”, “to San Diego”, “eat at”, and “an Indian restaurant”. We draw analogies by aligning the relational structure between two domains and drawing inferences about one based on corresponding knowledge about the other (Gentner and Markman, 1997; Hummel and Holyoak, 2003).

Kenneth Craik’s “The Nature of Explanation” (1943) connects the compositional structure of the world to how our internal mental models are organized:

…[a human mental model] has a similar relation-structure to that of the process it imitates. By ‘relation-structure’ I do not mean some obscure non-physical entity which attends the model, but the fact that it is a working physical model which works in the same way as the process it parallels… physical reality is built up, apparently, from a few fundamental types of units whose properties determine many of the properties of the most complicated phenomena, and this seems to afford a sufficient explanation of the emergence of analogies between mechanisms and similarities of relation-structure among these combinations without the necessity of any theory of objective universals. (Craik, 1943, page 51-55)

That is, the world is compositional, or at least, we understand it in compositional terms. When learning, we either fit new knowledge into our existing structured representations, or adjust the structure itself to better accommodate (and make use of) the new and the old (Tenenbaum et al., 2006; Griffiths et al., 2010; Ullman et al., 2017).

The question of how to build artificial systems which exhibit combinatorial generalization has been at the heart of AI since its origins, and was central to many structured approaches, including logic, grammars, classic planning, graphical models, causal reasoning, Bayesian nonparametrics, and probabilistic programming (Chomsky, 1957; Nilsson and Fikes, 1970; Pearl, 1986, 2009; Russell and Norvig, 2009; Hjort et al., 2010; Goodman et al., 2012; Ghahramani, 2015). Entire sub-fields have focused on explicit entity- and relation-centric learning, such as relational reinforcement learning (Dzeroski et al., 2001) and statistical relational learning (Getoor and Taskar, 2007). A key reason why structured approaches were so vital to machine learning in previous eras was, in part, because data and computing resources were expensive, and the improved sample complexity afforded by structured approaches’ strong inductive biases was very valuable.

In contrast with past approaches in AI, modern deep learning methods (LeCun et al., 2015; Schmidhuber, 2015; Goodfellow et al., 2016) often follow an “end-to-end” design philosophy which emphasizes minimal a priori representational and computational assumptions, and seeks to avoid explicit structure and “hand-engineering”. This emphasis has fit well with, and has perhaps been affirmed by, the current abundance of cheap data and cheap computing resources, which make trading off sample efficiency for more flexible learning a rational choice. The remarkable and rapid advances across many challenging domains, from image classification (Krizhevsky et al., 2012; Szegedy et al., 2017), to natural language processing (Sutskever et al., 2014; Bahdanau et al., 2015), to game play (Mnih et al., 2015; Silver et al., 2016; Moravčík et al., 2017), are a testament to this minimalist principle. A prominent example is from language translation, where sequence-to-sequence approaches (Sutskever et al., 2014; Bahdanau et al., 2015) have proven very effective without using explicit parse trees or complex relationships between linguistic entities.
Despite deep learning’s successes, however, important critiques (Marcus, 2001; Shalev-Shwartz et al., 2017; Lake et al., 2017; Lake and Baroni, 2018; Marcus, 2018a,b; Pearl, 2018; Yuille and Liu, 2018) have highlighted key challenges it faces in complex language and scene understanding, reasoning about structured data, transferring learning beyond the training conditions, and learning from small amounts of experience. These challenges demand combinatorial generalization, and so it is perhaps not surprising that an approach which eschews compositionality and explicit structure struggles to meet them. When deep learning’s connectionist (Rumelhart et al., 1987) forebears were faced with analogous critiques from structured, symbolic positions (Fodor and Pylyshyn, 1988; Pinker and Prince, 1988), there was a constructive effort (Bobrow and Hinton, 1990; Marcus, 2001) to address the challenges directly and carefully.


A variety of innovative sub-symbolic approaches for representing and reasoning about structured objects were developed in domains such as analogy-making, linguistic analysis, symbol manipulation, and other forms of relational reasoning (Smolensky, 1990; Hinton, 1990; Pollack, 1990; Elman, 1991; Plate, 1995; Eliasmith, 2013), as well as more integrative theories for how the mind works (Marcus, 2001). Such work also helped cultivate more recent deep learning advances which use distributed, vector representations to capture rich semantic content in text (Mikolov et al., 2013; Pennington et al., 2014), graphs (Narayanan et al., 2016, 2017), algebraic and logical expressions (Allamanis et al., 2017; Evans et al., 2018), and programs (Devlin et al., 2017; Chen et al., 2018b).

We suggest that a key path forward for modern AI is to commit to combinatorial generalization as a top priority, and we advocate for integrative approaches to realize this goal. Just as biology does not choose between nature versus nurture (it uses nature and nurture jointly, to build wholes which are greater than the sums of their parts), we, too, reject the notion that structure and flexibility are somehow at odds or incompatible, and embrace both with the aim of reaping their complementary strengths. In the spirit of numerous recent examples of principled hybrids of structure-based methods and deep learning (e.g., Reed and De Freitas, 2016; Garnelo et al., 2016; Ritchie et al., 2016; Wu et al., 2017; Denil et al., 2017; Hudson and Manning, 2018), we see great promise in synthesizing new techniques by drawing on the full AI toolkit and marrying the best approaches from today with those which were essential during times when data and computation were at a premium.

Recently, a class of models has arisen at the intersection of deep learning and structured approaches, which focuses on approaches for reasoning about explicitly structured data, in particular graphs (e.g. Scarselli et al., 2009b; Bronstein et al., 2017; Gilmer et al., 2017; Wang et al., 2018c; Li et al., 2018; Kipf et al., 2018; Gulcehre et al., 2018). What these approaches all have in common is a capacity for performing computation over discrete entities and the relations between them. What sets them apart from classical approaches is how the representations and structure of the entities and relations, and the corresponding computations, can be learned, relieving the burden of needing to specify them in advance. Crucially, these methods carry strong relational inductive biases, in the form of specific architectural assumptions, which guide these approaches towards learning about entities and relations (Mitchell, 1980), which we, joining many others (Spelke et al., 1992; Spelke and Kinzler, 2007; Marcus, 2001; Tenenbaum et al., 2011; Lake et al., 2017; Lake and Baroni, 2018; Marcus, 2018b), suggest are an essential ingredient for human-like intelligence.

In the remainder of the paper, we examine various deep learning methods through the lens of their relational inductive biases, showing that existing methods often carry relational assumptions which are not always explicit or immediately evident. We then present a general framework for entity- and relation-based reasoning, which we term graph networks, for unifying and extending existing methods which operate on graphs, and describe key design principles for building powerful architectures using graph networks as building blocks. We have also released an open-source library for building graph networks, which can be found here: github.com/deepmind/graph_nets.


Box 1: Relational reasoning

We define structure as the product of composing a set of known building blocks. “Structured representations” capture this composition (i.e., the arrangement of the elements) and “structured computations” operate over the elements and their composition as a whole. Relational reasoning, then, involves manipulating structured representations of entities and relations, using rules for how they can be composed. We use these terms to capture notions from cognitive science, theoretical computer science, and AI, as follows:

  • An entity is an element with attributes, such as a physical object with a size and mass.
  • A relation is a property between entities. Relations between two objects might include “same size as”, “heavier than”, and “distance from”. Relations can have attributes as well. The relation “more than X times heavier than” takes an attribute, X, which determines the relative weight threshold for the relation to be true vs. false. Relations can also be sensitive to the global context. For a stone and a feather, the relation “falls with greater acceleration than” depends on whether the context is in air vs. in a vacuum. Here we focus on pairwise relations between entities.
  • A rule is a function (like a non-binary logical predicate) that maps entities and relations to other entities and relations, such as a scale comparison like “is entity X large?” and “is entity X heavier than entity Y?”. Here we consider rules which take one or two arguments (unary and binary), and return a unary property value.

As an illustrative example of relational reasoning in machine learning, graphical models (Pearl, 1988; Koller and Friedman, 2009) can represent complex joint distributions by making explicit random conditional independences among random variables. Such models have been very successful because they capture the sparse structure which underlies many real-world generative processes and because they support efficient algorithms for learning and reasoning. For example, hidden Markov models constrain latent states to be conditionally independent of others given the state at the previous time step, and observations to be conditionally independent given the latent state at the current time step, which are well-matched to the relational structure of many real-world causal processes. Explicitly expressing the sparse dependencies among variables provides for various efficient inference and reasoning algorithms, such as message-passing, which apply a common information propagation procedure across localities within a graphical model, resulting in a composable, and partially parallelizable, reasoning procedure which can be applied to graphical models of different sizes and shapes.
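To make these definitions concrete, here is a minimal sketch (not from the paper; the attribute keys and rule names such as `heavier_than` are purely illustrative) of entities with attributes, and unary and binary rules expressed as plain Python functions:

```python
# Illustrative only: entities as attribute dictionaries, rules as small functions.
entities = {
    "stone":   {"size": 0.20, "mass": 5.00},
    "feather": {"size": 0.10, "mass": 0.01},
}

def heavier_than(x, y, factor=1.0):
    """Binary rule with an attribute `factor`: is x more than `factor` times heavier than y?"""
    return entities[x]["mass"] > factor * entities[y]["mass"]

def is_large(x, threshold=0.15):
    """Unary rule returning a unary property value."""
    return entities[x]["size"] > threshold

print(heavier_than("stone", "feather", factor=100.0))  # True
print(is_large("feather"))                              # False
```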


2 Relational inductive biases

Many approaches in machine learning and AI which have a capacity for relational reasoning (Box 1) use a relational inductive bias. While not a precise, formal definition, we use this term to refer generally to inductive biases (Box 2) which impose constraints on relationships and interactions among entities in a learning process. Creative new machine learning architectures have rapidly proliferated in recent years, with (perhaps not surprisingly given the thesis of this paper) practitioners often following a design pattern of composing elementary building blocks to form more complex, deep computational hierarchies and graphs. Building blocks such as “fully connected” layers are stacked into “multilayer perceptrons” (MLPs), “convolutional layers” are stacked into “convolutional neural networks” (CNNs), and a standard recipe for an image processing network is, generally, some variety of CNN composed with a MLP. This composition of layers provides a particular type of relational inductive bias, that of hierarchical processing, in which computations are performed in stages, typically resulting in increasingly long range interactions among information in the input signal. As we explore below, the building blocks themselves also carry various relational inductive biases (Table 1). Though beyond the scope of this paper, various non-relational inductive biases are used in deep learning as well: for example, activation non-linearities, weight decay, dropout (Srivastava et al., 2014), batch and layer normalization (Ioffe and Szegedy, 2015; Ba et al., 2016), data augmentation, training curricula, and optimization algorithms all impose constraints on the trajectory and outcome of learning.

Box 2: Inductive biases

Learning is the process of apprehending useful knowledge by observing and interacting with the world. It involves searching a space of solutions for one expected to provide a better explanation of the data or to achieve higher rewards. But in many cases, there are multiple solutions which are equally good (Goodman, 1955). An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independent of the observed data (Mitchell, 1980). In a Bayesian model, inductive biases are typically expressed through the choice and parameterization of the prior distribution (Griffiths et al., 2010). In other contexts, an inductive bias might be a regularization term (McClelland, 1994) added to avoid overfitting, or it might be encoded in the architecture of the algorithm itself. Inductive biases often trade flexibility for improved sample complexity and can be understood in terms of the bias-variance tradeoff (Geman et al., 1992). Ideally, inductive biases both improve the search for solutions without substantially diminishing performance, as well as help find solutions which generalize in a desirable way; however, mismatched inductive biases can also lead to suboptimal performance by introducing constraints that are too strong.

Inductive biases can express assumptions about either the data-generating process or the space of solutions. For example, when fitting a 1D function to data, linear least squares follows the constraint that the approximating function be a linear model, and approximation errors should be minimal under a quadratic penalty. This reflects an assumption that the data-generating process can be explained simply, as a line process corrupted by additive Gaussian noise. Similarly, L2 regularization prioritizes solutions whose parameters have small values, and can induce unique solutions and global structure to otherwise ill-posed problems. This can be interpreted as an assumption about the learning process: that searching for good solutions is easier when there is less ambiguity among solutions. Note, these assumptions need not be explicit; they reflect interpretations of how a model or algorithm interfaces with the world.
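As a small worked illustration of the two biases just mentioned (a minimal sketch assuming NumPy, with synthetic data), the linear form below encodes an assumption about the data-generating process, and the L2 penalty prefers small-parameter solutions, making an otherwise ill-posed fit unique:

```python
# Ridge regression as an inductive bias: among many near-equivalent fits,
# the L2 penalty selects the small-norm solution.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 4] = X[:, 3]                       # duplicated feature makes ordinary least squares ill-posed
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=20)

lam = 1e-2                              # strength of the preference for small parameters
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(w_ridge)
```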


To explore the relational inductive biases expressed within various deep learning methods, we must identify several key ingredients, analogous to those in Box 1: what are the entities, what are the relations, and what are the rules for composing entities and relations, and computing their implications? In deep learning, the entities and relations are typically expressed as distributed representations, and the rules as neural network function approximators; however, the precise forms of the entities, relations, and rules vary between architectures. To understand these differences between architectures, we can further ask how each supports relational reasoning by probing:

  • The arguments to the rule functions (e.g., which entities and relations are provided as input).
  • How the rule function is reused, or shared, across the computational graph (e.g., across different entities and relations, across different time or processing steps, etc.).
  • How the architecture defines interactions versus isolation among representations (e.g., by applying rules to draw conclusions about related entities, versus processing them separately).
2.1 Relational inductive biases in standard deep learning building blocks
2.1.1 Fully connected layers

Perhaps the most common building block is a fully connected layer (Rosenblatt, 1961). Typically implemented as a non-linear vector-valued function of vector inputs, each element, or “unit”, of the output vector is the dot product between a weight vector and the input vector, followed by an added bias term, and finally a non-linearity such as a rectified linear unit (ReLU). As such, the entities are the units in the network, the relations are all-to-all (all units in layer i are connected to all units in layer j), and the rules are specified by the weights and biases. The argument to the rule is the full input signal, there is no reuse, and there is no isolation of information (Figure 1a). The implicit relational inductive bias in a fully connected layer is thus very weak: all input units can interact to determine any output unit’s value, independently across outputs (Table 1).
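As a rough illustration (a minimal NumPy sketch, not from the paper), a fully connected layer can be written as follows; note that every output unit depends on every input unit and no weights are shared:

```python
import numpy as np

def fully_connected(x, W, b):
    # ReLU(Wx + b): each row of W is one output unit's weight vector over all inputs.
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=8)                  # entities: the 8 input units
W = rng.normal(size=(4, 8))             # all-to-all relations: every output sees every input
b = np.zeros(4)
print(fully_connected(x, W, b).shape)   # (4,)
```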


2.1.2 Convolutional layers

Another common building block is a convolutional layer (Fukushima, 1980; LeCun et al., 1989). It is implemented by convolving an input vector or tensor with a kernel of the same rank, adding a bias term, and applying a point-wise non-linearity. The entities here are still individual units (or grid elements, e.g. pixels), but the relations are sparser. The differences between a fully connected layer and a convolutional layer impose some important relational inductive biases: locality and translation invariance (Figure 1b). Locality reflects that the arguments to the relational rule are those entities in close proximity with one another in the input signal’s coordinate space, isolated from distal entities. Translation invariance reflects reuse of the same rule across localities in the input. These biases are very effective for processing natural image data because there is high covariance within local neighborhoods, which diminishes with distance, and because the statistics are mostly stationary across an image (Table 1).
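The sketch below (NumPy, illustrative only) shows a 1D convolution and makes both biases visible: each output sees only a small neighborhood of the input (locality), and the same kernel is reused at every position (translation invariance):

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    k = len(kernel)
    out = np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])
    return np.maximum(0.0, out + bias)   # point-wise non-linearity

x = np.arange(10, dtype=float)
kernel = np.array([0.25, 0.5, 0.25])     # one local rule, shared across all positions
print(conv1d(x, kernel))
```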



Figure 1: Reuse and sharing in common deep learning building blocks. (a) Fully connected layer, in which all weights are independent, and there is no sharing. (b) Convolutional layer, in which a local kernel function is reused multiple times across the input. Shared weights are indicated by arrows with the same color. (c) Recurrent layer, in which the same function is reused across different processing steps.

2.1.3 Recurrent layers

A third common building block is a recurrent layer (Elman, 1990), which is implemented over a sequence of steps. Here, we can view the inputs and hidden states at each processing step as the entities, and the Markov dependence of one step’s hidden state on the previous hidden state and the current input, as the relations. The rule for combining the entities takes a step’s inputs and hidden state as arguments to update the hidden state. The rule is reused over each step (Figure 1c), which reflects the relational inductive bias of temporal invariance (similar to a CNN’s translational invariance in space). For example, the outcome of some physical sequence of events should not depend on the time of day. RNNs also carry a bias for locality in the sequence via their Markovian structure (Table 1).
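A minimal NumPy sketch (illustrative, not from the paper) of this reuse: one update rule, applied at every step to the current input and the previous hidden state:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 2)):      # five time steps, one shared rule
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)
```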

2.2 Computations over sets and graphs

While the standard deep learning toolkit contains methods with various forms of relational inductive biases, there is no “default” deep learning component which operates on arbitrary relational structure. We need models with explicit representations of entities and relations, and learning algorithms which find rules for computing their interactions, as well as ways of grounding them in data. Importantly, entities in the world (such as objects and agents) do not have a natural order; rather, orderings can be defined by the properties of their relations. For example, the relations between the sizes of a set of objects can potentially be used to order them, as can their masses, ages, toxicities, and prices. Invariance to ordering, except in the face of relations, is a property that should ideally be reflected by a deep learning component for relational reasoning.

Sets are a natural representation for systems which are described by entities whose order is undefined or irrelevant; in particular, their relational inductive bias does not come from the presence of something, but rather from the absence. For illustration, consider the task of predicting the center
Graphs, generally, are a representation which supports arbitrary (pairwise) relational structure, and computations over graphs afford a strong relational inductive bias beyond that which convolutional and recurrent layers can provide.

Figure 2: Different graph representations. (a) A molecule, in which each atom is represented as a node and edges correspond to bonds (e.g. Duvenaud et al., 2015). (b) A mass-spring system, in which the rope is defined by a sequence of masses which are represented as nodes in the graph (e.g. Battaglia et al., 2016; Chang et al., 2017). (c) An n-body system, in which the bodies are nodes and the underlying graph is fully connected (e.g. Battaglia et al., 2016; Chang et al., 2017). (d) A rigid body system, in which the balls and walls are nodes, and the underlying graph defines interactions between the balls and between the balls and the walls (e.g. Battaglia et al., 2016; Chang et al., 2017). (e) A sentence, in which the words correspond to leaves in a tree, and the other nodes and edges could be provided by a parser (e.g. Socher et al., 2013). Alternately, a fully connected graph could be used (e.g. Vaswani et al., 2017). (f) An image, which can be decomposed into image patches corresponding to nodes in a fully connected graph (e.g. Santoro et al., 2017; Wang et al., 2018c).


3 Graph networks

Neural networks that operate on graphs, and structure their computations accordingly, have been developed and explored extensively for more than a decade under the umbrella of “graph neural networks” (Gori et al., 2005; Scarselli et al., 2005, 2009a; Li et al., 2016), but have grown rapidly in scope and popularity in recent years. We survey the literature on these methods in the next sub-section (3.1). Then in the remaining sub-sections, we present our graph networks framework, which generalizes and extends several lines of work in this area.


3.1 Background

Models in the graph neural network family (Gori et al., 2005; Scarselli et al., 2005, 2009a; Li et al., 2016) have been explored in a diverse range of problem domains, across supervised, semi-supervised, unsupervised, and reinforcement learning settings. They have been effective at tasks thought to have rich relational structure, such as visual scene understanding tasks (Raposo et al., 2017; Santoro et al., 2017) and few-shot learning (Garcia and Bruna, 2018). They have also been used to learn the dynamics of physical systems (Battaglia et al., 2016; Chang et al., 2017; Watters et al., 2017; van Steenkiste et al., 2018; Sanchez-Gonzalez et al., 2018) and multi-agent systems (Sukhbaatar et al., 2016; Hoshen, 2017; Kipf et al., 2018), to reason about knowledge graphs (Bordes et al., 2013; Oñoro-Rubio et al., 2017; Hamaguchi et al., 2017), to predict the chemical properties of molecules (Duvenaud et al., 2015; Gilmer et al., 2017), to predict traffic on roads (Li et al., 2017; Cui et al., 2018), to classify and segment images and videos (Wang et al., 2018c; Hu et al., 2017) and 3D meshes and point clouds (Wang et al., 2018d), to classify regions in images (Chen et al., 2018a), to perform semi-supervised text classification (Kipf and Welling, 2017), and in machine translation (Vaswani et al., 2017; Shaw et al., 2018; Gulcehre et al., 2018). They have been used within both model-free (Wang et al., 2018b) and model-based (Hamrick et al., 2017; Pascanu et al., 2017; Sanchez-Gonzalez et al., 2018) continuous control, for model-free reinforcement learning (Hamrick et al., 2018; Zambaldi et al., 2018), and for more classical approaches to planning (Toyer et al., 2017).

Many traditional computer science problems, which involve reasoning about discrete entities and structure, have also been explored with graph neural networks, such as combinatorial optimization (Bello et al., 2016; Nowak et al., 2017; Dai et al., 2017), boolean satisfiability (Selsam et al., 2018), program representation and verification (Allamanis et al., 2018; Li et al., 2016), modeling cellular automata and Turing machines (Johnson, 2017), and performing inference in graphical models (Yoon et al., 2018). Recent work has also focused on building generative models of graphs (Li et al., 2018; De Cao and Kipf, 2018; You et al., 2018; Bojchevski et al., 2018), and unsupervised learning of graph embeddings (Perozzi et al., 2014; Tang et al., 2015; Grover and Leskovec, 2016; García-Durán and Niepert, 2017). The works cited above are by no means an exhaustive list, but provide a representative cross-section of the breadth of domains for which graph neural networks have proven useful. We point interested readers to a number of existing reviews which examine the body of work on graph neural networks in more depth. In particular, Scarselli et al. (2009a) provides an authoritative overview of early graph neural network approaches. Bronstein et al. (2017) provides an excellent survey of deep learning on non-Euclidean data, and explores graph neural nets, graph convolution networks, and related spectral approaches. Recently, Gilmer et al. (2017) introduced the message-passing neural network (MPNN), which unified various graph neural network and graph convolutional network approaches (Monti et al., 2017; Bruna et al., 2014; Henaff et al., 2015; Defferrard et al., 2016; Niepert et al., 2016; Kipf and Welling, 2017; Bronstein et al., 2017) by analogy to message-passing in graphical models. In a similar vein, Wang et al. (2018c) introduced the non-local neural network (NLNN), which unified various “self-attention”-style methods (Vaswani et al., 2017; Hoshen, 2017; Veličković et al., 2018) by analogy to methods from computer vision and graphical models for capturing long range dependencies in signals.


3.2 Graph network (GN) block

We now present our graph networks (GN) framework, which defines a class of functions for relational reasoning over graph-structured representations. Our GN framework generalizes and extends various graph neural network, MPNN, and NLNN approaches (Scarselli et al., 2009a; Gilmer et al., 2017; Wang et al., 2018c), and supports constructing complex architectures from simple building blocks. Note, we avoided using the term “neural” in the “graph network” label to reflect that they can be implemented with functions other than neural networks, though here our focus is on neural network implementations. The main unit of computation in the GN framework is the GN block, a “graph-to-graph” module which takes a graph as input, performs computations over the structure, and returns a graph as output. As described in Box 3, entities are represented by the graph’s nodes, relations by the edges, and system-level properties by global attributes. The GN framework’s block organization emphasizes customizability and synthesizing new architectures which express desired relational inductive biases. The key design principles are: Flexible representations (see Section 4.1); Configurable within-block structure (see Section 4.2); and Composable multi-block architectures (see Section 4.3).

We introduce a motivating example to help make the GN formalism more concrete. Consider predicting the movements of a set of rubber balls in an arbitrary gravitational field, which, instead of bouncing against one another, each have one or more springs which connect them to some (or all) of the others. We will refer to this running example throughout the definitions below, to motivate the graph representation and the computations operating over it. Figure 2 depicts some other common scenarios that can be represented by graphs and reasoned over using graph networks.

Here we use “graph” to mean a directed, attributed multi-graph with a global attribute. In our terminology, a node is denoted as vi, an edge as ek, and the global attributes as u. We also use sk and rk to indicate the indices of the sender and receiver nodes (see below), respectively, for edge k. To be more precise, we define these terms as:

  • Directed: one-way edges, from a “sender” node to a “receiver” node.
  • Attribute: properties that can be encoded as a vector, set, or even another graph.
  • Attributed: edges and vertices have attributes associated with them.
  • Global attribute: a graph-level attribute.
  • Multi-graph: there can be more than one edge between vertices, including self-edges.

Figure 2 shows a variety of different types of graphs corresponding to real data that we may be interested in modeling, including physical systems, molecules, images, and text.
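A plain-Python sketch of this representation (illustrative only; the field names are not the graph_nets API) is shown below, using the running balls-and-springs example: node attributes vi, edge attributes ek with sender index sk and receiver index rk, and a global attribute u:

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class Graph:
    nodes: np.ndarray      # shape [num_nodes, node_dim]; row i is v_i
    edges: np.ndarray      # shape [num_edges, edge_dim]; row k is e_k
    senders: List[int]     # s_k: index of the sender node of edge k
    receivers: List[int]   # r_k: index of the receiver node of edge k
    globals_: np.ndarray   # u, the graph-level attribute

# Three balls (nodes) connected by two springs (directed edges); the global
# attribute could hold, e.g., the gravitational field of the running example.
g = Graph(
    nodes=np.zeros((3, 4)),
    edges=np.zeros((2, 2)),
    senders=[0, 1],
    receivers=[1, 2],
    globals_=np.zeros(1),
)
```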


3.2.2 Internal structure of a GN block


3.2.3 Computational steps within a GN block

When a graph, G, is provided as input to a GN block, the computations proceed from the edge, to the node, to the global level. Figure 3 shows a depiction of which graph elements are involved in each of these computations, and Figure 4a shows a full GN block, with its update and aggregation functions. Algorithm 1 shows the following steps of computation:
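As a rough illustration of these steps, here is a minimal NumPy sketch (reusing the `Graph` container sketched above; this is not the graph_nets library and not the paper’s Algorithm 1 verbatim). The update functions `phi_e`, `phi_v`, `phi_u` stand in for the generic per-edge, per-node, and global update functions, and summation stands in for the generic aggregation functions:

```python
import numpy as np

def gn_block(g, phi_e, phi_v, phi_u):
    # 1. Update each edge from its attribute, its receiver and sender nodes, and the global u.
    new_edges = np.stack([
        phi_e(g.edges[k], g.nodes[g.receivers[k]], g.nodes[g.senders[k]], g.globals_)
        for k in range(len(g.edges))
    ])
    # 2. Aggregate the updated edges per receiving node, then update each node.
    agg_per_node = np.zeros((len(g.nodes), new_edges.shape[1]))
    for k, r in enumerate(g.receivers):
        agg_per_node[r] += new_edges[k]
    new_nodes = np.stack([
        phi_v(agg_per_node[i], g.nodes[i], g.globals_) for i in range(len(g.nodes))
    ])
    # 3. Aggregate all updated edges and nodes, then update the global attribute.
    new_globals = phi_u(new_edges.sum(axis=0), new_nodes.sum(axis=0), g.globals_)
    return Graph(new_nodes, new_edges, g.senders, g.receivers, new_globals)

# Toy usage with the Graph instance g from the earlier sketch and simple update functions:
phi_e = lambda e, v_r, v_s, u: np.tanh(np.concatenate([e, v_r, v_s, u]))
phi_v = lambda e_agg, v, u: np.tanh(np.concatenate([e_agg, v, u]))
phi_u = lambda e_agg, v_agg, u: np.tanh(np.concatenate([e_agg, v_agg, u]))
g_out = gn_block(g, phi_e, phi_v, phi_u)
print(g_out.nodes.shape, g_out.edges.shape, g_out.globals_.shape)
```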


Figure 3: Updates in a GN block. Blue indicates the element that is being updated, and black indicates other elements which are involved in the update (note that the pre-update value of the blue element is also used in the update). See Equation 1 for details on the notation.


Note, though we assume this sequence of steps here, the order is not strictly enforced: it is possible to reverse the update functions to proceed from global, to per-node, to per-edge updates, for example. Kearnes et al. (2016) computes edge updates from nodes in a similar manner.

3.2.4 Relational inductive biases in graph networks

Our GN framework imposes several strong relational inductive biases when used as components in a learning process. First, graphs can express arbitrary relationships among entities, which means the GN’s input determines how representations interact and are isolated, rather than those choices being determined by the fixed architecture. For example, the assumption that two entities have a relationship, and thus should interact, is expressed by an edge between the entities’ corresponding nodes. Similarly, the absence of an edge expresses the assumption that the nodes have no relationship and should not influence each other directly. Second, graphs represent entities and their relations as sets, which are invariant to permutations. This means GNs are invariant to the order of these elements, which is often desirable. For example, the objects in a scene do not have a natural ordering (see Sec. 2.2). Third, a GN’s per-edge and per-node functions are reused across all edges and nodes, respectively. This means GNs automatically support a form of combinatorial generalization (see Section 5.1): because graphs are composed of edges, nodes, and global features, a single GN can operate on graphs of different sizes (numbers of edges and nodes) and shapes (edge connectivity).


4 Design principles for graph network architectures

The GN framework can be used to implement a wide variety of architectures, in accordance with the design principles listed above in Section 3.2, which also correspond to the sub-sections (4.1, 4.2, and 4.3) below. In general, the framework is agnostic to specific attribute representations and functional forms. Here, however, we focus mainly on deep learning architectures, which allow GNs to act as learnable graph-to-graph function approximators.

4.1 Flexible representations

Graph networks support highly flexible graph representations in two ways: first, in terms of the representation of the attributes; and second, in terms of the structure of the graph itself.

4.1.1 Attributes

The global, node, and edge attributes of a GN block can use arbitrary representational formats. In deep learning implementations, real-valued vectors and tensors are most common. However, other data structures such as sequences, sets, or even graphs could also be used.

The requirements of the problem will often determine what representations should be used for the attributes. For example, when the input data is an image, the attributes might be represented as tensors of image patches; however, when the input data is a text document, the attributes might be sequences of words corresponding to sentences.

For each GN block within a broader architecture, the edge and node outputs typically correspond to lists of vectors or tensors, one per edge or node, and the global outputs correspond to a single vector or tensor. This allows a GN’s output to be passed to other deep learning building blocks such as MLPs, CNNs, and RNNs.
The output of a GN block can also be tailored to the demands of the task. In particular,

  • An edge-focused GN uses the edges as output, for example to make decisions about interactions among entities (Kipf et al., 2018; Hamrick et al., 2018).

  • A node-focused GN uses the nodes as output, for example to reason about physical systems (Battaglia et al., 2016; Chang et al., 2017; Wang et al., 2018b; Sanchez-Gonzalez et al., 2018).

  • A graph-focused GN uses the globals as output, for example to predict the potential energy of a physical system (Battaglia et al., 2016), the properties of a molecule (Gilmer et al., 2017), or answers to questions about a visual scene (Santoro et al., 2017).

The nodes, edges, and global outputs can also be mixed-and-matched depending on the task. For example, Hamrick et al. (2018) used both the output edge and global attributes to compute a policy over actions.


4.1.2 Graph structure

When defining how the input data will be represented as a graph, there are generally two scenarios: first, the input explicitly specifies the relational structure; and second, the relational structure must be inferred or assumed. These are not hard distinctions, but extremes along a continuum. Examples of data with more explicitly specified entities and relations include knowledge graphs, social networks, parse trees, optimization problems, chemical graphs, road networks, and physical systems with known interactions. Figures 2a-d illustrate how such data can be expressed as graphs. Examples of data where the relational structure is not made explicit, and must be inferred or assumed, include visual scenes, text corpora, programming language source code, and multi-agent systems. In these types of settings, the data may be formatted as a set of entities without relations, or even just a vector or tensor (e.g., an image). If the entities are not specified explicitly, they might be assumed, for instance, by treating each word in a sentence (Vaswani et al., 2017) or each local feature vector in a CNN’s output feature map, as a node (Watters et al., 2017; Santoro et al., 2017; Wang et al., 2018c) (Figures 2e-f). Or, it might be possible to use a separate learned mechanism to infer entities from an unstructured signal (Luong et al., 2015; Mnih et al., 2014; Eslami et al., 2016; van Steenkiste et al., 2018). If relations are not available, the simplest approach is to instantiate all possible directed edges between entities (Figure 2f). This can be prohibitive for large numbers of entities, however, because the number of possible edges grows quadratically with the number of nodes. Thus developing more sophisticated ways of inferring sparse structure from unstructured data (Kipf et al., 2018) is an important future direction.
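A minimal sketch of the latter, assumption-free case (illustrative only): instantiating every possible directed edge, excluding self-edges, among n entities, at the cost of n(n-1) edges:

```python
n = 4
senders, receivers = zip(*[(s, r) for s in range(n) for r in range(n) if s != r])
print(len(senders))   # 12 = n * (n - 1); grows quadratically with n
```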


4.2 Configurable within-block structure


A variety of other architectures can be expressed in the GN framework, often as different function choices and within-block configurations. The remaining sub-sections explore how a GN’s within-block structure can be configured in different ways, with examples of published work which uses such configurations. See the Appendix for details.



Figure 4c shows how an MPNN is structured, according to the GN framework. For details and various MPNN architectures, see the Appendix.


4.2.2 Non-local neural networks (NLNN)

Wang et al. (2018c)’s NLNN, which unifies various “intra-/self-/vertex-/graph-attention” approaches (Lin et al., 2017; Vaswani et al., 2017; Hoshen, 2017; Veličković et al., 2018; Shaw et al., 2018), can also be translated into the GN formalism. The label “attention” refers to how the nodes are updated: each node update is based on a weighted sum of (some function of) the node attributes of its neighbors, where the weight between a node and one of its neighbors is computed by a scalar pairwise function between their attributes (and then normalized across neighbors). The published NLNN formalism does not explicitly include edges, and instead computes pairwise attention weights between all nodes. But various NLNN-compliant models, such as the vertex attention interaction network (Hoshen, 2017) and graph attention network (Veličković et al., 2018), are able to handle explicit edges by effectively setting to zero the weights between nodes which do not share an edge.
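A minimal NumPy sketch of this attention-style node update (illustrative only; a dot product stands in for the scalar pairwise function, and the weights of pairs that share no edge are masked out, as in the explicit-edge variants mentioned above):

```python
import numpy as np

def attention_update(nodes, adjacency):
    # Assumes every node has at least one neighbor in `adjacency`.
    scores = nodes @ nodes.T                            # scalar pairwise function (here, a dot product)
    scores = np.where(adjacency, scores, -np.inf)       # pairs with no edge get zero weight after softmax
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # normalize across each node's neighbors
    return weights @ nodes                              # weighted sum of neighbor attributes

rng = np.random.default_rng(0)
nodes = rng.normal(size=(4, 3))
adjacency = np.ones((4, 4), dtype=bool)                 # fully connected, as in the published NLNN
print(attention_update(nodes, adjacency).shape)         # (4, 3)
```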


This formulation may be helpful for focusing only on those interactions which are most relevant for the downstream task, especially when the input entities were a set, from which a graph was formed by adding all possible edges between them.


4.2.3 Other graph network variants


See the Appendix for further details. Relation Networks (Raposo et al., 2017; Santoro et al., 2017) bypass the node update entirely and predict the global output directly from pooled edge information (see also Figure 4e).
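In GN notation this corresponds, in our reading, to a per-pair edge function followed directly by a pooled global readout, with no per-node update:

```latex
e'_k = \phi^e\!\left(v_{r_k},\, v_{s_k}\right), \qquad
u' = \phi^u\!\left(\rho^{e \rightarrow u}(E')\right)
```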

Deep Sets (Zaheer et al., 2017) bypass the edge update completely and predict the global output directly from pooled node information (Figure 4f).
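Matching the original Deep Sets construction f(X) = ρ(∑_i φ(x_i)), one way to write this in GN notation (our paraphrase of the sentence above, with a sum as the aggregator) is a per-node function, a node-to-global sum aggregation, and a global readout:

```latex
v'_i = \phi^v(v_i), \qquad
u' = \phi^u\!\left(\rho^{v \rightarrow u}(V')\right), \qquad
\rho^{v \rightarrow u}(V') = \sum_i v'_i
```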


Figure 6: (a) An example composing multiple GN blocks in sequence to form a GN "core". Here, the GN blocks can use shared weights, or they could be independent. (b) The encode-process-decode architecture, which is a common choice for composing GN blocks (see Section 4.3). Here, a GN encodes an input graph, which is then processed by a GN core. The output of the core is decoded by a third GN block into an output graph, whose nodes, edges, and/or global attributes would be used for task-specific purposes. (c) The encode-process-decode architecture applied in a sequential setting in which the core is also unrolled over time (potentially using a GRU or LSTM architecture), in addition to being repeated within each time step. Here, merged lines indicate concatenation, and split lines indicate copying.

4.3 Composable multi-block architectures

A key design principle of graph networks is constructing complex architectures by composing GN blocks. We defined a GN block as always taking a graph comprised of edge, node, and global elements as input, and returning a graph with the same constituent elements as output (simply passing through the input elements to the output when those elements are not explicitly updated).


In a physical dynamics setting, for example, the encoder might embed the system's current state into a latent graph, the core might apply an elementary dynamics update, and the decoder might read out the final positions from the updated graph state.
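A minimal sketch of the encode-process-decode pattern, assuming some GN blocks `encoder`, `core`, and `decoder` exist with a graph-in/graph-out interface (the names and the helper `concat_graphs` are hypothetical, shown only to make the wiring explicit):

```python
def encode_process_decode(encoder, core, decoder, input_graph, num_steps, concat_graphs):
    """Encode once, apply a shared-weight core repeatedly, decode at each step.

    The core sees the concatenation of the encoded input and its own latest
    output, as in Figure 6b; all num_steps applications share the same weights.
    """
    latent0 = encoder(input_graph)
    latent = latent0
    outputs = []
    for _ in range(num_steps):
        latent = core(concat_graphs([latent0, latent]))  # one message-passing step
        outputs.append(decoder(latent))                  # per-step readout
    return outputs
```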


Figure 7: Example of message passing. Each row highlights the information that diffuses through the graph starting from a particular node. In the top row, the node of interest is in the upper right; in the bottom row, the node of interest is in the bottom right. Shaded nodes indicate how far information from the original node can travel in m steps of message passing; bolded edges indicate which edges that information has the potential to travel across. Note that during the full message passing procedure, this propagation of information happens simultaneously for all nodes and edges in the graph (not just the two shown here).



4.4 Implementing graph networks in code

Similar to CNNs (see Figure 1), which are naturally parallelizable (e.g. on GPUs), GNs have a natural parallel structure: since the edge-update and node-update functions (φe and φv) in Equation 1 are shared over the edges and nodes, respectively, they can be computed in parallel. In practice, this means that the per-edge and per-node computations can be batched, with the edges and nodes playing the role of a batch dimension: φe is applied to all edges at once and φv to all nodes at once, and multiple graphs can be processed together by treating them as disjoint components of one larger graph.

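A sketch of this batching pattern in plain NumPy, where gathering node attributes by sender/receiver index vectorizes the edge update and an unsorted segment sum plays the role of ρ^{e→v} (the `phi_e` and `phi_v` arguments stand in for whatever learned update functions one would use):

```python
import numpy as np

def segment_sum(data, segment_ids, num_segments):
    """Sum rows of `data` into `num_segments` buckets given by segment_ids."""
    out = np.zeros((num_segments, data.shape[1]))
    np.add.at(out, segment_ids, data)
    return out

def batched_gn_step(nodes, edges, senders, receivers, phi_e, phi_v):
    # Edge update for *all* edges at once: gather endpoint attributes by index.
    edge_inputs = np.concatenate([edges, nodes[receivers], nodes[senders]], axis=1)
    new_edges = phi_e(edge_inputs)
    # Aggregate incoming messages per node, then update all nodes at once.
    agg = segment_sum(new_edges, receivers, nodes.shape[0])
    new_nodes = phi_v(np.concatenate([agg, nodes], axis=1))
    return new_nodes, new_edges
```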

We have released an open-source software library for building GNs, which can be found here: github.com/deepmind/graph_nets. See Box 4 for an overview.


Box 4: Graph Nets open-source software library: github.com/deepmind/graph_nets

We have released an open-source library for building GNs in TensorFlow/Sonnet. It includes demos of how to create, manipulate, and train GNs to reason about graph-structured data, on a shortest path-finding task, a sorting task, and a physical prediction task. Each demo uses the same GN architecture, which highlights the flexibility of the approach.
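For orientation, a usage sketch along the lines of the library's own examples is shown below; the module and argument names reflect our understanding of the released API at the time of writing and should be checked against the repository.

```python
import sonnet as snt
from graph_nets import modules, utils_tf

# A graph is specified as a dict of globals, nodes, edges, and sender/receiver indices.
graph_dict = {
    "globals": [0.0, 0.0],
    "nodes": [[1.0], [2.0], [3.0]],
    "edges": [[1.0], [2.0]],
    "senders": [0, 1],
    "receivers": [1, 2],
}
input_graphs = utils_tf.data_dicts_to_graphs_tuple([graph_dict])

# A full GN block whose edge, node, and global update functions are small MLPs.
graph_network = modules.GraphNetwork(
    edge_model_fn=lambda: snt.nets.MLP([32, 32]),
    node_model_fn=lambda: snt.nets.MLP([32, 32]),
    global_model_fn=lambda: snt.nets.MLP([32, 32]))

output_graphs = graph_network(input_graphs)
```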

Shortest path demo: tinyurl.com/gn-shortest-path-demo

This demo creates random graphs, and trains a GN to label the nodes and edges on the shortest path between any two nodes. Over a sequence of message-passing steps (as depicted by each step's plot), the model refines its prediction of the shortest path.

Sort demo: tinyurl.com/gn-sort-demo

This demo creates lists of random numbers, and trains a GN to sort the list. After a sequence of message-passing steps, the model makes an accurate prediction of which elements (columns in the figure) come next after each other (rows).


Physics demo: tinyurl.com/gn-physics-demo

This demo creates random mass-spring physical systems, and trains a GN to predict the state of the system on the next timestep. The model's next-step predictions can be fed back in as input to create a rollout of a future trajectory. Each subplot below shows the true and predicted mass-spring system states over 50 timesteps. This is similar to the model and experiments in Battaglia et al. (2016)'s "interaction networks".


4.5 Summary

In this section, we have discussed the design principles behind graph networks: flexible representations, configurable within-block structure, and composable multi-block architectures. These three design principles combine in a framework which is extremely flexible and applicable to a wide range of domains, from perception and language to symbolic reasoning. And, as we will see in the remainder of this paper, the strong relational inductive bias possessed by graph networks supports combinatorial generalization, thus making it a powerful tool both in terms of implementation and theory.

5 Discussion

In this paper, we analyzed the extent to which relational inductive bias exists in deep learning architectures like MLPs, CNNs, and RNNs, and concluded that while CNNs and RNNs do contain relational inductive biases, they cannot naturally handle more structured representations such as sets or graphs. We advocated for building stronger relational inductive biases into deep learning architectures by highlighting an underused deep learning building block called a graph network, which performs computations over graph-structured data. Our graph network framework unifies existing approaches that also operate over graphs, and provides a straightforward interface for assembling graph networks into complex, sophisticated architectures.

5.1 Combinatorial generalization in graph networks

The structure of GNs naturally supports combinatorial generalization because they do not perform computations strictly at the system level, but also apply shared computations across the entities and across the relations as well. This allows never-before-seen systems to be reasoned about, because they are built from familiar components, in a way that reflects von Humboldt's "infinite use of finite means" (Humboldt, 1836; Chomsky, 1965). A number of studies have explored GNs' capacity for combinatorial generalization. Battaglia et al. (2016) found that GNs trained to make one-step physical state predictions could simulate thousands of future time steps, and also exhibit accurate zero-shot transfer to physical systems with double, or half, the number of entities experienced during training. Sanchez-Gonzalez et al. (2018) found similar results in more complex physical control settings, including that GNs trained as forward models on simulated multi-joint agents could generalize to agents with new numbers of joints. Hamrick et al. (2018) and Wang et al. (2018b) each found that GN-based decision-making policies could transfer to novel numbers of entities as well. In combinatorial optimization problems, Bello et al. (2016); Nowak et al. (2017); Dai et al. (2017); Kool and Welling (2018) showed that GNs could generalize well to problems of much different sizes than they had been trained on. Similarly, Toyer et al. (2017) showed generalization to different sizes of planning problems, and Hamilton et al. (2017) showed generalization to producing useful node embeddings for previously unseen data. On boolean SAT problems, Selsam et al. (2018) demonstrated generalization both to different problem sizes and across problem distributions: their model retained good performance upon strongly modifying the distribution of the input graphs and its typical local structure. These striking examples of combinatorial generalization are not entirely surprising, given GNs' entity- and relation-centric organization, but nonetheless provide important support for the view that embracing explicit structure and flexible learning is a viable approach toward realizing better sample efficiency and generalization in modern AI.

5.2 Limitations of graph networks

One limitation of GNs' and MPNNs' form of learned message-passing (Shervashidze et al., 2011) is that it cannot be guaranteed to solve some classes of problems, such as discriminating between certain non-isomorphic graphs. Kondor et al. (2018) suggested that covariance (Cohen and Welling, 2016; Kondor and Trivedi, 2018), rather than invariance to permutations of the nodes and edges, is preferable, and proposed "covariant compositional networks" which can preserve structural information, and allow it to be ignored only if desired. More generally, while graphs are a powerful way of representing structure information, they have limits. For example, notions like recursion, control flow, and conditional iteration are not straightforward to represent with graphs, and, minimally, require additional assumptions (e.g., in interpreting abstract syntax trees). Programs and more "computer-like" processing can offer greater representational and computational expressivity with respect to these notions, and some have argued they are an important component of human cognition (Tenenbaum et al., 2011; Lake et al., 2015; Goodman et al., 2015).

5.3 Open questions

Although we are excited about the potential impacts that graph networks can have, we caution that these models are only one step forward. Realizing the full potential of graph networks will likely be far more challenging than organizing their behavior under one framework, and indeed, there are a number of unanswered questions regarding the best ways to use graph networks. One pressing question is: where do the graphs come from that graph networks operate over? One of the hallmarks of deep learning has been its ability to perform complex computations over raw sensory data, such as images and text, yet it is unclear how best to convert sensory data into more structured representations like graphs. One approach (which we have already discussed) assumes a fully connected graph structure between spatial or linguistic entities, such as in the literature on self-attention (Vaswani et al., 2017; Wang et al., 2018c). However, such representations may not correspond exactly to the "true" entities (e.g., convolutional features do not directly correspond to objects in a scene). Moreover, many underlying graph structures are much more sparse than a fully connected graph, and it is an open question how to induce this sparsity. Several lines of active research are exploring these issues (Watters et al., 2017; van Steenkiste et al., 2018; Li et al., 2018; Kipf et al., 2018) but as of yet there is no single method which can reliably extract discrete entities from sensory data. Developing such a method is an exciting challenge for future research, and once solved will likely open the door for much more powerful and flexible reasoning algorithms. A related question is how to adaptively modify graph structures during the course of computation. For example, if an object fractures into multiple pieces, a node representing that object also ought to split into multiple nodes. Similarly, it might be useful to only represent edges between objects that are in contact, thus requiring the ability to add or remove edges depending on context. The question of how to support this type of adaptivity is also actively being researched, and in particular, some of the methods used for identifying the underlying structure of a graph may be applicable (e.g. Li et al., 2018; Kipf et al., 2018). Human cognition makes the strong assumption that the world is composed of objects and relations (Spelke and Kinzler, 2007), and because GNs make a similar assumption, their behavior tends to be more interpretable. The entities and relations that GNs operate over often correspond to things that humans understand (such as physical objects), thus supporting more interpretable analysis and visualization (e.g., as in Selsam et al., 2018). An interesting direction for future work is to further explore the interpretability of the behavior of graph networks.

5.4 Integrative approaches for learning and structure

While our focus here has been on graphs, one takeaway from this paper is less about graphs themselves and more about the approach of blending powerful deep learning approaches with structured representations. We are excited by related approaches which have explored this idea for other types of structured representations and computations, such as linguistic trees (Socher et al., 2011a,b, 2012, 2013; Tai et al., 2015; Andreas et al., 2016), partial tree traversals in a state-action graph (Guez et al., 2018; Farquhar et al., 2018), hierarchical action policies (Andreas et al., 2017), multi-agent communication channels (Foerster et al., 2016), "capsules" (Sabour et al., 2017), and programs (Parisotto et al., 2017). Other methods have attempted to capture different types of structure by mimicking key hardware and software components in computers and how they transfer information between each other, such as persistent slotted storage, registers, memory I/O controllers, stacks, and queues (e.g. Dyer et al., 2015; Grefenstette et al., 2015; Joulin and Mikolov, 2015; Sukhbaatar et al., 2015; Kurach et al., 2016; Graves et al., 2016).

5.5 Conclusion

Recent advances in AI, propelled by deep learning, have been transformative across many important domains. Despite this, a vast gap between human and machine intelligence remains, especially with respect to efficient, generalizable learning. We argue for making combinatorial generalization a top priority for AI, and advocate for embracing integrative approaches which draw on ideas from human cognition, traditional computer science, standard engineering practice, and modern deep learning. Here we explored flexible learning-based approaches which implement strong relational inductive biases to capitalize on explicitly structured representations and computations, and presented a framework called graph networks, which generalize and extend various recent approaches for neural networks applied to graphs. Graph networks are designed to promote building complex architectures using customizable graph-to-graph building blocks, and their relational inductive biases promote combinatorial generalization and improved sample efficiency over other standard machine learning building blocks. Despite their benefits and potential, however, learnable models which operate on graphs are only a stepping stone on the path toward human-like intelligence. We are optimistic about a number of other relevant, and perhaps underappreciated, research directions, including marrying learning-based approaches with programs (Ritchie et al., 2016; Andreas et al., 2016; Gaunt et al., 2016; Evans and Grefenstette, 2018; Evans et al., 2018), developing model-based approaches with an emphasis on abstraction (Kansky et al., 2017; Konidaris et al., 2018; Zhang et al., 2018; Hay et al., 2018), investing more heavily in meta-learning (Wang et al., 2016, 2018a; Finn et al., 2017), and exploring multi-agent learning and interaction as a key catalyst for advanced intelligence (Nowak, 2006; Ohtsuki et al., 2006). These directions each involve rich notions of entities, relations, and combinatorial generalization, and can potentially benefit, and benefit from, greater interaction with approaches for learning relational reasoning over explicitly structured representations.

Acknowledgements

We thank Tobias Pfaff, Danilo Rezende, Nando de Freitas, Murray Shanahan, Thore Graepel, John Jumper, Demis Hassabis, and the broader DeepMind and Google communities for valuable feedback and support.

References

Allamanis, M., Brockschmidt, M., and Khademi, M. (2018). Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations (ICLR).

Allamanis, M., Chanthirasegaran, P., Kohli, P., and Sutton, C. (2017). Learning continuous semantic representations of symbolic expressions. In Proceedings of the International Conference on Machine Learning (ICML).

Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89(4):369.

Andreas, J., Klein, D., and Levine, S. (2017). Modular multitask reinforcement learning with policy sketches. In Proceedings of the International Conference on Machine Learning (ICML).

Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. (2016). Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 39–48.

Ba, J. L., Kiros, J. R., and Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR).

Battaglia, P., Pascanu, R., Lai, M., Rezende, D. J., et al. (2016). Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510.

Battaglia, P. W., Hamrick, J. B., and Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45):18327–18332.

Bello, I., Pham, H., Le, Q. V., Norouzi, M., and Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.

Bobrow, D. G. and Hinton, G. E., editors (1990). Artificial Intelligence, volume 46. Elsevier Science Publishers Ltd., Essex, UK. Special Issue 1-2: On Connectionist Symbol Processing.

Bojchevski, A., Shchur, O., Zugner, D., and Gunnemann, S. (2018). Netgan: Generating graphs via random walks. arXiv preprint arXiv:1803.00816.

Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.

Botvinick, M. M. (2008). Hierarchical models of behavior and prefrontal function. Trends in Cognitive Sciences, 12(5):201–208.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In Proceedings of the International Conference on Learning Representations (ICLR).

Chang, M. B., Ullman, T., Torralba, A., and Tenenbaum, J. B. (2017). A compositional object-based approach to learning physical dynamics. In Proceedings of the International Conference on Learning Representations (ICLR).

Chen, X., Li, L., Fei-Fei, L., and Gupta, A. (2018a). Iterative visual reasoning beyond convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Chen, X., Liu, C., and Song, D. (2018b). Tree-to-tree neural networks for program translation. In Workshops of the International Conference on Learning Representations (ICLR).

Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Proceeding of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Chomsky, N. (1957). Syntactic Structures. Mouton & Co.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Cohen, T. and Welling, M. (2016). Group equivariant convolutional networks. In International Conference on Machine Learning, pages 2990–2999.

Craik, K. J. W. (1943). The Nature of Explanation. Cambridge University Press.

Cui, Z., Henrickson, K., Ke, R., and Wang, Y. (2018). High-order graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. arXiv preprint arXiv:1802.07007.

Dai, H., Dai, B., and Song, L. (2016). Discriminative embeddings of latent variable models for structured data. In Proceedings of the International Conference on Machine Learning (ICML).

Dai, H., Khalil, E. B., Zhang, Y., Dilkina, B., and Song, L. (2017). Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems.

De Cao, N. and Kipf, T. (2018). MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973.

Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852.

Denil, M., Colmenarejo, S. G., Cabi, S., Saxton, D., and de Freitas, N. (2017). Programmable agents. arXiv preprint arXiv:1706.06383.

Devlin, J., Uesato, J., Singh, R., and Kohli, P. (2017). Semantic code repair using neuro-symbolic transformation networks. arXiv preprint arXiv:1710.11054.

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232.

Dyer, C., Ballesteros, M., Ling, W., Matthews, A., and Smith, N. A. (2015). Transition-based dependency parsing with stack long short-term memory. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

Dzeroski, S., De Raedt, L., and Driessens, K. (2001). Relational reinforcement learning. Machine Learning, 43(1-2):7–52.

Edwards, H. and Storkey, A. (2016). Towards a neural statistician. arXiv preprint arXiv:1606.02185.

Eliasmith, C. (2013). How to build a brain: A neural architecture for biological cognition. Oxford University Press.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2):179–211.

Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2-3):195–225.

Eslami, S. A., Heess, N., Weber, T., Tassa, Y., Szepesvari, D., Hinton, G. E., et al. (2016). Attend, infer, repeat: Fast scene understanding with generative models. In Advances in Neural Information Processing Systems, pages 3225–3233.

Evans, R. and Grefenstette, E. (2018). Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61:1–64.

Evans, R., Saxton, D., Amos, D., Kohli, P., and Grefenstette, E. (2018). Can neural networks understand logical entailment? In Proceedings of the International Conference on Learning Representations (ICLR).

Farquhar, G., Rocktaschel, T., Igl, M., and Whiteson, S. (2018). TreeQN and ATreeC: Differentiable tree planning for deep reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR).

Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400.

Fodor, J. A. (1975). The Language of Thought. Harvard University Press.

Fodor, J. A. and Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71.

Foerster, J., Assael, I. A., de Freitas, N., and Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, pages 2137–2145.

Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36:193–202.

Garcia, V. and Bruna, J. (2018). Few-shot learning with graph neural networks. In Proceedings of the International Conference on Learning Representations (ICLR).

García-Durán, A. and Niepert, M. (2017). Learning graph representations with embedding propagation. arXiv preprint arXiv:1710.03059.

Garnelo, M., Arulkumaran, K., and Shanahan, M. (2016). Towards deep symbolic reinforcement learning. arXiv preprint arXiv:1609.05518.

Gaunt, A. L., Brockschmidt, M., Kushman, N., and Tarlow, D. (2016). Differentiable programs with neural libraries. arXiv preprint arXiv:1611.02109.

Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58.

Gentner, D. and Markman, A. B. (1997). Structure mapping in analogy and similarity. American Psychologist, 52(1):45.

Getoor, L. and Taskar, B. (2007). Introduction to Statistical Relational Learning. MIT press.

Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.

Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y. (2016). Deep Learning. MIT Press.

Goodman, N. (1955). The new riddle of induction. In Fact, Fiction, and Forecast, pages 59–83. Harvard University Press.

Goodman, N., Mansinghka, V., Roy, D. M., Bonawitz, K., and Tenenbaum, J. B. (2012). Church: a language for generative models. arXiv preprint arXiv:1206.3255.

Goodman, N. D., Tenenbaum, J. B., and Gerstenberg, T. (2015). Concepts in a probabilistic language of thought. In Margolis, E. and Laurence, S., editors, The Conceptual Mind: New Directions in the Study of Concepts. MIT Press.

Goodwin, G. P. and Johnson-Laird, P. (2005). Reasoning about relations. Psychological Review, 112(2):468.

Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), volume 2, pages 729–734. IEEE.

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471.

Grefenstette, E., Hermann, K. M., Suleyman, M., and Blunsom, P. (2015). Learning to transduce with unbounded memory. In Advances in Neural Information Processing Systems, pages 1828–1836.

Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., and Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8):357–364.

Grover, A. and Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864. ACM.

Guez, A., Weber, T., Antonoglou, I., Simonyan, K., Vinyals, O., Wierstra, D., Munos, R., and Silver, D. (2018). Learning to search with MCTSnets. arXiv preprint arXiv:1802.04697.

Gulcehre, C., Denil, M., Malinowski, M., Razavi, A., Pascanu, R., Hermann, K. M., Battaglia, P., Bapst, V., Raposo, D., Santoro, A., and de Freitas, N. (2018). Hyperbolic attention networks. arXiv preprint arXiv:1805.09786.

Hamaguchi, T., Oiwa, H., Shimbo, M., and Matsumoto, Y. (2017). Knowledge transfer for out-of-knowledge-base entities: A graph neural network approach. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1025–1035.

Hamrick, J., Allen, K., Bapst, V., Zhu, T., McKee, K., Tenenbaum, J., and Battaglia, P. (2018). Relational inductive bias for physical construction in humans and machines. In Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Hamrick, J. B., Ballard, A. J., Pascanu, R., Vinyals, O., Heess, N., and Battaglia, P. W. (2017). Metacontrol for adaptive imagination-based optimization. In Proceedings of the International Conference on Learning Representations (ICLR).

Hartford, J., Graham, D. R., Leyton-Brown, K., and Ravanbakhsh, S. (2018). Deep models of interactions across sets. arXiv preprint arXiv:1803.02879.

Hay, N., Stark, M., Schlegel, A., Wendelken, C., Park, D., Purdy, E., Silver, T., Phoenix, D. S., and George, D. (2018). Behavior is everything: towards representing concepts with sensorimotor contingencies. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

Henaff, M., Bruna, J., and LeCun, Y. (2015). Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.

Hinton, G. E. (1990). Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46(1-2):47–75.

Hjort, N. L., Holmes, C., Muller, P., and Walker, S. G. (2010). Bayesian Nonparametrics. Cambridge University Press.

Hoshen, Y. (2017). Vain: Attentional multi-agent predictive modeling. In Advances in Neural Information Processing Systems, pages 2698–2708.

Hu, H., Gu, J., Zhang, Z., Dai, J., and Wei, Y. (2017). Relation networks for object detection. arXiv preprint arXiv:1711.11575.

Hudson, D. A. and Manning, C. D. (2018). Compositional attention networks for machine reasoning.
In Proceedings of the International Conference on Learning Representations (ICLR).

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Hummel, J. E. and Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110(2):220.

Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML).

Johnson, D. D. (2017). Learning graphical state transitions. Proceedings of the International Conference on Learning Representations (ICLR).

Joulin, A. and Mikolov, T. (2015). Inferring algorithmic patterns with stack-augmented recurrent nets. In Advances in Neural Information Processing Systems, pages 190–198.

Kansky, K., Silver, T., Mely, D. A., Eldawy, M., Lazaro-Gredilla, M., Lou, X., Dorfman, N., Sidor, S., Phoenix, S., and George, D. (2017). Schema networks: Zero-shot transfer with a generative causal model of intuitive physics. In Proceedings of the International Conference on Machine Learning (ICML).

Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P. (2016). Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design, 30(8):595–608.

Kemp, C. and Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31):10687–10692.

Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. In Proceedings of the International Conference on Machine Learning (ICML).

Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks.
In Proceedings of the International Conference on Learning Representations (ICLR).

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Kondor, R., Son, H. T., Pan, H., Anderson, B., and Trivedi, S. (2018). Covariant compositional networks for learning graphs. arXiv preprint arXiv:1801.02144.

Kondor, R. and Trivedi, S. (2018). On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690.

Konidaris, G., Kaelbling, L. P., and Lozano-Perez, T. (2018). From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research, 61:215–289.

Kool, W. and Welling, M. (2018). Attention solves your TSP. arXiv preprint arXiv:1803.08475.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.

Kurach, K., Andrychowicz, M., and Sutskever, I. (2016). Neural random-access machines. In Proceedings of the International Conference on Learning Representations (ICLR).

Lake, B. M. and Baroni, M. (2018). Still not systematic after all these years: On the compositional skills of sequence-to-sequence recurrent networks. In Proceedings of the International Conference on Machine Learning (ICML).

Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.

Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2016). Gated graph sequence neural networks. In Proceedings of the International Conference on Learning Representations (ICLR).

Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (2018). Learning deep generative models of graphs. In Workshops at the International Conference on Learning Representations (ICLR).

Li, Y., Yu, R., Shahabi, C., and Liu, Y. (2017). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.

Lin, Z., Feng, M., Santos, C. N. d., Yu, M., Xiang, B., Zhou, B., and Bengio, Y. (2017). A structured self-attentive sentence embedding. In Proceedings of the International Conference on Learning Representations (ICLR).

Liu, H., Simonyan, K., Vinyals, O., Fernando, C., and Kavukcuoglu, K. (2018). Hierarchical representations for efficient architecture search. In Proceedings of the International Conference on Learning Representations (ICLR).

Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.

Marcus, G. (2001). The algebraic mind.

Marcus, G. (2018a). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Marcus, G. (2018b). Innateness, AlphaZero, and artificial intelligence. arXiv preprint arXiv:1801.05667.

McClelland, J. L. (1994). The interaction of nature and nurture in development: A parallel distributed processing perspective. International Perspectives on Psychological Science, 1:57–88.

McClelland, J. L. and Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5):375.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751.

Mitchell, T. M. (1980). The need for biases in learning generalizations. Department of Computer Science, Laboratory for Computer Science Research, Rutgers Univ. New Jersey.

Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204–2212.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.

Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Moravčík, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. (2017). Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513.

Narayanan, A., Chandramohan, M., Chen, L., Liu, Y., and Saminathan, S. (2016). subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs. In Workshops at the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., and Jaiswal, S. (2017). graph2vec: Learning distributed representations of graphs. arXiv preprint arXiv:1707.05005.

Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9(3):353–383.

Niepert, M., Ahmed, M., and Kutzkov, K. (2016). Learning convolutional neural networks for graphs. In Proceedings of the International Conference on Machine Learning (ICML), pages 2014–2023.

Nilsson, N. J. and Fikes, R. E. (1970). Strips: A new approach to the application of theorem proving to problem solving. Technical report, SRI International, Menlo Park, CA, Artificial Intelligence Center.

Nowak, A., Villar, S., Bandeira, A. S., and Bruna, J. (2017). A note on learning algorithms for quadratic assignment with graph neural networks. In Proceedings of the Principled Approaches to Deep Learning Workshop (PADL) at the International Conference of Machine Learning (ICML).

Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314(5805):1560–1563.

Ohtsuki, H., Hauert, C., Lieberman, E., and Nowak, M. A. (2006). A simple rule for the evolution of cooperation on graphs and social networks. Nature, 441(7092):502.


Oñoro-Rubio, D., Niepert, M., García-Durán, A., Gonzalez-Sanchez, R., and Lopez-Sastre, R. J. (2017). Representation learning for visual-relational knowledge graphs. arXiv preprint arXiv:1709.02314.

Parisotto, E., Mohamed, A.-r., Singh, R., Li, L., Zhou, D., and Kohli, P. (2017). Neuro-symbolic program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR).

Pascanu, R., Li, Y., Vinyals, O., Heess, N., Buesing, L., Racaniere, S., Reichert, D., Weber, T., Wierstra, D., and Battaglia, P. (2017). Learning model-based planning from scratch. arXiv preprint arXiv:1707.06170.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29(3):241–288.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pearl, J. (2009). Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition.

Pearl, J. (2018). Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv preprint arXiv:1801.04016.

Pennington, J., Socher, R., and Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM.

Pevny, T. and Somol, P. (2017). Using neural network formalism to solve multiple-instance problems. In International Symposium on Neural Networks, pages 135–142. Springer.

Pinker, S. and Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1-2):73–193.

Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623–641.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., and Patterson, K. (1996). Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychological Review, 103(1):56.

Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46(1-2):77–105.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Raposo, D., Santoro, A., Barrett, D., Pascanu, R., Lillicrap, T., and Battaglia, P. (2017). Discovering objects and their relations from entangled scene representations. In Workshops at the International Conference on Learning Representations (ICLR).

Reed, S. and De Freitas, N. (2016). Neural programmer-interpreters. In Proceedings of the International Conference on Learning Representations (ICLR).

Ritchie, D., Horsfall, P., and Goodman, N. D. (2016). Deep amortized inference for probabilistic programs. arXiv preprint arXiv:1610.05735.

Rosenblatt, F. (1961). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Technical report, Cornell Aeronautical Lab Inc., Buffalo, NY.

Rumelhart, D. E., McClelland, J. L., Group, P. R., et al. (1987). Parallel Distributed Processing, volume 1. MIT Press.

Russell, S. J. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd Edition). Pearson.

Sabour, S., Frosst, N., and Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3859–3869.

Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller, M., Hadsell, R., and Battaglia, P. (2018). Graph networks as learnable physics engines for inference and control. In Proceedings of the 35th International Conference on Machine Learning (ICML).

Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., and Lillicrap, T. (2017). A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009a). Computational capabilities of graph neural networks. IEEE Transactions on Neural Networks, 20(1):81–102.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009b). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.

Scarselli, F., Yong, S. L., Gori, M., Hagenbuchner, M., Tsoi, A. C., and Maggini, M. (2005). Graph neural networks for ranking web pages. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pages 666–672. IEEE.

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61:85–117.

Selsam, D., Lamm, M., Bunz, B., Liang, P., de Moura, L., and Dill, D. L. (2018). Learning a sat solver from single-bit supervision. arXiv preprint arXiv:1802.03685.

Shalev-Shwartz, S., Shamir, O., and Shammah, S. (2017). Failures of gradient-based deep learning. arXiv preprint arXiv:1703.07950.

Shaw, P., Uszkoreit, J., and Vaswani, A. (2018). Self-attention with relative position representations. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Shervashidze, N., Schweitzer, P., Leeuwen, E. J. v., Mehlhorn, K., and Borgwardt, K. M. (2011). Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(Sep):2539–2561.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2):159–216.

Socher, R., Huval, B., Manning, C. D., and Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing (EMNLP) and Computational Natural Language Learning (CNLL), pages 1201–1211. Association for Computational Linguistics.

Socher, R., Lin, C. C., Manning, C., and Ng, A. Y. (2011a). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 129–136.

Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 151–161. Association for Computational Linguistics.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1631–1642.

Spelke, E. S., Breinlinger, K., Macomber, J., and Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99(4):605.

Spelke, E. S. and Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10(1):89–96.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Sukhbaatar, S., Fergus, R., et al. (2016). Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, pages 2244–2252.

Sukhbaatar, S., Weston, J., Fergus, R., et al. (2015). End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440–2448.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 4, page 12.

Tai, K. S., Socher, R., and Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei, Q. (2015). Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067–1077. International World Wide Web Conferences Steering Committee.

Tenenbaum, J. B., Griffiths, T. L., and Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7):309–318.

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., and Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285.

Toyer, S., Trevizan, F., Thiebaux, S., and Xie, L. (2017). Action schema networks: Generalised policies with deep learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

Ullman, T. D., Spelke, E., Battaglia, P., and Tenenbaum, J. B. (2017). Mind games: Game engines as an architecture for intuitive physics. Trends in Cognitive Sciences, 21(9):649–665.

van Steenkiste, S., Chang, M., Greff, K., and Schmidhuber, J. (2018). Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. Proceedings of the International Conference on Learning Representations (ICLR).

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems.

Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2018). Graph attention networks. In Proceedings of the International Conference on Learning Representations (ICLR).

Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., Hassabis, D., and Botvinick, M. (2018a). Prefrontal cortex as a meta-reinforcement learning system. Nature neuroscience, page 1.

Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., and Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

Wang, T., Liao, R., Ba, J., and Fidler, S. (2018b). Nervenet: Learning structured policy with graph neural networks. In Proceedings of the International Conference on Learning Representations (ICLR).

Wang, X., Girshick, R., Gupta, A., and He, K. (2018c). Non-local neural networks. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR).

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., and Solomon, J. M. (2018d). Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829.

Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu, R., and Tacchetti, A. (2017). Visual interaction networks: Learning a physics simulator from video. In Advances in Neural Information Processing Systems, pages 4542–4550.

Wu, J., Lu, E., Kohli, P., Freeman, B., and Tenenbaum, J. (2017). Learning to see physics via visual de-animation. In Advances in Neural Information Processing Systems, pages 152–163.

Yoon, K., Liao, R., Xiong, Y., Zhang, L., Fetaya, E., Urtasun, R., Zemel, R., and Pitkow, X. (2018). Inference in probabilistic graphical models by graph neural networks. In Workshops at the International Conference on Learning Representations (ICLR).

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. (2018). GraphRNN: A deep generative model for graphs. arXiv preprint arXiv:1802.08773.

Yuille, A. L. and Liu, C. (2018). Deep nets: What have they ever done for vision? arXiv preprint arXiv:1805.04025.

Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. (2017). Deep sets. In Advances in Neural Information Processing Systems, pages 3394–3404.

Zambaldi, V., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D., Lillicrap, T., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M., Vinyals, O., and Battaglia, P. (2018). Relational deep reinforcement learning. arXiv preprint arXiv.

Zhang, A., Lerer, A., Sukhbaatar, S., Fergus, R., and Szlam, A. (2018). Composable planning with attributes. arXiv preprint arXiv:1803.00512.

Zugner, D., Akbarnejad, A., and Gunnemann, S. (2018). Adversarial Attacks on Neural Networks for Graph Data. arXiv preprint arXiv:1805.07984.

Appendix: Formulations of additional models

In this appendix we give more examples of how published networks can fit in the framework defined by Equation 1.

Interaction networks

Interaction Networks (Battaglia et al., 2016; Watters et al., 2017) and the Neural Physics Engine (Chang et al., 2017) use a full GN, except that the global attribute is not used when updating the edge properties.

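Read this way, and assuming summation aggregators (this is our paraphrase rather than the exact published equations), the block can be written with the global u still entering the node update, e.g. to carry gravity or other external effects:

```latex
e'_k = \phi^e\!\left(e_k,\, v_{r_k},\, v_{s_k}\right), \qquad
\bar{e}'_i = \rho^{e \rightarrow v}(E'_i), \qquad
v'_i = \phi^v\!\left(\bar{e}'_i,\, v_i,\, u\right)
```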

Non-pairwise interactions


Attention-based approaches


Belief Propagation embeddings

Finally, we briefly summarize how the general "structure2vec" algorithm of Dai et al. (2016) can fit into our framework. In order to do so, we need to slightly modify our main Equation 1, i.e.:


Edge features now take on the meaning of a "message" between their receiver and sender; note that there is only one set of parameters to learn, shared by the edge and node updates.

