Paper: https://doi.org/10.1145/3292500.3330989
Venue: KDD
Publish time: 2019
Authors and affiliations:
Datasets:
Code:
Articles written by others:
Brief summary of contributions:
- (1) In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes.
- (2) In this work, we explore high-order connectivity with semantic relations in CKG for knowledge-aware recommendation.
- (3) We propose a new method named Knowledge Graph Attention Network (KGAT), which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node’s neighbors (which can be users, items, or attributes) to refine the node’s embedding, and employs an attention mechanism to discriminate the importance of the neighbors.
- (4) KGAT is equipped with two designs to correspondingly address the challenges in high-order relation modeling:
  - (1) recursive embedding propagation, which updates a node’s embedding based on the embeddings of its neighbors, and recursively performs such embedding propagation to capture high-order connectivities in linear time complexity; and
  - (2) attention-based aggregation, which employs the neural attention mechanism [6, 27] to learn the weight of each neighbor during a propagation, such that the attention weights of cascaded propagations can reveal the importance of a high-order connectivity.
- (5) Figure 2 shows the model framework, which consists of three main components:
  - (1) embedding layer, which parameterizes each node as a vector by preserving the structure of CKG;
  - (2) attentive embedding propagation layers, which recursively propagate embeddings from a node’s neighbors to update its representation, and employ a knowledge-aware attention mechanism to learn the weight of each neighbor during a propagation; and
  - (3) prediction layer, which aggregates the representations of a user and an item from all propagation layers, and outputs the predicted matching score.
- At its core is the attentive embedding propagation layer, which adaptively propagates the embeddings from a node’s neighbors to update the node’s representation.
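
As a rough illustration of the attentive propagation step described above, the sketch below scores each neighbor of a head entity, normalizes the scores with a softmax, and forms the attention-weighted neighborhood vector. The scoring form `(W_r e_t)^T tanh(W_r e_h + e_r)` is recalled from the KGAT paper rather than stated in these notes; the function name `attentive_neighborhood`, the dimensions, and the random embeddings are illustrative assumptions.

```python
import numpy as np


def attentive_neighborhood(e_h, triples):
    """Return (attention weights, neighborhood vector) for head embedding e_h.

    triples: list of (W_r, e_r, e_t), one entry per (h, r, t) edge, where
    W_r (k x d) projects entities into the space of relation r.
    All names and shapes here are assumptions made for this sketch.
    """
    # Unnormalized score per neighbor: (W_r e_t)^T tanh(W_r e_h + e_r)
    scores = np.array([
        (W_r @ e_t) @ np.tanh(W_r @ e_h + e_r)
        for W_r, e_r, e_t in triples
    ])
    # Softmax over all neighbors of h (shifted by the max for stability)
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Weighted sum of tail embeddings gives the ego-network representation
    tails = np.stack([e_t for _, _, e_t in triples])
    e_Nh = weights @ tails
    return weights, e_Nh


rng = np.random.default_rng(0)
d, k = 4, 3  # entity and relation embedding sizes (arbitrary)
e_h = rng.normal(size=d)
triples = [
    (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=d))
    for _ in range(3)
]
weights, e_Nh = attentive_neighborhood(e_h, triples)
print(weights, e_Nh)
```

Stacking this step several times is what lets information from multi-hop neighbors reach a node, with the per-hop weights indicating which paths mattered.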
• Information systems → Recommender systems.
(1) The success of recommender systems makes them prevalent in Web applications, ranging from search engines and E-commerce to social media sites and news portals; without exaggeration, almost every service that provides content to users is equipped with a recommender system. To predict user preference from the key (and widely available) source of user behavior data, much research effort has been devoted to collaborative filtering (CF) [12, 13, 32].
(2) Although these methods have provided strong performance, a deficiency is that they model each interaction as an independent data instance and do not consider their relations. This makes them insufficient to distill attribute-based collaborative signal from the collective behaviors of users.
(3) To address the limitation of feature-based supervised learning (SL) models, a solution is to take the graph of item side information, a.k.a. knowledge graph [3, 4], into account when constructing the predictive model. (A KG is typically described as a heterogeneous network consisting of entity-relation-entity triplets, where the entity can be an item or an attribute.) We term the hybrid structure of knowledge graph and user-item graph the collaborative knowledge graph (CKG). As illustrated in Figure 1, the key to successful recommendation is to fully exploit the high-order relations in CKG, e.g., the long-range connectivities:
(4) Several recent efforts have attempted to leverage the CKG structure for recommendation, which can be roughly categorized into two types, path-based [14, 25, 29, 33, 37, 39] and regularization-based [5, 15, 33, 38]:
(5) Considering the limitations of existing solutions, we believe it is of critical importance to develop a model that can exploit high-order information in KG in an efficient, explicit, and end-to-end manner.
(6) The contributions of this work are summarized as follows:
We now formulate the recommendation task to be addressed in this paper:
Figure 2: Illustration of the proposed KGAT model. The left subfigure shows model framework of KGAT, and the right subfigure presents the attentive embedding propagation layer of KGAT.
(1) Knowledge graph embedding is an effective way to parameterize entities and relations as vector representations, while preserving the graph structure.
(2) The training of TransR considers the relative order between valid triplets and broken ones, and encourages their discrimination through a pairwise ranking loss:
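
The pairwise ranking idea can be sketched as follows, assuming the usual TransR plausibility score `||W_r e_h + e_r - W_r e_t||^2` (lower means more plausible) and a `-ln sigmoid(g_broken - g_valid)` loss, as recalled from the KGAT paper. The function names and the toy vectors are hypothetical.

```python
import numpy as np


def transr_score(W_r, e_h, e_r, e_t):
    """Squared distance between the translated head and the tail in relation space."""
    diff = W_r @ e_h + e_r - W_r @ e_t
    return float(diff @ diff)


def pairwise_kg_loss(W_r, e_h, e_r, e_t, e_t_broken):
    """-ln sigmoid(g(h, r, t') - g(h, r, t)) for one valid/broken triplet pair."""
    g_valid = transr_score(W_r, e_h, e_r, e_t)
    g_broken = transr_score(W_r, e_h, e_r, e_t_broken)
    # -ln sigmoid(x) == ln(1 + exp(-x)), computed stably via logaddexp
    return float(np.logaddexp(0.0, -(g_broken - g_valid)))


# Toy example: with an identity projection, e_t = e_h + e_r fits perfectly.
W_r = np.eye(2)
e_h = np.array([1.0, 0.0])
e_r = np.array([0.0, 1.0])
e_t = np.array([1.0, 1.0])          # valid tail
e_t_broken = np.array([-2.0, 3.0])  # corrupted tail

loss_ok = pairwise_kg_loss(W_r, e_h, e_r, e_t, e_t_broken)
loss_swapped = pairwise_kg_loss(W_r, e_h, e_r, e_t_broken, e_t)
print(loss_ok, loss_swapped)
```

The loss is small when the valid triplet scores much lower than the broken one, and large when the order is reversed, which is the discrimination the text refers to.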
(1) The final phase is to aggregate the entity representation $e_h$ and its ego-network representation $e_{N_h}$ as the new representation of entity $h$; more formally, $e^{(1)}_h = f(e_h, e_{N_h})$. We implement $f(\cdot)$ using the following three types of aggregators:
(2) GCN Aggregator [17] sums two representations up and applies a nonlinear transformation, as follows:
(3) GraphSage Aggregator [9] concatenates two representations, followed by a nonlinear transformation:
(4) Bi-Interaction Aggregator is carefully designed by us to consider two kinds of feature interactions between $e_h$ and $e_{N_h}$, as follows:
(5) To summarize, the advantage of the embedding propagation layer lies in explicitly exploiting the first-order connectivity information to relate user, item, and knowledge entity representations. We empirically compare the three aggregators in Section 4.4.2.
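
The three aggregators can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the LeakyReLU nonlinearity and the Bi-Interaction form (a sum interaction plus an element-wise product interaction) are recalled from the KGAT paper, while the slope value, function names, and weight shapes are assumptions.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope value is an arbitrary choice for this sketch."""
    return np.where(x > 0, x, slope * x)


def gcn_aggregator(e_h, e_Nh, W):
    """Sum the two representations, then apply a nonlinear transformation."""
    return leaky_relu(W @ (e_h + e_Nh))


def graphsage_aggregator(e_h, e_Nh, W):
    """Concatenate the two representations, then apply a nonlinear transformation."""
    return leaky_relu(W @ np.concatenate([e_h, e_Nh]))


def bi_interaction_aggregator(e_h, e_Nh, W1, W2):
    """Combine an additive interaction and an element-wise product interaction."""
    return leaky_relu(W1 @ (e_h + e_Nh)) + leaky_relu(W2 @ (e_h * e_Nh))


e_h = np.array([1.0, -2.0])
e_Nh = np.array([1.0, 1.0])
I = np.eye(2)

print(gcn_aggregator(e_h, e_Nh, I))
print(graphsage_aggregator(e_h, e_Nh, np.hstack([I, I])))  # W has shape (2, 4)
print(bi_interaction_aggregator(e_h, e_Nh, I, I))
```

Note that the GraphSage variant needs a weight matrix with twice as many input columns, since it sees the concatenated vector.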
(1) To optimize the recommendation model, we opt for the BPR loss [22]. Specifically, it assumes that the observed interactions, which indicate more user preferences, should be assigned higher prediction values than unobserved ones:
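
A minimal sketch of the BPR loss, assuming the standard form that sums `-ln sigmoid(y_pos - y_neg)` over (user, observed item, unobserved item) samples. The function names are hypothetical, and the scores are taken as given rather than computed by a model.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -ln(sigmoid(x)) = ln(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(pos_scores, neg_scores):
    """Sum of pairwise losses; pos_scores[k] and neg_scores[k] share one user."""
    return sum(neg_log_sigmoid(p - n) for p, n in zip(pos_scores, neg_scores))


# Observed item scored above the unobserved one: small loss.
print(bpr_loss([3.0], [1.0]))
# Order violated: larger loss.
print(bpr_loss([1.0], [3.0]))
```

Only the score difference matters, so the loss pushes observed items above unobserved ones without fixing absolute score values.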
(2) Finally, we have the objective function to learn Equations (2) and (13) jointly, as follows:
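
For reference, the joint objective takes roughly the following form, as recalled from the KGAT paper rather than from these notes. Here $\mathcal{L}_{\text{KG}}$ denotes the TransR loss of Equation (2), $\mathcal{L}_{\text{CF}}$ the BPR loss of Equation (13), $\Theta$ the set of model parameters, and $\lambda$ the L2 regularization coefficient; the exact notation should be treated as an assumption.

```latex
\mathcal{L}_{\text{KGAT}} = \mathcal{L}_{\text{KG}} + \mathcal{L}_{\text{CF}} + \lambda \lVert \Theta \rVert_2^2
```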
We evaluate our proposed method, especially the embedding propagation layer, on three real-world datasets. We aim to answer the following research questions:
Amazon-review is a widely used dataset for product recommendation [10]. We select Amazon-book from this collection. To ensure the quality of the dataset, we use the 10-core setting, i.e., retaining users and items with at least ten interactions.
Last-FM: This is the music listening dataset collected from the Last.fm online music system. The tracks are viewed as the items. In particular, we take the subset of the dataset where the timestamp is from Jan, 2015 to June, 2015. We use the same 10-core setting in order to ensure data quality.
Yelp2018: This dataset is adopted from the 2018 edition of the Yelp challenge. Here we view local businesses such as restaurants and bars as the items. Similarly, we use the 10-core setting to ensure that each user and item have at least ten interactions.
To gain deeper insight into the attentive embedding propagation layer of KGAT, we investigate its impact. We first study the influence of the number of layers, then explore how different aggregators affect performance, and finally examine the influence of knowledge graph embedding and the attention mechanism.
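
The 10-core setting used for all three datasets can be sketched as an iterative filter. This is an assumption about how the filtering might be done: the notes only say that users and items with at least ten interactions are retained, and repeating the filter until nothing changes is one common way to guarantee that. The function name and the toy data are hypothetical.

```python
from collections import Counter


def k_core_filter(interactions, k=10):
    """Drop (user, item) pairs until every user and item has at least k interactions."""
    pairs = set(interactions)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = {
            (u, i) for u, i in pairs
            if user_counts[u] >= k and item_counts[i] >= k
        }
        if kept == pairs:  # nothing removed in this pass, so the result is stable
            return kept
        pairs = kept


# Toy example with k=2: item "c" is too rare, and removing it makes user "u3" too rare.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
print(sorted(k_core_filter(toy, k=2)))
```

A single pass would leave `u3` with one interaction, which is why the sketch loops until the set stops shrinking.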
This research is part of NExT++ research and also supported by the Thousand Youth Talents Program 2018. NExT++ is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IRC@SG Funding Initiative.