Paper: https://doi.org/10.1145/3437963.3441738
Venue: WSDM
Published: 2021
Authors and affiliations:
Datasets:
Code:
Write-ups by others:
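Key contributions in brief: the starting point of the paper is that the embeddings of cold-start users are inaccurate. Training uses nodes with abundant interactions; the ground-truth embeddings and the cold-start users/items are simulated by the authors through random sampling. Once trained, the model is applied to the actual cold-start data.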
- (1) However, the basic pre-training GNN model doesn't specially address the cold-start neighbors. During the original graph convolution process, the inaccurate embeddings of the cold-start neighbors and the embeddings of other neighbors are equally treated and aggregated to represent the target user/item.
- (2) This paper proposes to pre-train a GNN model before applying it for recommendation.
- (3) To further reduce the impact from the cold-start neighbors,
- we incorporate a self-attention-based meta aggregator to enhance the aggregation ability of each graph convolution step,
- and an adaptive neighbor sampler to select the effective neighbors according to the feedback from the pre-training GNN model.
- (4) Since we also need ground truth embeddings of the cold-start users/items to learn $f$, we simulate those users/items from the target users/items with abundant interactions.
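Point (4) can be illustrated with a small sketch. It is not the authors' code: the threshold `min_interactions`, the sample size `k`, and the function name `simulate_cold_start` are hypothetical, and the ground-truth embedding of each kept user is assumed to come from a separately trained model.

```python
import random


def simulate_cold_start(interactions, min_interactions, k, seed=0):
    """Build simulated cold-start tasks from users with rich histories.

    interactions: dict mapping a user id to the list of item ids it interacted with.
    min_interactions: hypothetical threshold for "abundant interactions".
    k: number of neighbors kept, mimicking a cold-start user (k <= min_interactions).
    """
    rng = random.Random(seed)
    tasks = {}
    for user, items in interactions.items():
        # Only users with enough history have a trustworthy ground-truth embedding.
        if len(items) >= min_interactions:
            # Hide most of the history: keep k randomly sampled neighbors.
            tasks[user] = rng.sample(items, k)
    return tasks


interactions = {
    "u1": ["i1", "i2", "i3", "i4", "i5", "i6"],
    "u2": ["i1", "i7"],
    "u3": ["i2", "i3", "i4", "i5", "i8"],
}
tasks = simulate_cold_start(interactions, min_interactions=5, k=3)
```

Each entry in `tasks` pairs a well-observed user with a reduced neighborhood, so a model can be trained to recover the user's full-history embedding from only `k` neighbors.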
CCS concepts: Information systems → Social recommendation
Keywords: pre-training, graph neural networks, cold-start, recommendation
(1) Recommendation systems [14, 21] have been extensively deployed to alleviate information overload in various web services, such as social media, e-commerce websites and news portals. To predict the likelihood of a user adopting an item, collaborative filtering (CF) is the most widely adopted principle. The most common paradigm for CF, such as matrix factorization [21] and neural collaborative filtering [14], is to learn embeddings, i.e., the preferences of users and items, and then perform the prediction based on the embeddings [13]. However, these models fail to learn high-quality embeddings for the cold-start users/items with sparse interactions.
(2) To address the cold-start problem, traditional recommender systems incorporate side information such as content features of users and items [40, 44] or external knowledge graphs (KGs) [35, 37] to compensate for the low-quality embeddings caused by sparse interactions. However, the content features are not always available, and it is not easy to link the items to the entities in KGs due to the incompleteness and ambiguity of the entities.
(3) On another line, inspired by the recent development of graph neural networks (GNNs) [2, 11, 19], NGCF [38] and LightGCN [13] encode the high-order collaborative signal in the user-item interaction graph with a GNN model, based on which they perform the recommendation task. As shown in Fig. 1, a typical recommendation-oriented GNN conducts graph convolution on the embeddings of the local neighborhoods of $u_1$ and $i_1$. By iteratively repeating the convolution for multiple steps, the embeddings of the high-order neighbors are propagated to $u_1$ and $i_1$. Based on the aggregated embeddings of $u_1$ and $i_1$, the likelihood of $u_1$ adopting $i_1$ is estimated, and cross-entropy loss [3] or BPR loss [13, 38] is usually adopted to compare the likelihood with the true observations.
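To make the BPR loss mentioned above concrete, here is a minimal sketch for a single (user, observed item, unobserved item) triple. Scoring by a dot product of the aggregated embeddings is an assumption made for illustration, and the names `dot` and `bpr_loss` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    """BPR loss for one triple: -log(sigmoid(score_pos - score_neg))."""
    diff = dot(user_emb, pos_item_emb) - dot(user_emb, neg_item_emb)
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


user = [1.0, 0.0]
observed_item = [1.0, 0.0]
unobserved_item = [0.0, 1.0]
loss_good_ranking = bpr_loss(user, observed_item, unobserved_item)
loss_bad_ranking = bpr_loss(user, unobserved_item, observed_item)
```

The loss is small when the observed item scores higher than the unobserved one and grows when the ranking is reversed.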
(4) Despite the success of GNNs in capturing the high-order collaborative signal [13, 38], they do not thoroughly solve the cold-start problem.
(5) Present work. To tackle the above challenges, before performing the GNN model for recommendation,
(6) However, the above pre-training strategy still cannot explicitly deal with the high-order cold-start neighbors when performing graph convolution. Besides, previous GNN sampling strategies such as random or importance sampling may fail to sample high-order relevant cold-start neighbors due to their sparse interactions.
(7)The contributions of this work are as follows:
(8) Experiments on both the intrinsic embedding evaluation task and the extrinsic downstream recommendation task demonstrate the superiority of our proposed pre-training GNN model over the state-of-the-art GNN models.
(1)In this section, we first define the problem and then introduce the graph neural networks that can be used to solve the problem.
(2) We formalize the user-item interaction data for recommendation as a bipartite graph denoted as $G = (U, I, E)$,
(3) We use $N^l(u)$ to represent the $l$-order neighbors of user $u$. When the superscript is omitted, $N(u)$ indicates the first-order neighbors of $u$. Similarly, $N^l(i)$ and $N(i)$ are defined for items.
(4) Let $f: U \cup I \to \mathbb{R}^d$ be the encoding function that maps the users/items to $d$-dimensional real-valued vectors. We use $h_u$ and $h_i$ to denote the embeddings of user $u$ and item $i$ respectively. Given a bipartite graph $G$, we aim to pre-train the encoding function $f$ so that it can be applied to the downstream recommendation task to improve its performance. In the following sections, we mainly take user embedding as an example to explain the proposed model. Item embedding can be explained in the same way.
This section introduces the proposed pre-training GNN model to learn the embeddings for the cold-start users and items.
(1)We propose the Meta Aggregator to deal with the cold-start neighbors.
(2) Suppose the target node is $u$ and one of its neighbors is $i$. If $i$ interacts with only a few nodes, its inaccurate embedding will affect the embedding of $u$ when the GNN $f$ performs graph convolution. Although the cold-start issue of $i$ is dealt with when $i$ acts as another target node, embedding $i$, which happens in parallel with embedding $u$, has only a delayed effect on $u$'s embedding. Thus, before training the GNN $f$, we train another function $g$ under a meta-learning setting similar to that of $f$. The meta learner $g$ learns an additional embedding for each node based only on its first-order neighbors, so it can quickly adapt to new cold-start nodes and produce more accurate embeddings for them. The embedding produced by $g$ is combined with the original embedding at each convolution in $f$. Although both $f$ and $g$ are trained under the same meta-learning setting, **$f$ tackles the cold-start target nodes, whereas $g$ enhances the cold-start neighbors' embeddings.**
(3) Specifically, we instantiate $g$ as a self-attention encoder [30].
(4) The self-attention technique, which pushes the dissimilar neighbors further apart and pulls the similar neighbors closer together, can capture the major preference of a node from its neighbors. The same cosine similarity described in Eq. (4) is used as the loss function to measure the difference between the predicted meta embedding $\tilde{h}_u$ and the ground truth embedding $h_u$. Once $g$ is learned, we add the meta embedding $\tilde{h}_u$ into each graph convolution step of the GNN $f$ in Eq. (1):
(5) For a target user $u$, Eq. (6) is repeated for $L-1$ steps to obtain the embeddings $\{h^{L-1}_1, \cdots, h^{L-1}_K\}$ for its $K$ first-order neighbors,
(1) The proposed sampler does not make any assumption about what kind of neighbors are useful for the target users/items. Instead, it learns an adaptive sampling strategy according to the feedback from the pre-training GNN model.
(2)To achieve this goal, we cast the task of neighbor sampler as a hierarchical Markov Decision Process (MDP) [28, 47].
(3) State. The $l$-th subtask takes an action at the $t$-th $l$-order neighbor to determine whether to sample it or not, according to the state of the target user $u$, the formerly selected neighbors, and the $t$-th $l$-order neighbor under consideration. We define the state features $s^l_t$ for the $t$-th $l$-order neighbor as the cosine similarity and the element-wise product between its initial embedding and, respectively, the target user $u$'s initial embedding, the initial embedding of each neighbor formerly selected by the $(l-1)$-th subtask, and the average embedding of all the formerly selected neighbors.
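The state definition above can be sketched as follows. The layout of the feature vector (simple concatenation, in the order target, each selected neighbor, average) is an assumption made for illustration, since the notes do not specify how the parts are combined; `cosine`, `pair_features`, and `state_features` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def pair_features(candidate, other):
    """Cosine similarity followed by the element-wise product."""
    return [cosine(candidate, other)] + [x * y for x, y in zip(candidate, other)]


def state_features(candidate, target, selected):
    """Features of a candidate neighbor against the target user, each
    formerly selected neighbor, and the average of those neighbors."""
    feats = pair_features(candidate, target)
    for emb in selected:
        feats += pair_features(candidate, emb)
    if selected:
        dim = len(candidate)
        mean = [sum(e[j] for e in selected) / len(selected) for j in range(dim)]
        feats += pair_features(candidate, mean)
    return feats


feats = state_features(
    candidate=[1.0, 0.0],
    target=[1.0, 0.0],
    selected=[[0.0, 1.0], [1.0, 1.0]],
)
```

With two-dimensional embeddings and two formerly selected neighbors, `feats` holds four groups of three values each.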
(4) Action and Policy. We define the action $a^l_t \in \{0, 1\}$ for the $t$-th $l$-order neighbor as a binary value that represents whether to sample the neighbor or not. We perform $a^l_t$ by the policy function $P$:
(5) Reward. The reward is a signal that indicates whether the performed actions are reasonable or not. Suppose the sampling task is finished at the $l'$-th subtask; each action of the formerly performed $l'$ subtasks receives a delayed reward after the last action of the $l'$-level subtask. In other words, the immediate reward for an action is zero except for the last action. The reward is formulated as:
(6) Objective Function. We find the optimal parameters of the policy function defined in Eq. (7) by maximizing the expected reward $\sum_{\tau} P(\tau; \Theta_s) R(\tau)$,
(7) Since there are too many possible action-state trajectories for the entire sequence, we adopt the Monte Carlo policy gradient [39] to sample $M$ action-state trajectories and calculate the gradients:
(8) Algorithm 1 shows the training process of the adaptive neighbor sampler. At each step $l$, we sample a sequence of actions $A^l$ (Line 5). If all the actions at the $l$-th step are equal to zero or the last ($L$-th) step has been performed (Line 6), the whole task is finished, and we compute the reward (Line 7) and the gradients (Line 8). After an epoch of sampling, we update the parameters of the sampler (Line 10). If it is jointly trained with the meta learner and the meta aggregator, we also update their parameters (Line 12).
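The Monte Carlo policy gradient used by the sampler can be sketched on a toy problem. This is not Algorithm 1: the hierarchy of subtasks is flattened into one sequence, the policy is assumed to be a logistic function of the state, and the reward is a made-up function that pays 1 only for one specific action sequence. All names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_trajectory(weights, states, rng):
    """Sample one binary action per state; return actions and grad of log-prob."""
    actions, grad = [], [0.0] * len(weights)
    for s in states:
        p = sigmoid(sum(w * x for w, x in zip(weights, s)))
        a = 1 if rng.random() < p else 0
        actions.append(a)
        # For a Bernoulli-logistic policy, d/dw log P(a | s) = (a - p) * s.
        for j, x in enumerate(s):
            grad[j] += (a - p) * x
    return actions, grad


def policy_gradient(weights, states, reward_fn, num_samples, rng):
    """Average of grad log P(trajectory) * reward over sampled trajectories."""
    total = [0.0] * len(weights)
    for _ in range(num_samples):
        actions, grad = sample_trajectory(weights, states, rng)
        reward = reward_fn(actions)  # delayed reward for the whole trajectory
        for j in range(len(weights)):
            total[j] += grad[j] * reward / num_samples
    return total


rng = random.Random(0)
states = [[1.0, 0.0], [0.0, 1.0]]  # two candidate neighbors
reward_fn = lambda actions: 1.0 if actions == [1, 0] else 0.0  # toy reward
weights = [0.0, 0.0]
for _ in range(50):
    grad = policy_gradient(weights, states, reward_fn, num_samples=10, rng=rng)
    weights = [w + 0.5 * g for w, g in zip(weights, grad)]  # gradient ascent
```

After training, the first weight is positive and the second is negative, so the policy prefers to keep the first candidate and drop the second, which is exactly the action sequence the toy reward pays for.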
We evaluate on three public datasets: MovieLens-1M (Ml-1M) [12], MOOCs [47] and Last.fm. Table 1 illustrates the statistics of these datasets. The code is available now.
(1) We select three types of baselines including the state-of-the-art neural matrix factorization model, the general GNN models and the special GNN models for recommendation:
(2)For each GNN model, we evaluate the corresponding pre-training model.
(3) The original GAT and LightGCN models use the whole adjacency matrix, i.e., all the neighbors, in the aggregation function. To train them more efficiently, we implement them in the same sampling way as GraphSAGE, where we randomly sample at most 10 neighbors for each user/item. Then the proposed pre-training GNN model is performed on the sampled graph.
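The "at most 10 neighbors" sampling step can be sketched as below. The adjacency-dictionary representation and the function name `sample_neighbors` are assumptions for illustration; nodes with 10 or fewer neighbors keep all of them.

```python
import random


def sample_neighbors(adjacency, max_neighbors=10, seed=0):
    """Keep at most max_neighbors uniformly sampled neighbors per node."""
    rng = random.Random(seed)
    sampled = {}
    for node, neighbors in adjacency.items():
        if len(neighbors) <= max_neighbors:
            sampled[node] = list(neighbors)
        else:
            # Sampling without replacement, so no neighbor appears twice.
            sampled[node] = rng.sample(list(neighbors), max_neighbors)
    return sampled


adjacency = {
    "u1": ["i%d" % n for n in range(25)],
    "u2": ["i1", "i2"],
}
sampled = sample_neighbors(adjacency)
```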
In this section, we conduct the intrinsic evaluation of inferring the embeddings of cold-start users/items with the proposed pre-training GNN model. Both the user embedding inference and the item embedding inference are evaluated.
(1) We use the meta-training set $D_T$ to perform the intrinsic evaluation.
(2) The original GNN models are trained by the BPR loss in Eq. (2) on $Train_T$. The proposed pre-training GNN models are trained by the cosine similarity in Eq. (4) on $Train_T$. The NCF model is trained transductively to obtain the user/item embeddings on the merged dataset of $Train_T$ and $Test'_T$. The embeddings in both the proposed models and the GNN models are initialized by the NCF embedding results. We use Spearman correlation [16] to measure the agreement between the ground truth embedding and the predicted embedding.
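As a reference for the metric, here is a dependency-free sketch of Spearman correlation (Pearson correlation of average ranks). It assumes the correlation is taken over the dimensions of one predicted/ground-truth embedding pair, which the notes do not state explicitly; `average_ranks` and `spearman` are hypothetical names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; NaN if either input has constant ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


predicted = [0.1, 0.4, 0.9]
ground_truth = [1.0, 2.0, 3.0]
agreement = spearman(predicted, ground_truth)
```

Because only ranks matter, the two embeddings agree perfectly here even though their values differ in scale.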
Recommendation. In this section, we apply the pre-training GNN model to the downstream recommendation task and evaluate the performance.
(1) We consider the scenario of cold-start users and use the meta-test set $D_N$ to perform recommendation. For each user in $D_N$, we put the top 10% of the user's interacted items in chronological order into the training set $Train_N$, and leave the remaining items in the test set $Test_N$. We pre-train our model on $D_T$ and fine-tune it on $Train_N$ according to Section 3.5.
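The per-user split can be sketched as follows, reading "top 10% in chronological order" as the earliest 10% of interactions. Keeping at least one training item per user is an extra assumption, and `chronological_split` is a hypothetical name.

```python
def chronological_split(events, train_fraction=0.1):
    """Split one user's (item_id, timestamp) events into train and test items.

    The earliest train_fraction of interactions go to the training list and
    the rest to the test list.
    """
    ordered = sorted(events, key=lambda event: event[1])
    # Assumption: keep at least one training item for non-empty histories.
    cut = max(1, int(len(ordered) * train_fraction))
    train = [item for item, _ in ordered[:cut]]
    test = [item for item, _ in ordered[cut:]]
    return train, test


# Twenty interactions given in reverse time order.
events = [("i%d" % t, t) for t in range(19, -1, -1)]
train_items, test_items = chronological_split(events)
```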
(2) The original GNN and the NCF models are trained by the BPR loss function in Eq. (2) on $D_T$ and $Train_N$. For each user in $Test_N$, we calculate the user's relevance score to each of the remaining 90% of items. We adopt Recall@K and NDCG@K as the metrics to evaluate the items ranked by the relevance scores. By default, we set K to 20 for Ml-1M and MOOCs. For Last.fm, since there are too many items, we set K to 200.
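For reference, the two metrics can be computed as in the sketch below, assuming binary relevance (an item is relevant if it appears in the user's test set). The function names are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # items sorted by predicted relevance score
relevant = {"a", "c", "e"}      # items in the user's test set
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Here two of the three relevant items are retrieved in the top 3, and NDCG additionally penalizes the second hit for appearing at position 3 rather than position 2.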
Table 3 shows the overall recommendation performance. The results indicate that the proposed basic pre-training GNN models outperform the corresponding original GNN models by 0.40%-3.50% in terms of NDCG, which demonstrates the effectiveness of the basic pre-training GNN model for cold-start recommendation. On top of the basic pre-training model, adding the meta aggregator and the adaptive neighbor sampler further improves NDCG by 0.30%-6.50% respectively, which indicates that the two components can indeed alleviate the impact of the cold-start neighbors when embedding the target users/items, and thus improve the downstream recommendation performance.
Case Study. We attempt to understand how the proposed pre-training model samples the high-order neighbors of the cold-start users/items on the MOOCs dataset. Fig. 4 illustrates two sampling cases, where the notation * indicates that the users/items are cold-start.