Paper download link: https://homangab.github.io/papers/metarecsys.pdf
Published at: AAAI
Publish time: 2022
Authors and affiliations:
Datasets: introduced in the main text and in the appendix
Code:
Other:
Write-ups by others:
Brief summary of the novelty: (a fairly theoretical paper) prior work adjusts either W or the learning rate; this paper considers both at the same time.
- In this paper, we propose MeLON, a meta-learning based novel online recommender update strategy that supports two-directional flexibility.
- It is featured with an adaptive learning rate for each parameter-interaction pair for inducing a recommender to quickly learn users’ up-to-date interest.
- The procedure of MeLON is optimized following a meta-learning approach:
- it learns how a recommender learns to generate the optimal learning rates for future updates.
- Specifically, MeLON first enriches the meaning of each interaction based on previous interactions and identifies the role of each parameter for the interaction;
- and then combines these two pieces of information to generate an adaptive learning rate (a rough sketch of this combination step follows).
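As a minimal sketch of that final combination step, the module below fuses an interaction representation with per-parameter role representations and emits one learning rate per (parameter, interaction) pair. The layer sizes, the softplus activation, and all names are our assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class AdaptiveLR(nn.Module):
    """Sketch: combine one interaction representation with one role vector per
    recommender parameter and output a positive learning rate per pair."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, interaction_repr: torch.Tensor, role_repr: torch.Tensor) -> torch.Tensor:
        # interaction_repr: (d,) for one user-item interaction
        # role_repr: (M, d), one latent role vector per recommender parameter
        x = torch.cat([interaction_repr.expand_as(role_repr), role_repr], dim=-1)  # (M, 2d)
        return nn.functional.softplus(self.net(x)).squeeze(-1)  # (M,) positive learning rates
```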
(1) The widespread use of mobile devices enables a large number of users to connect to a variety of online services, such as video streaming (Davidson et al. 2010), shopping (Linden, Smith, and York 2003), and news (Gulla et al. 2017), where each user seeks only a few items out of a myriad of items in services.
(2) In modern online recommender systems, fine-tuning has been widely employed to update models since it is infeasible to re-train the models from scratch whenever new user-item interactions come in.
(3) To cope with this challenge, previous research has been actively conducted in two directions.
(4) These two orthogonal approaches focus on different aspects of learning:
(5) In this paper, we propose MeLON (Meta-Learning for ONline recommender update), a novel online recommender update strategy that supports the flexibility in both data and parameter perspectives.
(6) Corresponding to the three research questions, MeLON goes through the following three steps, as shown in Figure 2.
(7) The effectiveness of MeLON is extensively evaluated on two famous recommender algorithms using three real-world online service datasets in a comparison with six update strategies. In short, the results show that MeLON successfully improves the recommendation accuracy by up to 29.9% in terms of HR@5. Such capability of MeLON is empowered by two-directional flexibility under the learning-to-learn strategy, which is further supported by the theoretical analysis and ablation study.
(1) Online recommenders build a pre-trained model using previous user-item interactions, and the pre-trained model is continuously updated in response to incoming user-item interactions.
(2) Then, the overall performance is derived by evaluating each recommender snapshot on the given mini-batch at each time step.
(3) The two directions for online recommender updates, importance reweighting and meta-optimization, are characterized by the construction of the learning rate matrix $W$.
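To make the shared structure of the two directions explicit, both can be written as one element-wise update driven by a learning-rate matrix; this is a reconstruction consistent with the surrounding description, not a verbatim quote of the paper's equations:

$$
\theta_{t+1,m} \;=\; \theta_{t,m} \;-\; \sum_{n=1}^{N} W_{m,n}\, \nabla_{\theta_{t,m}} \mathcal{L}(x_n; \Theta_t), \qquad W \in \mathbb{R}^{M \times N}.
$$

Importance reweighting varies the entries of $W$ only across interactions $n$, and meta-optimization varies them only across parameters $m$ (hence the rank-1 structure discussed in the theoretical analysis below), while a two-directional strategy may vary every entry $W_{m,n}$.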
(1) Instead of assigning the equal importance $1/n$ to each user-item interaction as in Eq. (1), importance reweighting (He et al. 2016; Shu et al. 2019) assigns a different importance determined by a reweighting function $\phi^I(\cdot)$,
(2) The representative methods differ in the detail of $\phi^I(\cdot)$, as follows:
(3) However, this scheme does not support the varying role of a parameter for different tasks.
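A minimal PyTorch-style sketch of the idea (not the paper's implementation; `weight_net` is a hypothetical stand-in for $\phi^I(\cdot)$, here conditioned only on the per-sample loss as in MWNet-like designs):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the reweighting function phi^I: maps each
# per-interaction loss to an importance weight.
weight_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

def reweighted_loss(per_sample_losses: torch.Tensor) -> torch.Tensor:
    """Instead of the equal importance 1/n in Eq. (1), assign a learned
    importance to each user-item interaction and take the weighted sum."""
    w = weight_net(per_sample_losses.detach().unsqueeze(1)).squeeze(1)  # (n,)
    w = torch.softmax(w, dim=0)              # normalize the importances
    return (w * per_sample_losses).sum()     # weighted loss for the mini-batch
```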
(1) On the other hand, meta-optimization (Ravi and Larochelle 2017; Li et al. 2017; Du et al. 2019; Zhang et al. 2020) aims at adjusting the learning rate of each recommender parameter $\theta_{t,m}$ via a learning rate function $\phi^P(\cdot)$,
S2Meta (Du et al. 2019) exploits MetaLSTM (Ravi and Larochelle 2017) to decide how much to forget a parameter’s previous knowledge and to learn new user-item interactions via the gating mechanism of LSTM (Hochreiter and Schmidhuber 1997).
MetaSGD (Li et al. 2017) maintains one learnable parameter for each model parameter to adjust its learning rate based on the loss.
SML (Zhang et al. 2020) maintains a convolutional neural network (CNN)-based meta-model with pretrained and fine-tuned parameters. It decides how much to combine the knowledge for previous interactions and that for new user-item interactions for each parameter.
Contrary to importance reweighting, this scheme does not support the varying importance of user-item interactions.
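For contrast, a MetaSGD-flavored sketch of meta-optimization, where every recommender parameter carries its own learnable learning rate; the helper names and the way `per_param_lrs` is built are assumptions:

```python
import torch
import torch.nn as nn

def meta_sgd_step(model: nn.Module, loss: torch.Tensor, per_param_lrs):
    """Apply one update where each parameter element has its own learning rate
    (the phi^P view). per_param_lrs holds one tensor of rates per parameter
    tensor, e.g. [torch.full_like(p, 1e-3) for p in model.parameters()]."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g, lr in zip(model.parameters(), grads, per_param_lrs):
            p -= lr * g  # element-wise learning rate per parameter
```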
(1) Given a user-item interaction $x = (t, u, i)$, the user interaction history of $u$ is the set of items that $u$ has interacted with before $t$, $\mathcal{H}_{user}(x) = \{ i' \mid \exists (t', u, i') \in \mathcal{X} \ \text{s.t.} \ t' < t \}$;
similarly, the item interaction history of $i$ is the set of users that have interacted with $i$ before $t$, $\mathcal{H}_{item}(x) = \{ u' \mid \exists (t', u', i) \in \mathcal{X} \ \text{s.t.} \ t' < t \}$.
(2) For the bipartite graph, the items in $\mathcal{H}_{user}(x)$ constitute the item side, and the users in $\mathcal{H}_{item}(x)$ constitute the user side.
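A small Python sketch of these history sets over a chronologically ordered interaction log; the data layout is an assumption:

```python
from collections import defaultdict

def build_histories(interactions):
    """interactions: chronologically sorted list of (t, u, i) tuples. For each
    interaction x = (t, u, i), H_user(x) is the set of items u interacted with
    before t, and H_item(x) is the set of users who interacted with i before t."""
    seen_items, seen_users = defaultdict(set), defaultdict(set)
    histories = []
    for t, u, i in interactions:
        histories.append((set(seen_items[u]), set(seen_users[i])))  # (H_user(x), H_item(x))
        seen_items[u].add(i)
        seen_users[i].add(u)
    return histories
```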
(1) Given a user-item interaction $x = (t, u, i)$, let $e_u$ and $e_{i'}$ be the embeddings of $u$ and $i' \in \mathcal{H}_{user}(x)$. Then, the extended embedding of $u$, $\tilde{e}_u$, is defined as
(2) Last, the two extended embeddings, $\tilde{e}_u$ and $\tilde{e}_i$, are concatenated and passed through a linear mapping to learn the relevance between the user and the item, as specified in Definition 3. As a result, the interaction representation contains not only the rich information about a user and an item but also the relevance between them.
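A hedged PyTorch-style sketch of this first step; plain additive attention pooling stands in for the paper's graph attention, and all layer shapes and names are assumptions:

```python
import torch
import torch.nn as nn

class InteractionRepr(nn.Module):
    """Enrich the user/item embeddings with their interaction histories and
    fuse the two extended embeddings into one interaction representation."""
    def __init__(self, dim: int):
        super().__init__()
        self.att_user = nn.Linear(2 * dim, 1)   # scores the item neighbors of the user
        self.att_item = nn.Linear(2 * dim, 1)   # scores the user neighbors of the item
        self.fuse = nn.Linear(2 * dim, dim)     # linear mapping over the concatenation

    def extend(self, e: torch.Tensor, neigh: torch.Tensor, att: nn.Linear) -> torch.Tensor:
        # e: (d,) target embedding; neigh: (k, d) embeddings of sampled history neighbors
        scores = att(torch.cat([e.expand_as(neigh), neigh], dim=-1))  # (k, 1)
        alpha = torch.softmax(scores, dim=0)
        return e + (alpha * neigh).sum(dim=0)   # extended embedding e~

    def forward(self, e_u, user_neigh, e_i, item_neigh):
        e_u_ext = self.extend(e_u, user_neigh, self.att_user)
        e_i_ext = self.extend(e_i, item_neigh, self.att_item)
        return self.fuse(torch.cat([e_u_ext, e_i_ext], dim=-1))  # interaction representation
```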
(1) Because it is well known that a parameter in a neural network has different relevance toward different tasks (user-item interactions in our study) (Bengio, Courville, and Vincent 2013),
(2) To help find a parameter role, the latent representation of a parameter is derived using three types of information:
The loss represents how much the recommender model parameterized by $\Theta_t$ has not yet learned that user-item interaction.
(3) Symmetric to the interaction representation in Definition 3, the role representation is obtained through a multi-layer perceptron (MLP), as specified in Definition 4.
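A hedged sketch of the role representation MLP; which three per-parameter signals are fed in (here the parameter value, its gradient, and the current loss) is our assumption based on the surrounding text:

```python
import torch
import torch.nn as nn

class RoleRepr(nn.Module):
    """Derive a latent role vector per parameter with a small MLP; num_layers=2
    mirrors the reported setting (L = 2)."""
    def __init__(self, dim: int, hidden: int = 64, num_layers: int = 2):
        super().__init__()
        layers, in_dim = [], 3          # assumed inputs: [parameter value, gradient, loss]
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(hidden, dim))
        self.mlp = nn.Sequential(*layers)

    def forward(self, theta: torch.Tensor, grad: torch.Tensor, loss: torch.Tensor) -> torch.Tensor:
        # theta, grad: (M,) flattened parameters and their gradients; loss: scalar tensor
        feats = torch.stack([theta, grad, loss.expand_as(theta)], dim=-1)  # (M, 3)
        return self.mlp(feats)   # (M, dim): one role vector per parameter
```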
(1) Our suggested online update strategy MeLON, $\phi^{2D}$, leaves a question of how much benefit it can bring compared with the previous two strategies $\phi^I$ and $\phi^P$.
(2) We denote the recommender parameters updated by $W$ as $\hat{\Theta}$ and the optimal parameters as $\Theta^*$.
(3) Then, a lower bound of $\| W^* - W \|_2$ is obtained from the singular values $\sigma$ of $W^*$, as formalized in Lemma 1.
(Eckart and Young 1936) Given $W^*$ with its singular value decomposition $U \Sigma V$ and $k \in \{1, \cdots, rank(W^*) - 1\}$,
For $W^I$ and $W^P$ (see Eq. (3) and Eq. (4)), $rank(W^I) = rank(W^P) = 1$ holds.
For $W^{1D} \in \{W^I, W^P\}$ (see Eq. (3) and Eq. (4)) and $W^{2D}$ (see Eq. (5)), the following inequality holds:
Lemma 1, Lemma 3, and $W^{1D} \in \{W^I, W^P\}$ imply $\min_{W^{1D}} \| W^* - W^{1D} \| = \sigma_2$.
On the other hand, $rank(W^{2D}) \geq 1$. (Footnote 1: Specifically, by Eq. (6) and Eq. (7), $rank(W^{2D})$ is not necessarily one and can be greater than one.)
Thus, by Lemma 1, $\min_{W^{2D}} \| W^* - W^{2D} \| \leq \sigma_2$ holds, which concludes the proof.
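A quick NumPy illustration of the argument (ours, not the paper's): by Eckart-Young, the best rank-1 approximation of a hypothetical $W^*$ has spectral-norm error exactly $\sigma_2$, whereas an unconstrained (two-directional) matrix can always do at least as well:

```python
import numpy as np

rng = np.random.default_rng(0)
W_star = rng.standard_normal((6, 8))      # a hypothetical optimal learning-rate matrix
U, s, Vt = np.linalg.svd(W_star)          # singular values s[0] >= s[1] >= ...

# Best rank-1 approximation: the most a one-directional W (rank 1) can achieve.
W_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
err_1d = np.linalg.norm(W_star - W_rank1, ord=2)

# A two-directional W is not restricted to rank one; W = W_star itself gives zero error.
err_2d = 0.0

print(err_1d, s[1], err_2d)               # err_1d equals sigma_2; err_2d <= sigma_2
```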
In the experiments, we empirically validate the advantage of the two-directional flexibility of $W^{2D}$.
Our evaluation was conducted to support the following:
(1) We used two widely-used evaluation metrics,
(2) Given a recommendation list,
(3) The two metrics were calculated for top@5, top@10, and top@20 items, respectively.
(4) For each mini-batch in the prequential evaluation, a recommender estimates the rank of each user’s one interacted item against 99 randomly-sampled non-interacted items; this technique is widely used in the literature (He et al. 2016; Du et al. 2019) because it is time-consuming to rank all non-interacted items.
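A minimal sketch of how the two metrics can be computed under this 1-positive-plus-99-negatives protocol; the scores and names are illustrative only:

```python
import numpy as np

def hit_ratio_at_k(pos_score, neg_scores, k):
    """Rank the single interacted (positive) item against the 99 sampled
    non-interacted items; a hit means it lands in the top-k of the 100 candidates."""
    rank = 1 + int((neg_scores > pos_score).sum())  # 1-based rank of the positive item
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(pos_score, neg_scores, k):
    """Single-positive NDCG: 1/log2(rank + 1) if the positive item is in the top-k."""
    rank = 1 + int((neg_scores > pos_score).sum())
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0

rng = np.random.default_rng(42)
neg = rng.uniform(0.0, 1.0, size=99)                # scores of 99 sampled negatives
print(hit_ratio_at_k(0.97, neg, k=5), ndcg_at_k(0.97, neg, k=5))
```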
Please see Section B of the supplementary material for more details of the experiment settings.
(1) We provide interesting observations for the online update strategies:
(2) Specifically, in terms of HR@20, an importance reweighting strategy, MWNet, enhances the recommendation performance on Yelp, but shows worse performance than Default on Adressa.
(4) Thus, we conjecture that, for time-sensitive user interest, such as news in Adressa, it is more important to focus on the parameter roles, which could be associated with the topics in this dataset.
(1) While an online recommender can be trained on new user-item interactions with adaptive learning rates by the meta-model MeLON, the optimality condition varies with time.
1. Recommender model preliminary update:
2. Meta-model update:
3. Recommender model update:
(2) Note that MeLON selectively performs the update of recommender parameters involved in the recommender’s computation for each interaction. Therefore, the required update time for MeLON is comparable to that of other update strategies, such as SML and S2Meta, as empirically confirmed in the evaluation results.
(3) The online training procedure of MeLON is described in Algorithm 1. When a recommender is deployed online, the algorithm conducts the three steps above for every new incoming mini-batch of user-item interactions (sketched in code below).
(4) Before a recommender is deployed online, both the recommender and the meta-model are typically pre-trained on the past user-item interactions in an offline manner. Differently from the online training, we first randomly sample a mini-batch $B$ of user-item interactions to derive the interactions $B_{last}$. Then, in each iteration, the recommender and the meta-model are updated in the same way as in the online learning. The model is trained for a fixed number of epochs, 100 in our experiments. Once the offline training completes, we can deploy the recommender and the meta-model in the online recommendation environment.
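A hedged PyTorch sketch of the three-step loop referenced above (Algorithm 1 flavor). The meta-objective used in step 2 and all function names are our assumptions, and the meta-model is simplified to "one learning rate per parameter element given the batch loss and gradients":

```python
import torch
from torch.func import functional_call

def melon_online_step(rec, meta, batch, meta_opt, loss_fn):
    """One online update round. loss_fn(outputs, batch) returns a scalar loss;
    meta(loss, grads) returns one learning-rate tensor per recommender parameter."""
    names = [n for n, _ in rec.named_parameters()]
    params = [p for _, p in rec.named_parameters()]

    # 1. Recommender model preliminary update, kept differentiable w.r.t. the meta-model.
    loss = loss_fn(rec(batch), batch)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    lrs = meta(loss, grads)
    fast = {n: p - lr * g for n, p, g, lr in zip(names, params, grads, lrs)}

    # 2. Meta-model update: the preliminarily updated recommender should fit the new
    #    interactions (the exact meta-objective is an assumption here).
    meta_loss = loss_fn(functional_call(rec, fast, (batch,)), batch)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()

    # 3. Recommender model update with the refreshed meta-model, committed in place.
    loss = loss_fn(rec(batch), batch)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g, lr in zip(params, grads, meta(loss, grads)):
            p -= lr * g
```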
For reproducibility, the source code of MeLON as well as the datasets are provided as the supplementary material.
The explicit user ratings in the Yelp dataset and the three Amazon datasets are converted into implicit ones, following the relevant research (Koren 2008; He et al. 2017); that is, if a user rated an item, then the rating is considered as a positive user-item interaction. For prequential evaluation on online recommendation scenarios, we follow a commonly-used approach (He et al. 2016);
we sort the interactions in the dataset in chronological order, and divide them into three parts: offline pre-training data, online validation data, and online test data.
Online validation data is exploited to search the hyperparameter settings of the recommenders and update strategies, and takes up 10% of the test data. Because user-item interactions are very sparse, we preprocess the datasets following the previous approaches (He et al. 2017; Zhang et al. 2020); for all datasets, users and items involved in fewer than 20 interactions are filtered out.
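A pandas sketch of this preprocessing pipeline under the split ratios reported below for Adressa and Yelp; the column names and the single-pass filtering are simplifying assumptions:

```python
import pandas as pd

def preprocess(df: pd.DataFrame, min_interactions: int = 20):
    """Treat every rating as an implicit positive interaction, drop sparse
    users/items, then split chronologically into pre-train / validation / test."""
    # Explicit ratings -> implicit feedback: any rated (user, item) pair is a positive.
    df = df[["timestamp", "user", "item"]].sort_values("timestamp")

    # Filter out users and items with fewer than 20 interactions (single pass for brevity).
    df = df[df.groupby("user")["item"].transform("size") >= min_interactions]
    df = df[df.groupby("item")["user"].transform("size") >= min_interactions]

    # Chronological split: 95% offline pre-training, 0.5% online validation, 4.5% online test.
    n = len(df)
    pretrain = df.iloc[: int(0.95 * n)]
    valid = df.iloc[int(0.95 * n): int(0.955 * n)]
    test = df.iloc[int(0.955 * n):]
    return pretrain, valid, test
```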
Table 5 summarizes the profiles of the five datasets used in the experiments, where the details are as follows.
Adressa news dataset (Gulla et al. 2017) contains user interactions with news articles for one week. We use the first 95% of the data as offline pre-training data, the next 0.5% as online validation data, and the last 4.5% as online test data.
Amazon review dataset (Ni, Li, and McAuley 2019) contains user reviews for the products purchased on Amazon. Among various categories, we adopt three frequently-used categories,
Yelp review dataset contains user reviews for venues, such as bars, cafes, and restaurants. We use the first 95% of the data as offline pre-training data, the next 0.5% as online validation data, and the last 4.5% as online test data.
For online recommenders, we use two famous personalized recommender algorithms: BPR (Koren, Bell, and Volinsky 2009; Rendle et al. 2009) and NCF (He et al. 2017).
To train these recommender algorithms based on implicit feedback data, we employ a ranking loss (Rendle et al. 2009); for a positive item in a user-item interaction, we randomly sample another negative item that the user has not interacted with before, and train a recommender algorithm to prioritize the positive item over the negative item.
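A minimal PyTorch sketch of this pairwise ranking objective (BPR-style); the example scores are illustrative only:

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss (Rendle et al. 2009): for each observed interaction,
    a non-interacted item is sampled and the recommender is trained to score the
    positive item higher than the negative one."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# Example: scores for a mini-batch of (user, positive item) pairs and sampled negatives.
pos = torch.tensor([2.3, 0.8, 1.1])
neg = torch.tensor([1.9, 1.2, -0.4])
print(bpr_loss(pos, neg))
```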
For fair comparison, we follow the optimal hyperparameter settings of the baselines as reported in the original papers, and optimize uncharted ones using HR@5 on the validation data. All the experiments are performed with a batch size of 256 and trained for 100 epochs. The number of updates in the default update strategy is fixed to 1 to align with the other compared strategies. The experiments are repeated 5 times with varying random seeds, and we report the average as well as the standard error. For the graph attention in the first component of MeLON, we randomly sample 10 neighbors per target user or item. Besides, for the MLP which learns the parameter roles, the number of hidden layers (L) is set to 2. To optimize a recommender under the default and sample reweighting strategies, we use Adam (Kingma and Ba 2015) with a learning rate of η = 0.001 and a weight decay of 0.001. Note that a recommender trained with the meta-optimization strategies is optimized by a meta-model, while the meta-model itself is optimized with Adam during the meta-update in Eqs. (14) and (15). Our implementation is written in PyTorch, and the experiments were conducted on an Nvidia Titan RTX.