2021_WWW_Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Informat

[Paper reading notes] 2021_WWW_Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Information Networks (WWW, 2021), Bibek Paudel, Abraham Bernstein

Paper download: https://dl.acm.org/doi/10.1145/3442381.3449970
Venue: WWW (International World Wide Web Conference Committee)
Publish time: 2021
Author affiliations: Stanford University, University of Zürich
Dataset: tweets the authors crawled from Twitter themselves, on ideological and political topics
Code

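  • (1) This paper deserves several reads. The method is not especially hard, but the domain and topic are not easy to follow, so read patiently to see what the authors are actually trying to do

  • (2) Proposes Random Walks with Erasure (RWE)
    Details are in Section 4

    • The biggest difference from an ordinary random walk is that RWE allows probability distributions other than those induced by the degree distribution
    • Random-walk traversals to certain nodes are systematically erased in order to lower their importance with respect to the starting node of the random walk
    • At the next iteration, the walker starts her walk from the origin vertex with the mass erased in the previous iteration, instead of the usual mass of 1
    • The paper defines its own similarity measure, similarity (in …)
    • The paper defines its own erasure matrix $Q^B$
  • (3) Ideological positions are placed on one dimension (Section 3.2 uses a method from prior work)

    • Defines its own notion of a bridge (weak tie), e.g. bridge users
  • (4) The optimization is a joint optimization

  • (5) The authors crawled the dataset themselves and removed the uninformative data themselves, which ensures the story works out perfectly.

    • Defines its own notions of elite and web-content
  • The topic or domain is very novel, which I feel counts as a big "innovation" in itself; after all, an author from China would generally not write about this topic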

Abstract

  • (1) Shortcomings of prior work:
    Most existing personalization systems promote items that match a user’s previous choices or those that are popular among similar users. This results in recommendations that are highly similar to the ones users are already exposed to, resulting in their isolation inside familiar but insulated information silos
  • (2)we develop a novel recommendation framework with a goal of improving information diversity using a modified random walk exploration of the user-item graph
  • (3) We focus on the problem of political content recommendation, while addressing a general problem applicable to personalization tasks in other social and information networks.
  • (4) For recommending political content on social networks, we first propose a new model to estimate the ideological positions for both users and the content they share, which is able to recover ideological positions with high accuracy.
  • (5) Data
    large datasets of Twitter discussions
  • (6) we show that our method based on random walks with erasure is able to generate more ideologically diverse recommendations.
  • (7) Our approach does not depend on the availability of labels regarding the bias of users or content producers.

CCS Concepts

Computing methodologies→ Machine learning; Information systems→ Recommender systems; Social networks.

Keywords

  • diverse recommendations
  • social networks
  • random walks.

Introduction

  • (1) Online Social Networks(OSNs) 在线社交网络
  • (2) High-quality as well as balanced news consumption is vital for a functioning democracy (the current need, i.e. why the authors do this work)
  • (3) this paper introduces recommender algorithms that are designed with the goal of increasing the reader’s exposure to diverse information.
  • (4) To promote the diversity of views, we propose a random-walk based algorithm that can generate diverse as well as accurate recommendations. We introduce a modified random-walk exploration of the user-item feedback graph in which random-walk traversals to certain nodes are systematically erased in order to lower their importance with respect to the starting node of the random walk.
  • (5) Our approach based on this modified random walk exploration provides a general mathematical and algorithmic framework for diversifying recommendations that can be used in various domains.
  • (6) We exploit the sharing behaviour of users on social media related to particular political events in order to estimate their ideological positions on a one-dimensional scale. Based on such information, our recommendation approach can suggest news items to users that purposefully expose them to different viewpoints and increase the diversity of their information “diet.”
  • (7) A common way to diversify content is by including viewpoints from different outlets, assuming that ideological positions of political elites and news outlets are fixed over long durations. In highly contested political events, however, this approach is likely to suffer from a major problem: a set of viewpoints from politicians or news sources belonging to different ideologies can still be homogeneous.
  • (8) Table 1 shows two examples of how content from the same outlet were shared by groups of people with opposing political viewpoints about the 2016 Brexit referendum in the UK.
    [Image: Table 1]
    • Although the Telegraph is known as a conservative British newspaper that supported the Leave campaign, its report about the backtracking of a campaign promise by Leave campaigners (first example in the bottom row, Table 1) was popular among the supporters of the Remain campaign, while other pieces were popular among the supporters of the Leave campaign (second example in the bottom row, Table 1)
    • The two opposing groups also shared articles from the BBC (top row in Table 1) differently: the first article was shared more by the Remain supporters and the second article was shared more by Leave supporters.
  • (9) As a first step towards tackling this problem, we propose a novel approach to recommendation diversification that incorporates ideological positions about particular political events learned from social media signals.
    To the best of our knowledge this is the first work to deal with this problem.
    Our proposed solution has two components:
      (i) learning ideological positions of users, political elites, and web content as well as
      (ii) using the ideological positions to diversify recommendations, based on random walks with erasure and a diversification strategy that exploits weak ties in social networks.
  • (10) In this work, we use the one-dimensional ideological positions (left-right) for users and political content. The key difference from other approaches is that we identify such positions for political elites, users, and individual content (rather than content outlets) depending on the sharing patterns on social networks during specific political events. Additionally, we also propose a novel and effective recommendation strategy based on ideological positions.
  • (11) We find that RWE can recommend both highly accurate and diverse items to the users.
In summary, our contributions in this paper are the following:
  • (i) we describe a new method to estimate ideological positions of not only users and elites, but also web content shared on social networks such as Twitter,
  • (ii) we introduce random walk with erasure (RWE), a novel modified random walk based exploration of bi-partite feedback graphs that is useful for diversifying recommendations
  • (iii) on datasets of social media discussions we show experimentally that our recommendation method based on RWE is able to diversify political content recommendations
  • (iv) on open benchmark datasets from other domains, we show experimentally that our algorithm can provide a general framework for diversified recommendations

2 Related Work

2.1 Recommendation Diversity

  • (1) previous works
    They diversify recommendations by exploiting topics and tags, post-processing of recommendations, promoting long-tail items, and so on. As a result, each of the proposed approaches provides a different kind of diversity to the users

2.2 Ideology Detection from social Media

Prior work:

  • (1) Methods that use social media behavior to estimate political leanings of users [9, 12, 13] can be compared to the multidimensional scaling method famously known as DW-NOMINATE [44], which measures the ideology of parliamentarians by analyzing legislative voting behavior
  • (2) Some recent works approach the problem of recommendation diversity using ideological positions [5, 18, 29, 32, 36], but they either rely on outlet specific positions, or do not provide a complete recommendation framework

This paper:

  • (3) We not only address the problem with outlet-specific positions, but also provide an end-to-end recommendation framework, with extensive evaluations with state-of-the-art methods.

2.3 Political Content Diversity

Problem:

  • (1) In context of political content, there are additional challenges regarding the question of recommendation diversity. Exposure to diverse viewpoints, and cross-cutting discussions between users of different viewpoints may help widen their perspective and can be desirable for a healthy democracy
  • (2) However, it is not enough to just diversify information without regard for several factors that influence opinion formation. Research has shown that exposure to diverse political viewpoints can also lead to further polarization [7], especially in case of individuals who hold a strong viewpoint on a particular side of the debate [49]

2.4 Weak Ties

(1) There is also evidence from social network theory that weak ties are important for exposure to diverse information

2.5 Political Polarization on Social Networks

  • (1) A study by Facebook[8] demonstrates how algorithmic filtering affects users’ exposure to news in OSNs
  • (2) The challenge of diversifying recommendations can be seen as part of the research on AI and machine-learning biases [4, 22,39, 52].

2.6 Scholars have argued that exposure to diverse viewpoints

  • (1) They are deemed essential for promoting political tolerance and deliberative democracy.
  • (2) This implies that greater network diversity reduces polarization by facilitating cross-cutting discussions.

3 Preliminaries

3.1 Random Walks on the Feedback Graph

  • (1) The user-item feedback dataset is an $m \times n$ matrix $\pmb{A}$
    In our case, we use implicit feedback
  • (2) We model $G$ as an unweighted and undirected graph (all edges have the same weight)
  • (3) but we could also generalize the definitions to a weighted version.
  • (4) The adjacency matrix of the bi-partite user-item graph has the dimension $(m+n) \times (m+n)$ and is constructed as shown in (1):
    $$\pmb{A}_G = \begin{pmatrix} \pmb{0} & \pmb{A} \\ \pmb{A}^T & \pmb{0} \end{pmatrix} \tag{1}$$
    The transition-probability matrix $P$ for $\pmb{A}_G$ is obtained by row-normalizing its entries:
    $$P = D^{-1} \pmb{A}_G \tag{2}$$
  • (5) $D$ is the degree matrix which has the degree of the nodes of the graph in its diagonal elements.
  • (6) Its entries $P_{ij}$ encode the probability of a random walk starting at node $i$ arriving at node $j$ in one step.
    Every odd power of $P$ (e.g., $P^3$) represents the transition probabilities for random walks starting at one of the user vertices and arriving at one of the item vertices.
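The construction in (4) to (6) can be sketched in a few lines. This is only a minimal illustration on a made-up 2-user, 3-item feedback matrix; the helper names `transition_matrix`, `matmul` and `matrix_power` are hypothetical, and plain Python lists are used instead of a sparse-matrix library for readability.

```python
def transition_matrix(A):
    """Build P = D^{-1} A_G for the bipartite graph of an m x n feedback matrix A."""
    m, n = len(A), len(A[0])
    size = m + n
    # Adjacency A_G = [[0, A], [A^T, 0]]: users are nodes 0..m-1, items are m..m+n-1.
    G = [[0.0] * size for _ in range(size)]
    for u in range(m):
        for i in range(n):
            if A[u][i]:
                G[u][m + i] = 1.0
                G[m + i][u] = 1.0
    # Row-normalize by node degree (isolated nodes keep an all-zero row).
    P = []
    for row in G:
        deg = sum(row)
        P.append([x / deg if deg else 0.0 for x in row])
    return P


def matmul(X, Y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matrix_power(P, k):
    """P multiplied by itself k times (k >= 1)."""
    result = P
    for _ in range(k - 1):
        result = matmul(result, P)
    return result


# Toy implicit feedback: user 0 interacted with items 0 and 1, user 1 with items 1 and 2.
A = [[1, 1, 0],
     [0, 1, 1]]
P = transition_matrix(A)
P3 = matrix_power(P, 3)
# Three-step probabilities from user 0 to each of the three items.
user0_to_items = P3[0][2:]
```

Because the graph is bipartite, all the three-step mass that starts at a user ends up on item vertices, which is why only the item slice of the row is used for scoring.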

3.2 One-dimensional Ideological Positions

Prior work:

  • (1) A seminal work about the estimation of ideological positions is by Poole and Rosenthal [44], who used roll-call data from the United States Congress to recover the political positions of its members, called their ideal points. These approaches place the politicians on a latent dimension, which is usually a point in the one-dimensional left-right scale.
  • (2) Using this ideological dimension, a politician can be said to be left- or right-wing depending on whether her estimated position is towards the left or right of the center.

This paper:

  • (3) In this work, we estimate ideological positions for not only political elites, but also common users and the political content (URLs) they share on social media
  • (4) We denote the ideal points of user $u$, elite $e$, and content $i$ by $\theta_u$, $\phi_e$, and $\psi_i$
  • (5) For example, user $u_p$ and URL $i_q$ can be said to share a similar political stance if their ideological positions are nearby, i.e. $|\theta_p - \psi_q|$ is small.

4 Random Walks with Erasure

  • (1) At certain steps in the random walk, erasures cause a fraction of the mass reaching the destination vertices to be erased and sent back to the origin vertex
    In other words, for a vertex that receives a mass of $p$ from a random walk, a portion $p \times q$ is erased, where $0 \leq q < 1$ is the amount of erasure.
  • (2) The remaining mass stays at the vertex and the erased mass is sent back to the vertex from which the random walk started.
  • (3) We can express this in probabilistic terms: erasure probability $q$ defines the probability with which the walk reaching a destination vertex is erased and sent back to the origin vertex.
  • (4) At the next iteration, instead of the usual mass of 1, the walker starting at the origin vertex starts her walk with the mass accumulated from the erasures in the previous iteration
  • (5) It is important to note that at each iteration the initial mass in the starting vertex gets smaller and is always less than 1
  • (6) In this way, RWE induces different random walk transition probabilities than the usual random walk.
  • (7) The intuition behind RWE is to allow different probability distributions than those induced by the degree distribution of the graph.
  • (8) This provides the flexibility to favor certain nodes during the random-walk exploration, based on their attributes, or similarity with the origin vertex.
  • (9) We exploit this property of RWE to diversify recommendations by proposing two different strategies described below.

4.1 Formal Definition

  • (1) RWE proceeds like a regular random walk except for two important differences.
    • first difference: involves an erasure matrix $Q$ which encodes the node-specific erasure probabilities
    • second difference: the erasure process itself
  • (2) The amount by which the walks arriving at a vertex are erased is not the same for all vertices; it differs for each pair of random walk origin and current vertex.
  • These quantities are encoded in $Q\in[0, 1)^{(m+n)\times(m+n)}$. The entries $Q_{ij}$ indicate the erasure probabilities from destination vertex $j$ to the origin vertex $i$.

(3)the distinction between PageRank and RWE:

  • while the restart probability and the erasure probability in the two methods seem similar, the crucial difference is that the erasure probability is different for each pair of vertices and the number of walks that get erased varies in each iteration, until convergence.
  • (4) At each iteration of the random-walk, RWE proceeds as follows:
    • (a) start regular random walks of an odd number of steps $k$ from origin vertex $s$, as specified by the transition probabilities $P$ in (2)
    • (b) at the destination vertex $j$, with probability $Q_{ij}$, erase the walk; with probability $1-Q_{ij}$, do not erase the walk
    • (c) at the second iteration, start new random walks from the origin vertex with the following probability
      [Image: equation]
      • (i) Here $\pmb{1}$ is an $(m + n)$-dimensional vector of all ones, $\circ$ is the Hadamard product, and $\pmb{P}_k \circ \mathcal{Q}$ encodes erasures from the previous iteration.
      • (ii) Multiplication with $\pmb{1}$ sums up all the erasures arriving at each origin vertex
      • (iii) Considering $s$ as the origin user-vertex, $\pmb{I}_{.,s}$ is the $s^{th}$ column of the $(m + n) \times (m + n)$ dimensional identity matrix and the final Hadamard product gives the initial state probability modified due to erasures
  • (5) This process is continued for a sufficient number of iterations, and finally the number of walks at the destination vertices that were not erased is used to estimate the probability of a $k$-step RWE starting at $i$ and reaching $j$ without being erased. We use this probability to score item-nodes for recommendation tasks.
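The iteration in (4) and (5) can be illustrated for a single origin vertex. This is a rough sketch of one reading of these notes, not the paper's exact update: it assumes the $k$-step probabilities from the origin are already available as a row `Pk_row`, that the erasure probabilities for that origin are given as `Q_row`, and that each new iteration simply restarts with the mass erased in the previous one. The function name `rwe_scores` and all numbers are made up.

```python
def rwe_scores(Pk_row, Q_row, iterations=50):
    """Accumulate the non-erased mass at each destination for one origin vertex.

    Pk_row[j]: k-step random-walk probability from the origin to vertex j.
    Q_row[j]:  erasure probability for the (origin, j) pair, in [0, 1).
    """
    retained = [0.0] * len(Pk_row)
    mass = 1.0  # the first iteration starts with the usual mass of 1
    for _ in range(iterations):
        erased = 0.0
        for j, p in enumerate(Pk_row):
            arriving = mass * p
            erased += arriving * Q_row[j]               # sent back to the origin
            retained[j] += arriving * (1.0 - Q_row[j])  # stays at vertex j
        mass = erased  # the next walk starts with the erased mass only
    return retained


# Hypothetical 3-step probabilities to three items and their erasure probabilities.
Pk_row = [0.5, 0.3, 0.2]
Q_row = [0.8, 0.1, 0.0]
scores = rwe_scores(Pk_row, Q_row)
# Items ordered from highest to lowest retained mass.
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

In this toy case the item with the highest plain random-walk probability (index 0) drops to last place because most of the walks reaching it are erased, which is the re-ranking effect the notes describe.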

5 Diversification Strategies

  • (1) The erasure matrix $Q$ can be defined by the service providers according to their strategy for diversifying recommendations.
    In other words, the strategy determined by $Q$ can be defined to favor diverse items that would be less traversed by regular $k$-hop random walks.
  • (2) Note that the items in the final recommendation list are those in the local neighborhood (when $k$ is not large) of the user vertices;
    only the probabilities of traversal to those vertices are changed due to $Q$. RWE diversifies the recommendations by promoting diverse items connected by weak links,
    and is less likely to recommend items that are too dissimilar or unfamiliar to the users

5.1 Long-tail diversity (RWE-$D$)

To use RWE for promoting long-tail diversity, as in $RP^3_{\beta}$ [40], one can define the erasure matrix $Q^D$ as given in (4), where $D$ defined in (2) is the diagonal matrix containing vertex degrees, and $\beta$ is a parameter that can be used to tune the erasure probabilities. This strategy depends only on the degree of item vertices and has the effect of preferring low-degree (long-tail) items.
[Image: equation (4)]
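The exact form of (4) is not preserved in these notes, so the following is only a hypothetical sketch of a degree-based erasure rule. It assumes an erasure probability of $1 - d_j^{-\beta}$ for an item of degree $d_j$, chosen so that the retained fraction $d_j^{-\beta}$ mirrors the degree-based re-weighting usually attributed to $RP^3_{\beta}$; the function name `degree_erasure` and the sample degrees are made up.

```python
def degree_erasure(item_degrees, beta):
    """Hypothetical degree-based erasure probabilities: q_j = 1 - d_j ** (-beta).

    Assumes every degree is at least 1 and beta >= 0, so that 0 <= q_j < 1.
    """
    return [1.0 - d ** (-beta) for d in item_degrees]


# A popular item, a mid-popularity item, and a long-tail item.
item_degrees = [100, 10, 1]
q = degree_erasure(item_degrees, beta=0.5)
```

Under this assumption, walks reaching high-degree items are erased most often and walks reaching degree-1 items are never erased, so a larger $\beta$ pushes the ranking further towards the long tail.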

5.2 Bridging political viewpoints (RWE-$B$)

  • (1) Those towards the left in the ideological scale (e.g., u1) are called left-leaning and those towards the right (e.g.,u2,u3,u4) are called right-leaning.
    [Image: Figure 1]
  • (2) In this work, we also identify ideological positions for political content (e.g., videos, news, social media posts, etc.) and elites. The ideological positions for user $u$, elite $e$, and content $i$ are $\theta_u$, $\phi_e$, and $\psi_i$, respectively.
  • (3) Based on these positions, we define similarity between a content-user or an elite-user pair as one minus the normalized absolute difference in their ideological positions:

    $sim(i, u) = 1 - |\psi_i - \theta_u| / (max_p - min_p)$ for content $i$ and user $u$
    and $sim(e, u) = 1 - |\phi_e - \theta_u| / (max_p - min_p)$ for elite $e$ and user $u$

    where $[min_p, max_p]$ is the range of (all) ideological positions. It is symmetric and bounded in $[0, 1]$.
  • (4) Viewpoints that are different but not too far from the user’s own ideological position—i.e., connected via weak links—can be expected to appeal more to the user than those that are at a greater distance. Such different viewpoints are likely to be reachable through others who are close to the user but in the opposite side of the political spectrum. We call them bridge users,
    • in Figure 1, u2 could be a bridge user between those on her right (u3 and u4) and those on her left (u1)
    • Bridges are weak ties whose ideological positions are on the opposite side of the user’s own position.
    • Similar notions apply in case of elites and content (the similarity for other objects uses the same definition)
  • (5) Based on these motivations, we present the RWE strategy for bridging diverse political viewpoints and define the corresponding erasure matrix $Q^B$ for user-content pairs as:
    [Image: equation]
    • where the values in $Q$ are less than 1, and the parameter $\epsilon < 1$ is determined by the service provider.
    • A high $\epsilon$ causes random walks reaching non-bridge elites or content to be erased at a higher rate.
    • The erasure matrix $Q^B_{u,e}$ for user-elite pairs is defined similarly.
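The similarity from item (3) of this section is simple enough to check numerically. The sketch below covers only that similarity, not the erasure matrix $Q^B$, whose formula is not preserved in these notes; the ideological positions assigned to `u1` through `u4` are invented for illustration and only follow the left/right arrangement described for Figure 1.

```python
def sim(pos_a, pos_b, min_p, max_p):
    """One minus the normalized absolute difference of two ideological positions."""
    return 1.0 - abs(pos_a - pos_b) / (max_p - min_p)


# Hypothetical positions on a left-right scale: u1 leans left, u2..u4 lean right.
positions = {"u1": -1.0, "u2": 0.2, "u3": 0.6, "u4": 1.0}
min_p, max_p = min(positions.values()), max(positions.values())

sim_u2_u3 = sim(positions["u2"], positions["u3"], min_p, max_p)  # close neighbours
sim_u1_u4 = sim(positions["u1"], positions["u4"], min_p, max_p)  # opposite extremes
```

With these made-up values the two ends of the scale get similarity 0 and nearby users get a value close to 1, matching the stated bounds.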

6 Political Ideology Detection

How to find candidates for diversifying recommendations

  • (1) For this purpose, we consider two user-item feedback graphs: the elite-endorsement graph and the content-share graph.
    • in the elite-endorsement graph, there are $m$ users $\mathcal{U}$ and $n_e$ elites $\mathcal{E}$
    • the same users $\mathcal{U}$ and $n_i$ content-identifiers $\mathcal{I}$ constitute the content-share graph
  • (2) We treat retweets and content-sharing as acts of endorsement of elites and content by users with similar ideological positions.
  • (3) We consider any URL present in the tweets as a web-content and these URLs could refer to news, videos, pictures, or other social media posts.
    Similarly, we consider a retweet as an elite-endorsement.
  • (4) From these feedback graphs, we can construct two matrices similar to the feedback matrices defined in Section 3.1:
    • $\mathcal{R}$ of dimension $m\times n_e$ for the elite-endorsement graph
    • and $\mathcal{S}$ of dimension $m\times n_i$ for the content-share graph.
    • The entries $R_{u, e}$ are 1 if user $u$ has retweeted the elite $e$
    • and likewise entries in $S_{u,i}$ are 1 if user $u$ has shared the content $i$.
    • The remaining entries are zero.
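Building the two binary matrices from raw events can be sketched as follows. The event tuples, the user and elite names, and the helper `build_feedback_matrix` are all hypothetical; the same helper would be applied to (user, URL) pairs to obtain the content-share matrix.

```python
def build_feedback_matrix(pairs, users, items):
    """Binary matrix with rows = users, columns = items; 1 where a (user, item) pair was observed."""
    user_index = {u: r for r, u in enumerate(users)}
    item_index = {i: c for c, i in enumerate(items)}
    M = [[0] * len(items) for _ in users]
    for u, i in pairs:
        M[user_index[u]][item_index[i]] = 1  # repeated events still give a single 1
    return M


# Hypothetical retweet events: (retweeting user, retweeted elite).
users = ["alice", "bob"]
elites = ["elite_a", "elite_b"]
retweets = [("alice", "elite_a"), ("bob", "elite_b"), ("alice", "elite_a")]
R = build_feedback_matrix(retweets, users, elites)
```

Repeated endorsements collapse to a single 1 here; the counts themselves could be kept separately if they are needed as confidence weights.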

6.1 Using the elite-endorsement graph

  • (1) We assume a one-dimensional ideological space and want to recover the ideal points $\theta \in \mathbb{R}$ for users and $\phi \in \mathbb{R}$ for elites in this space
  • (2) When a user endorses an elite, we assume the distance between them in this space to be low, and model these as quadratic utility functions similar to [9, 11]. With this assumption, $R_{u, e} = 1$ indicates that the distance between $u$ and $e$ is small in this space
  • (3) We model this in probabilistic terms, and state the probability of the user endorsing an elite using the logistic function:
    [Image: equation]
    where the terms $\alpha_u$ and $\beta_e$ are bias terms associated with $u$ and $e$, and account for the differences among users and elites respectively.
  • (4) Using Bayesian inference, the Bernoulli probability mass function, and under the assumption that all observed endorsements $R_{ue}$ are independent, we get the following:
    [Image: equation]
    The parameter $a_{u,e}$ is used to assign confidence to the observed endorsement of $e$ by $u$, and it could be a function of the number of times $u$ has endorsed $e$.
  • (5) To simplify the notation, we write $\Pi_{ue} = -|\theta_u - \phi_e|^2 + \alpha_u + \beta_e$.
    After placing standard normal priors on $\theta$ and $\phi$, and taking the log of the posterior, we get the following log-likelihood function with $L_2$ regularization terms:
    [Image: equation]
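The endorsement model of items (3) and (5) can be evaluated directly. This sketch assumes the logistic function is applied to $\Pi_{ue}$ as defined in (5); it does not reproduce the likelihood with confidence weights $a_{u,e}$ or the regularized objective, whose exact forms are not preserved here. The function name `endorse_prob` and the parameter values are illustrative.

```python
import math


def endorse_prob(theta_u, phi_e, alpha_u=0.0, beta_e=0.0):
    """Logistic probability that user u endorses elite e.

    Pi_ue = -|theta_u - phi_e|^2 + alpha_u + beta_e, then sigmoid(Pi_ue).
    """
    pi_ue = -abs(theta_u - phi_e) ** 2 + alpha_u + beta_e
    return 1.0 / (1.0 + math.exp(-pi_ue))


# Same position, no bias terms: Pi_ue = 0, so the probability is 0.5.
p_near = endorse_prob(theta_u=0.5, phi_e=0.5)
# Opposite ends of the scale: Pi_ue = -4, so the probability is much lower.
p_far = endorse_prob(theta_u=-1.0, phi_e=1.0)
```

The squared distance only ever lowers the score, so the bias terms $\alpha_u$ and $\beta_e$ are what allow very active users or very popular elites to have a high endorsement probability overall.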

6.2 Using the elite-endorsement and content-share graph

  • (1) We assume that web-content shared by users has ideological positions $\psi \in \mathbb{R}$ in the same shared latent space described in Section 6.1
  • (2) In case of content, $\pmb{S}_{i,k} = 1$ indicates that the distance between the ideological positions of $u_i$ and $i_k$ is small in this space.
  • (3) We also model the probability of a user sharing a web-content using the logistic function:
    [Image: equation]
    where $\lambda_i$ is the bias term associated with $i$
  • (4) As before, using Bayesian inference and the assumption that all observed $\pmb{S}_{u,i}$'s are independent, we arrive at the following expression:
    [Image: equation]
    The parameter $b_{u,i}$ is used to assign confidence to the observed endorsement of $i$ by $u$, for example the number of endorsements.

optimization

  • (1) Instead of learning the ideologies in (6) and (9) separately, we formulate a joint optimization to learn all the ideological positions together.

    • The reason for doing so is to align the positions learned by (6) and (9).
    • When there is not enough observed data in $\pmb{R}$ or $\pmb{S}$, one model is also expected to compensate for the lack of data in the other when learning the $\theta$'s, $\phi$'s and $\psi$'s jointly
  • (2) In other words, we use these two models to regularize each other such that they share the same latent dimension and learn relative distances between them in that shared space.
    [Image: equation]

  • (3) After placing standard normal priors on $\psi$ as before, and adding the contribution from (10) as an additional regularizer on (8), we get the following log-likelihood function for joint optimization, where $\mu$ trades off the contribution from the elite-endorsement graph:
    [Image: equation]
    The local maxima of (8) and (11) can be found via a gradient-based optimization in which all but one parameter are fixed at each step and they are updated alternately.

7 Experiments

7.1 Dataset Collection and Properties

  • (1) we crawled tweets during three political events and created these datasets:
    [Images: dataset details]

  • (2) For UK2016 and US2016 datasets, we gathered roughly equal number of tweets for each major campaign position (Remain and Leave) or presidential candidate (Donald Trump and Hillary Clinton). For DE2017 dataset, we included search terms representing each major political party and some general terms related to the election.
    The 3 political events included:
    • UK2016, from the 2016 EU referendum in the UK (campaign positions: Remain, Leave)
    • US2016, from the 2016 US presidential elections (candidates: Donald Trump, Hillary Clinton)
    • DE2017, from the 2017 German federal elections (search terms representing each major political party, plus some general terms related to the election)
  • (3) To filter suspicious users and content,

    • we removed tweets that were not retweeted more than 50 times,
    • and also the tweets by users who had few followers or who did not tweet often.
    • Note that this step may not filter out bots and automated accounts.
      (This makes the data very clean and information-rich; it effectively amounts to manual data cleaning.)
  • (4) we created two user-item feedback graphs for each dataset:

    • (a) elite-endorsement
    • (b) content-endorsement
  • (5) We treat each Twitter user who is retweeted more than five times as an elite

  • (6) each URL that is included in more than five tweets as a web-content

  • (7) The rows of both matrices denote users and columns denote elites (in $\pmb{R}_{u,e}$) and web-content (in $\pmb{S}_{u,i}$), respectively.

  • (8) Additionally, we evaluate RWE-D on two benchmark datasets from recommender systems: Movielens-1M and Yelp-Restaurants.
    [Image: benchmark datasets]
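The thresholds in items (5) and (6) of this list amount to a frequency filter. A minimal sketch, assuming the crawl has already been reduced to a flat list with one entry per retweet (the author being retweeted) or per tweet (the URL it contains); `select_frequent` and the sample handles are hypothetical.

```python
from collections import Counter


def select_frequent(events, min_count=5):
    """Return the entries that occur strictly more than min_count times."""
    counts = Counter(events)
    return {x for x, c in counts.items() if c > min_count}


# Hypothetical data: one list entry per retweet, naming the retweeted account.
retweeted_authors = ["@a"] * 6 + ["@b"] * 5 + ["@c"]
elites = select_frequent(retweeted_authors)  # accounts retweeted more than five times
```

The same call on a list of URLs extracted from tweets would give the set of web-content items.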

7.2 Political Ideology Detection

Result I: Our method accurately identifies ideological positions from social network signals.

7.3 Recommendation Baselines and Measures

  • (1) Among the several measures for evaluating the accuracy of a recommender system, we use the common ones:

    • AUC,
    • Mean Rank (MR),
    • Hit-rate (HR),
    • Precision(P) at top-10.

    Higher values of these measures indicate better accuracy.

  • (2) For measuring long-tail diversity, we borrow the measures

    • Gini-Diversity (GiniD@20),
    • Personalization (Pers@20),
    • Surprisal (Surp@20),
    • Average Item Degree (AvgDeg@20)

7.4 Long-Tail Diversity

Result II: RWE generates accurate and diverse long-tail recommendations.

7.5 Ideological Diversity

  • (1) In case of Twitter-based datasets for political content, we lack measures that comprehensively capture the recommendation diversity.

  • (2) In this section, we use 4 methods:

    • (i) average range of ideological positions in the top-10 recommendations RecRange@10
    • (ii) visual comparison of ideological distribution of items in top-k recommendations,
    • (iii) Kolmogorov-Smirnov statistic to quantify the difference in distributions of political ideology in top-k recommendations, and
    • (iv) new measures to numerically and visually inspect the ideological diversity for users across the spectrum.
  • (3) We also used Kolmogorov-Smirnov statistic with the null hypothesis that the distribution of political ideology in the top-10 recommendations generated by RWE-B is similar to those of baseline algorithms.
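Two of the measures listed in this section can be sketched with the standard library. The notes do not give exact formulas, so this assumes RecRange@k is the max-minus-min ideological position within each user's top-k list, averaged over users, and uses the textbook two-sample Kolmogorov-Smirnov statistic (largest gap between the two empirical CDFs) without the p-value. The names `rec_range_at_k` and `ks_statistic` and all data are illustrative.

```python
from bisect import bisect_right


def rec_range_at_k(rec_lists, item_pos, k=10):
    """Average, over users, of (max - min) ideological position among the top-k items."""
    ranges = []
    for recs in rec_lists:
        pos = [item_pos[i] for i in recs[:k]]
        ranges.append(max(pos) - min(pos))
    return sum(ranges) / len(ranges)


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the maximum distance between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sample, t):
        return bisect_right(sample, t) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in set(xs) | set(ys))


# Hypothetical item positions and two users' recommendation lists.
item_pos = {"a": -1.0, "b": 0.0, "c": 1.0}
avg_range = rec_range_at_k([["a", "b"], ["a", "c"]], item_pos, k=2)

# Positions of items recommended by two methods; fully separated samples give 1.0.
ks_apart = ks_statistic([0.0, 1.0], [2.0, 3.0])
ks_same = ks_statistic([0.0, 1.0], [0.0, 1.0])
```

A wider average range means each user's list spans more of the ideological scale, while a large KS statistic means two methods recommend from clearly different parts of it.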

8 Conclusion, Limitations, and Future Work

conclusion

  • (1) We proposed a novel approach to diversify recommendations in social and information networks and showed that it is able to generate both long-tail and ideologically diverse recommendations.

  • (2) Proposed Random Walk with Erasure (RWE)

  • (3) For ideological diversity, our approach consists of 2 parts:

    • (i) detection of ideological positions of not just users and elites but also web-content by exploiting social media signals about important political debates
    • (ii) diversification of recommendations using the detected ideological positions.
  • (4) This is the first work to present a framework for political content diversification and a joint learning of ideological positions.

Assumptions and Limitations (these are really the paper's shortcomings: its assumptions, data, and model are all fairly simple and idealized)

  • (4) Our work has the following assumptions and limitations.
    • First, we assume one-dimensional ideological positions, which is a simplification of real-world political debates.
    • Second, a proper measure for assessing political content diversification is still lacking and may be the subject of debate [23].
    • Third, we need to test RWE in a real-world, interactive scenario.
    • Last, in the absence of bridge users, finding content that is both diverse and agreeable could be challenging.

Acknowledgements

References

A Appendix

A.1 Additional Results
