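Paper link: https://doi.org/10.1145/3366423.3380020
Venue: WWW
Published: 2020
Authors and affiliations:
Datasets: described in the main text
Code:
Other:
Articles written by others
Summary of contributions: This is a largely theoretical paper. It uses weighted k-nn, and the rest is analysis of the resulting influence networks.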
- (1) We show three novel results that apply both to offline advice taking and online recommender settings.
- First, influential individuals have mainstream tastes and high dispersion in their taste similarity with others.
- Second, the fewer people an individual or algorithm consults (i.e., the lower k is) or the larger the weight placed on the opinions of more similar others, the smaller the group of people with substantial influence.
- Third, the influence networks emerging from deploying the k-nn algorithm are hierarchically organized.
- Details
- Similarity between users is measured with the Pearson correlation coefficient.
- We used node strength, defined as the sum of the absolute weights assigned to each of the $k$ nearest neighbors, as a measure of social influence that naturally fits the weighted k-nn algorithm and weighted networks more generally [4]. This general approach can also be used with algorithms that compute similarities between users in a reduced-dimensional space [34], or when other observable information about individuals is used to compute the similarity between them [21].
- In this case, node strength reduces to in-degree, arguably the most basic centrality measure. In our setting, in-degree represents the number of times a node (person) was sought for advice (or involved in the calculation of a recommendation). The analysis shows that in-degree varies greatly across people: For a wide range of values of $k$ there are only a few influential individuals (hubs; see Figure 3).
- A second metric, the local clustering coefficient, which measures the extent to which an individual’s advisers also advise each other, is inversely related to the in-degree following the power law $C(d) = d^{-\beta}$: the less influence individuals exert over others, the tighter the clusters they tend to form.
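
The in-degree and node-strength measures described in the details above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `influence`, the toy similarity values, and the tie-breaking behaviour of `sorted` are all assumptions made here.

```python
def influence(similarity, k):
    """Return (in_degree, strength) for the k-nn network implied by `similarity`.

    `similarity[target][peer]` is the taste similarity between two people.
    Each target consults its k most similar peers; a consulted peer gains
    one unit of in-degree and |similarity| units of node strength.
    """
    in_degree = {person: 0 for person in similarity}
    strength = {person: 0.0 for person in similarity}
    for target, sims in similarity.items():
        peers = sorted(
            (p for p in sims if p != target),
            key=lambda p: sims[p],
            reverse=True,
        )[:k]
        for adviser in peers:
            in_degree[adviser] += 1
            strength[adviser] += abs(sims[adviser])
    return in_degree, strength


# Hypothetical, symmetric toy similarities (not data from the paper).
similarity = {
    "ann": {"bob": 0.9, "cat": 0.2, "dan": 0.6},
    "bob": {"ann": 0.9, "cat": 0.1, "dan": 0.5},
    "cat": {"ann": 0.2, "bob": 0.1, "dan": 0.3},
    "dan": {"ann": 0.6, "bob": 0.5, "cat": 0.3},
}
in_degree, strength = influence(similarity, k=1)
```

With `k=1`, every person hands out exactly one unit of in-degree, so the in-degrees always sum to the number of people. If every selected adviser received the same weight, strength would simply be proportional to in-degree.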
social influence, influencers, social networks, collaborative filtering
(1) We all have opinions on matters of taste. Whether it is a new song, the design of a building, or the performance of an actor, people are eager to express their opinions offline and online. However, the opinions of some are sought out and appreciated more than the opinions of others. Consider renowned film critics such as Roger Ebert or wine critics like Robert Parker: their opinions are recognized as an indicator of quality by most other critics and the general public alike, and can thus affect the price or financial success of a product [1, 5]. Relative to such highly influential individuals, most people exert little social influence over others.
(2) Sociologists and communication scientists have been interested in the study of influential individuals since the mid-20th century, and understandably so. By accurately identifying individuals with influence, policy makers can sway public opinion on critical matters such as public health and the diffusion of socially beneficial innovations. Early studies [27, 33, 55] surveyed large numbers of people, typically residents of representative mid-sized cities in the United States, and asked them whom they would consult for advice in various domains (e.g., public health, fashion, politics). This early work revealed
(3) With the advent of computational methods, network theory, and the Internet, the research focus shifted to describing networks of social influence and developing methods for leveraging the clout of influential individuals in them [3, 28, 35, 54]. Social networks could be directly reconstructed by observing friendships or follower counts on online websites. Seminal methods for ranking search results, such as PageRank, use a network’s structure to assign value to different sources of information or individuals (e.g., webpages or blogs, see [41]). PageRank’s general approach has been used by social scientists to assign status to different people or sources of information in the offline world. Here, social influence is a consequence of the network’s structure, where well-connected (or well-positioned) individuals are most influential [13, 26].
(4) Coming to grips with the structure of social influence is crucial for the recommender systems and computational social science communities. Classic collaborative filtering algorithms, such as the weighted k-nearest neighbors algorithm (k-nn), essentially distribute social influence among the individuals in the system’s knowledge base [11]. For each target individual, k-nn pays attention to only a relatively small number of similar others (typically between 10 and 50, see [22, 23]), implying a particular network of social influence [29, 32]. Critically, k-nn can also represent a broad array of decision strategies that have been studied by social and behavioral scientists in offline settings (see Table 1). As in the communities studied by sociologists and communication scientists since the 1950s, the opinions of a few, influential individuals might be consulted more often by recommender systems. Going beyond previous research, we can now uncover the statistical properties of the opinions of the individuals whose advice is sought, and investigate the performance of different social learning strategies.
(5) Previous research on social influence in recommender systems has focused on two main topics.
(6) Several questions pertaining to both offline and online opinion spaces remain unaddressed: First, is it possible to identify characteristics (e.g., statistical properties) that reliably predict whether somebody is influential or has the potential to become influential within a domain? Second, how do the recommender algorithms or social learning strategies used determine the distribution of social influence (e.g., varying k in k-nn or the number of people asked for advice offline)? Third, what is the structure of the networks produced by k-nn and the corresponding social learning strategies? In this paper, we investigate these three questions in a diverse set of large- and small-scale datasets.
The simulation framework, results, and the code for visualizing the results are publicly available at https://osf.io/duj8q/.
(1) In our analysis, we rely on the widely used k-nearest neighbors algorithm (k-nn) [15, 44, 46], allowing for differential weights [7]. Such a weighted nearest neighbor algorithm can be expressed as follows:
(2) We used the Pearson correlation coefficient as a measure of similarity ($w$) between two individuals $i$ and $j$ [23], defined as follows:
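
As an illustration of the similarity measure named above, here is a minimal sketch of the standard Pearson correlation between two rating profiles. It assumes both individuals rated the same items in the same order; the function name `pearson_similarity`, the handling of constant profiles, and the toy ratings are assumptions made for this example rather than details from the paper.

```python
from math import sqrt


def pearson_similarity(u_i, u_j):
    """Pearson correlation between two equally long rating lists."""
    n = len(u_i)
    if n != len(u_j) or n < 2:
        raise ValueError("need two rating lists of equal length >= 2")
    mean_i = sum(u_i) / n
    mean_j = sum(u_j) / n
    # Co-variation of the two profiles around their own means.
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(u_i, u_j))
    var_i = sum((a - mean_i) ** 2 for a in u_i)
    var_j = sum((b - mean_j) ** 2 for b in u_j)
    if var_i == 0 or var_j == 0:
        # Assumption: a constant profile carries no taste information.
        return 0.0
    return cov / sqrt(var_i * var_j)


ratings_a = [1, 2, 3, 4, 5]
ratings_b = [2, 4, 6, 8, 10]  # same ordering of items as ratings_a
ratings_c = [5, 4, 3, 2, 1]   # exactly reversed ordering

w_ab = pearson_similarity(ratings_a, ratings_b)
w_ac = pearson_similarity(ratings_a, ratings_c)
```

Because the correlation ignores scale and offset, `ratings_b` is treated as a perfect match for `ratings_a` even though every rating is twice as large, while the reversed profile is a perfect mismatch.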
(3) We use a similarity sensitivity parameter $\rho$ that allows us to amplify or dampen the weights of different individuals [7, 40]. We directly modify the weights obtained from Eq. 2 using the following scheme:
(4) By varying $k$ and $\rho$, we can produce several collaborative filtering algorithms and social learning and information aggregation strategies studied in the social and behavioral sciences [2].
We analyzed an array of datasets, including:
Visual art: 24 people evaluated 109 photographs of visual art sourced from the Catalog of Art Images Online (CAMIO) and from museum collections. The collection included lesser-known artwork from a variety of periods, styles, genres, and cultural backgrounds.
Interior and exterior architecture: 17 people evaluated 118 interior architecture images and 19 people evaluated 108 exterior architecture images, all of which were chosen to highlight architectural detail. Most of them were selected from ArtStor, an image database that covers many cultures and periods.
Landscapes: 18 people evaluated 148 natural images representing a diverse set of biomes, weather, and views.
Faces: 2,513 people (ages 17–90 years) evaluated the attractiveness of 102 male and female individuals of varying ages and ethnic backgrounds on a 1–7 scale ranging from “much less attractive than average” to “much more attractive than average” (see http://faceresearch.org/).
Jester jokes: The Jester dataset was collected from April 1999 to May 2003 by an online recommender system that allowed Internet users to read and rate jokes on a scale ranging from “not funny” (−10) to “funny” (+10). Users first evaluated a number of jokes in random order; the system then recommended jokes from a pool of 100 items until all jokes were presented. For simplicity, we used only the data from participants who evaluated all jokes (reducing the number of participants from 73,421 to 14,116).
To investigate the relation between the statistical properties of people’s taste and the performance of k-nn, we calculated the mean taste similarity, defined as the (arithmetic) average correlation between each individual’s taste ratings and the ratings of all of their potential peers, and taste dispersion, defined as the standard deviation of those same correlations [2]. In Table 2, we also report the grand mean of those mean taste similarities (referred to as shared taste) and taste dispersions for each dataset. Unless otherwise noted, we present results for $\rho = 1$. For the Jester and Faces environments, we plot the networks for a subsample of individuals in Figure 1.
(1) Roger Ebert is probably the most famous film critic in the history of film-making. His opinion was sought by scores of movie-goers and a website bearing his name is still active. But was there something special about Ebert’s opinions that made him a nationwide phenomenon in the United States and source of advice for so many people? Are there people like Ebert in recommender systems? And is it possible to identify them solely on the basis of the statistical properties of their tastes?
(2) Our work looks at social influence in recommender systems through the lens of network theory. Hitherto, the recommender systems community has used social networks primarily as an additional source of information [18, 37, 50], and used network theory more broadly to visualize recommender systems as bipartite user-item networks (see, e.g., [57]). Here, extending early work by Lathia et al. [32], we investigated the social networks of influence produced by the weighted k-nearest neighbors algorithm (k-nn). We found that skewed social influence distributions are inherent in recommender systems and that the emerging networks are hierarchically organized. The most influential individuals (sitting on top of the hierarchies) tend to be those who benefit the most from the k-nn algorithm.
(3) Previous research showed that malicious individuals can game recommender algorithms by designing bots that evaluate options in a way that makes the evaluations appear informative to many similar others [31, 45, 48]. Our results provide an explanation for the efficiency of averaging attacks on collaborative filtering algorithms (i.e., rating each item by its average and adding some noise). Rating profiles using averaging schemes score very high, in terms of both mean taste correlation and often also dispersion of taste similarity with the crowd. If such an individual actually existed, they would be among the most influential in the settings we studied and would benefit a lot from recommendations. More broadly, our results show that it is possible to consistently identify individuals who are more likely to become influential by looking at the statistical properties of their taste.
(4) The k-nn algorithm and its capacity to emulate different social learning strategies provide a fresh way to look at networks of social influence in the offline world.
Taken together, our results show that it is possible to analyze recommender systems algorithms and their consequences at both the individual and aggregate level.