2014 ICM Problem: Using Networks to Measure Influence and Impact
Author: Ternence Zhang
When reposting, please credit the source: http://blog.csdn.net/zhangtengyuan23
Original ICM problem PDF: http://download.csdn.net/detail/zhangty0223/6901345
One of the techniques to determine influence of academic research is to build and measure properties of citation or co-author networks. Co-authoring a manuscript usually connotes a strong influential connection between researchers. One of the most famous academic co-authors was the 20th-century mathematician Paul Erdös, who had over 500 co-authors and published over 1400 technical research papers. It is ironic, or perhaps not, that Erdös is also one of the influencers in building the foundation for the emerging interdisciplinary science of networks, particularly through his publication with Alfred Rényi of the paper “On Random Graphs” in 1959. Erdös’s role as a collaborator was so significant in the field of mathematics that mathematicians often measure their closeness to Erdös through analysis of Erdös’s amazingly large and robust co-author network (see the website http://www.oakland.edu/enp/). The unusual and fascinating story of Paul Erdös as a gifted mathematician, talented problem solver, and master collaborator is provided in many books and on-line websites (e.g., http://www-history.mcs.st-and.ac.uk/Biographies/Erdos.html). Perhaps his itinerant lifestyle, frequently staying with or residing with his collaborators, and giving much of his money to students as prizes for solving problems, enabled his co-authorships to flourish and helped build his astounding network of influence in several areas of mathematics. In order to measure such influence as Erdös produced, there are network-based evaluation tools that use co-author and citation data to determine the impact factor of researchers, publications, and journals. Some of these are Science Citation Index, H-factor, Impact factor, Eigenfactor, etc. Google Scholar is also a good data tool to use for network influence or impact data collection and analysis. Your team’s goal for ICM 2014 is to analyze influence and impact in research networks and other areas of society. Your tasks to do this include:
1) Build the co-author network of the Erdos1 authors (you can use the file from the website https://files.oakland.edu/users/grossman/enp/Erdos1.html or the one we include at Erdos1.htm). You should build a co-author network of the approximately 510 researchers from the file Erdos1 who coauthored a paper with Erdös, but do not include Erdös. This will take some skilled data extraction and modeling effort to obtain the correct set of nodes (the Erdös coauthors) and their links (connections with one another as coauthors). There are over 18,000 lines of raw data in the Erdos1 file, but many of them will not be used since they are links to people outside the Erdos1 network. If necessary, you can limit the size of the network you analyze in order to calibrate your influence measurement algorithm. Once it is built, analyze the properties of this network. (Again, do not include Erdös: he is the most influential and would be connected to all nodes in the network. In this case, it is co-authorship with him that builds the network, but he is not part of the network or the analysis.)
2) Develop influence measure(s) to determine who in this Erdos1 network has significant influence within the network. Consider who has published important works or connects important researchers within Erdos1. Again, assume Erdös is not there to play these roles.
3) Another type of influence measure might be to compare the significance of a research paper by analyzing the important works that follow from its publication. Choose some set of foundational papers in the emerging field of network science either from the attached list (NetSciFoundation.pdf) or from papers you discover. Use these papers to analyze and develop a model to determine their relative influence. Build the influence (coauthor or citation) networks and calculate appropriate measures for your analysis. Which of the papers in your set do you consider the most influential in network science, and why? Is there a similar way to determine the role or influence measure of an individual network researcher? How would you measure the role, influence, or impact of a specific university, department, or journal in network science? Discuss methodology to develop such measures and the data that would need to be collected.
4) Implement your algorithm on a completely different set of network influence data --- for instance, influential songwriters, music bands, performers, movie actors, directors, movies, TV shows, columnists, journalists, newspapers, magazines, novelists, novels, bloggers, tweeters, or any data set you care to analyze. You may wish to restrict the network to a specific genre or geographic location or predetermined size.
5) Finally, discuss the science, understanding and utility of modeling influence and impact within networks. Could individuals, organizations, nations, and society use influence methodology to improve relationships, conduct business, and make wise decisions? For instance, at the individual level, describe how you could use your measures and algorithms to choose who to try to co-author with in order to boost your mathematical influence as rapidly as possible. Or how can you use your models and results to help decide on a graduate school or thesis advisor to select for your future academic work?
6) Write a report explaining your modeling methodology, your network-based influence and impact measures, and your progress and results for the previous five tasks. The report must not exceed 20 pages (not including your summary sheet) and should present solid analysis of your network data; strengths, weaknesses, and sensitivity of your methodology; and the power of modeling these phenomena using network science.
*Your submission should consist of a 1 page Summary Sheet and your solution cannot exceed 20 pages for a maximum of 21 pages.
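For task 2, one common way to capture who "connects important researchers" is betweenness centrality; for task 5, one simple heuristic for choosing a new co-author is to pick the person whose link most raises your own harmonic closeness. The following is a minimal pure-Python sketch of both, assuming the co-author network is stored as a dict `adj` mapping each author to the set of his or her co-authors. The function names and the greedy one-link rule are illustrative choices, not part of the problem statement, and neither measure is claimed to be the intended answer.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized scores)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)   # BFS distance from s (-1 = unvisited)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each unordered pair (s, t) was counted from both ends.
    return {v: c / 2 for v, c in bc.items()}


def harmonic_closeness(adj, src):
    """Sum of 1/d(src, v) over all reachable v != src; defined on disconnected graphs."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(1.0 / d for v, d in dist.items() if v != src)


def best_new_coauthor(adj, me):
    """Greedy one-step rule: the non-neighbor whose link most raises my closeness."""
    best, best_score = None, harmonic_closeness(adj, me)
    for c in adj:
        if c == me or c in adj[me]:
            continue
        trial = {v: set(nb) for v, nb in adj.items()}  # copy, then add the edge
        trial[me].add(c)
        trial[c].add(me)
        score = harmonic_closeness(trial, me)
        if score > best_score:
            best, best_score = c, score
    return best, best_score


# Toy example: a hub "H" with three leaves, plus a newcomer "me".
adj = {"H": {"L1", "L2", "L3"}, "L1": {"H"}, "L2": {"H"}, "L3": {"H"}, "me": set()}
print(betweenness(adj)["H"])         # 3.0: H lies on all three leaf-to-leaf paths
print(best_new_coauthor(adj, "me"))  # ('H', 2.5)
```

The greedy rule only looks one link ahead; a real strategy would also weigh whether the candidate is likely to collaborate, which the network alone cannot tell.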
This is a listing of possible papers that could be included in a foundational set of influential publications in network science. Network science is a new, emerging, diverse, interdisciplinary field so there is no large, concentrated set of journals that are easy to use to find network papers even though several new journals were recently established and new academic programs in network science are beginning to be offered in universities throughout the world. You can use some of these papers or others of your own choice for your team’s set to analyze and compare for influence or impact in network science for task #3.
Erdös, P. and Rényi, A. On random graphs. Publicationes Mathematicae, 6:290-297, 1959.
Albert, R. and Barabási, A-L. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47-97, 2002.
Bonacich, P.F. Power and centrality: A family of measures. American Journal of Sociology, 92:1170-1182, 1987.
Barabási, A-L. and Albert, R. Emergence of scaling in random networks. Science, 286:509-512, 1999.
Borgatti, S. Identifying sets of key players in a network. Computational and Mathematical Organization Theory, 12:21-34, 2006.
Borgatti, S. and Everett, M. Models of core/periphery structures. Social Networks, 21:375-395, October 2000.
Graham, R. On properties of a well-known graph, or, What is your Ramsey number? Annals of the New York Academy of Sciences, 328:166-172, June 1979.
Kleinberg, J. Navigation in a small world. Nature, 406:845, 2000.
Newman, M. Scientific collaboration networks: II. Shortest paths, weighted networks, and centrality. Physical Review E, 64:016132, 2001.
Newman, M. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA, 98:404-409, January 2001.
Newman, M. The structure and function of complex networks. SIAM Review, 45:167-256, 2003.
Watts, D. and Dodds, P. Networks, influence, and public opinion formation. Journal of Consumer Research, 34:441-458, 2007.
Watts, D., Dodds, P., and Newman, M. Identity and search in social networks. Science, 296:1302-1305, May 2002.
Watts, D. and Strogatz, S. Collective dynamics of 'small-world' networks. Nature, 393:440-442, 1998.
Snijders, T. Statistical models for social networks. Annual Review of Sociology, 37:131-153, 2011.
Valente, T. Social network thresholds in the diffusion of innovations. Social Networks, 18:69-89, 1996.
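For task 3, one standard way to rank a set of papers by influence within their citation network is PageRank, under which a paper scores highly if it is cited by papers that themselves score highly. The sketch below is a plain power-iteration implementation, assuming the citation data have been collected into a dict `cites` mapping each paper to the papers it cites. The damping factor 0.85 is the conventional default rather than a calibrated value, and the labels P1 to P4 are placeholders, not real citation data for the papers listed above.

```python
def pagerank(cites, d=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration; cites[p] lists the papers p cites (p -> q edges)."""
    nodes = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    n = len(nodes)
    out = {p: list(cites.get(p, [])) for p in nodes}
    pr = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Papers that cite nothing in the set spread their score evenly over all papers.
        dangling = sum(pr[p] for p in nodes if not out[p])
        new = dict.fromkeys(nodes, (1 - d) / n + d * dangling / n)
        for p in nodes:
            for q in out[p]:
                new[q] += d * pr[p] / len(out[p])
        if sum(abs(new[p] - pr[p]) for p in nodes) < tol:
            return new
        pr = new
    return pr


# Toy citation data (hypothetical): each later paper cites all earlier ones.
toy = {"P2": ["P1"], "P3": ["P1", "P2"], "P4": ["P1", "P2", "P3"]}
ranks = pagerank(toy)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 4))
```

Scores sum to 1, so they can be compared across papers in the same set; comparing scores across different sets or time windows would need extra normalization.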
Erdos1, Version 2010, October 20, 2010
This list contains the 511 coauthors of Paul Erdös, each followed by a list of his or her own coauthors.
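For task 1, the essential step after reading this file is to keep only links between two Erdos1 authors and to leave Erdös out. A minimal sketch follows, assuming the file has already been parsed into a dict `coauthors_of` that maps each listed coauthor to the names listed under him or her. The parsing itself depends on the file's exact layout and is not shown, and the name string used for Erdös (`"ERDOS, PAUL"`) is a hypothetical placeholder that must match whatever spelling the parsed data actually uses.

```python
def build_erdos1_network(coauthors_of, erdos="ERDOS, PAUL"):
    """Undirected co-author network induced on the Erdos1 authors, without Erdos."""
    members = set(coauthors_of) - {erdos}
    adj = {u: set() for u in members}
    for u in members:
        for v in coauthors_of[u]:
            # Drop links to people outside Erdos1, to Erdos himself, and self-links.
            if v in members and v != u:
                adj[u].add(v)
                adj[v].add(u)  # co-authorship is symmetric
    return adj


def basic_properties(adj):
    """Node count, edge count, density, and number of isolated authors."""
    n = len(adj)
    e = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * e / (n * (n - 1)) if n > 1 else 0.0
    isolated = sum(1 for nb in adj.values() if not nb)
    return {"nodes": n, "edges": e, "density": density, "isolated": isolated}


# Toy input: X and Y are outside Erdos1, so their links are discarded.
toy = {"A": ["B", "X", "ERDOS, PAUL"], "B": ["A", "C"], "C": ["Y"], "D": []}
net = build_erdos1_network(toy)
print(basic_properties(net))
```

Adding each link in both directions makes the result symmetric even if the raw lists are not, which guards against one-sided entries in the source data.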
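Analysis:
Complex networks can be divided into unweighted and weighted networks. Typical unweighted complex networks include random networks, small-world networks, and BA (Barabási-Albert) scale-free networks; a typical weighted complex network is the BBV (Barrat-Barthélemy-Vespignani) weighted network. The focus of complex-network research has shifted from unweighted to weighted networks, and semantic relation networks are also a kind of weighted complex network.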
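Example: applying the Sznajd opinion model to weighted networks. Under asynchronous updating, the Sznajd opinion model is applied to a weighted network in which each node represents a person and the connection weight between two nodes describes how close the two people are. The study examines how the network's magnetization and the weights affect opinion evolution, using computer simulation. The results show that the Sznajd model on weighted networks has no stalemate state: with equal probability 0.5 the system reaches the monopoly state in which everyone is "up" or everyone is "down"; if the initial magnetization is greater than 0, the final magnetization is also greater than 0; and the larger the weight-update coefficient, the harder it is for the system to reach a final consensus.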
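The Sznajd rule itself is short enough to sketch. The version below is a minimal illustration, not the model of the study described above: it assumes that at each asynchronous update an edge (i, j) is drawn with probability proportional to its weight w_ij and that, if i and j hold the same opinion, all neighbors of both adopt it (the "united we stand" part of the original Sznajd rule; the "divided we fall" rule for disagreeing pairs is omitted). The weight-update coefficient mentioned above is not modeled, and all names are illustrative.

```python
import random


def sznajd_weighted(adj_w, spins, max_steps=100_000, seed=0):
    """Asynchronous Sznajd dynamics on a weighted graph.

    adj_w[i][j] = w_ij (symmetric, integer node labels); spins[i] is +1 or -1.
    Assumption: the interacting pair is an edge drawn with probability
    proportional to its weight; the weights themselves stay fixed.
    """
    rng = random.Random(seed)
    spins = dict(spins)
    edges = [(i, j) for i in adj_w for j in adj_w[i] if i < j]
    weights = [adj_w[i][j] for i, j in edges]
    for step in range(max_steps):
        if abs(sum(spins.values())) == len(spins):  # everyone agrees: consensus
            return spins, step
        i, j = rng.choices(edges, weights=weights)[0]
        if spins[i] == spins[j]:
            # "United we stand": an agreeing pair imposes its opinion on all neighbors.
            for k in set(adj_w[i]) | set(adj_w[j]):
                spins[k] = spins[i]
    return spins, max_steps


def magnetization(spins):
    """Mean opinion m in [-1, 1]; m = +1 or -1 means all-up or all-down consensus."""
    return sum(spins.values()) / len(spins)


# Toy run on a complete graph of 6 people with unequal tie strengths.
adj_w = {i: {j: 1.0 + (i + j) % 3 for j in range(6) if j != i} for i in range(6)}
start = {i: (1 if i < 3 else -1) for i in range(6)}
final, steps = sznajd_weighted(adj_w, start)
print(magnetization(final), steps)
```

On a complete graph a single agreeing pair convinces everyone, so consensus is immediate; sparser weighted graphs are where the effects of weights reported above would show up.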
Points to consider:
Undirected networks
Weighted networks
Node strength
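1. Undirected network:
An edge that can be traversed in only one direction is called a directed edge, and an edge that can be traversed in both directions is called an undirected edge. A network whose edges are all undirected is called an undirected network.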
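2. Weighted network:
Let $w_{ij}$ denote the weight of the edge between two connected nodes $i$ and $j$. A weighted network can be represented by its connection-weight matrix $W = (w_{ij})$, where $i, j = 1, 2, \dots, N$ and $N$ is the size of the network, i.e., the total number of nodes.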
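3. Node strength:
The strength $s_i$ of node $i$ combines information about its degree with the weights of all edges attached to it:
$$s_i = \sum_{j \in \tau(i)} w_{ij},$$
where $\tau(i)$ denotes the set of nodes adjacent to node $i$. The scientific collaboration network described here is undirected, so its weight matrix is symmetric ($w_{ij} = w_{ji}$). Nodes represent researchers, an edge represents a collaboration between two researchers, and the edge weight is the number of papers they have co-authored. The larger these weights, the more extensive a researcher's collaborations with other researchers, and the more attractive the corresponding node is.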
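These definitions can be made concrete with a minimal sketch, assuming each paper is given as a list of author names: the weight $w_{ij}$ counts the papers shared by authors $i$ and $j$, and $s_i$ is a row sum of $W$. The function names and the toy papers are illustrative.

```python
from itertools import combinations


def weight_matrix(papers, authors):
    """w_ij = number of papers co-authored by authors[i] and authors[j]."""
    index = {a: k for k, a in enumerate(authors)}
    n = len(authors)
    W = [[0] * n for _ in range(n)]
    for paper in papers:
        present = sorted({index[a] for a in paper if a in index})
        for i, j in combinations(present, 2):
            W[i][j] += 1
            W[j][i] += 1  # undirected network: W is symmetric
    return W


def strengths(W):
    """s_i = sum_j w_ij; zero entries (non-neighbors) add nothing."""
    return [sum(row) for row in W]


def degrees(W):
    """k_i = number of neighbors, for comparison with strength."""
    return [sum(1 for w in row if w > 0) for row in W]


papers = [["A", "B"], ["A", "B", "C"], ["C"]]  # toy paper author lists
W = weight_matrix(papers, ["A", "B", "C"])
print(W)             # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
print(strengths(W))  # [3, 3, 2]
print(degrees(W))    # [2, 2, 2]
```

All three authors have degree 2, but strength separates A and B, who collaborated twice, from C; this is the extra information strength carries beyond degree.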
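Localization:
An empirical statistical study was made of the small network formed by collaborations among the authors of research papers on chaos theory published in China's Acta Physica Sinica and Chinese Physics between January 1998 and June 2004. It yields a small scientific collaboration network with 266 papers, 60 authors, and 89 subnetworks. A-type subnetworks make up more than 4/5 of all subnetworks, but each contains very few people; B-type subnetworks make up less than 1/5. All A-type subnetworks together contain 182 people, an average of 2.52 nodes per subnetwork. All B-type subnetworks together contain 161 people, close to the A-type total, with an average of 9.47 nodes per subnetwork, far higher than for A-type. To some extent this shows that, in a scientific collaboration network, links inside a local world are relatively dense while links between a local world and the outside are relatively weak; that is, scientific collaboration networks exhibit localization.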
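All subnetworks formed through collaboration can be divided into two classes, A-type and B-type:
A-type subnetworks are isolated nodes (authors of single-author papers) and fully connected subnetworks in which every pair of nodes has collaborated. B-type subnetworks are those in which any two nodes, or any two A-type subnetworks, are linked through collaboration, although not necessarily directly: viewed purely topologically, any two nodes are always connected by a path.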
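The A/B distinction can be computed mechanically: find the connected components, then call a component A-type if it is an isolated author or a complete subgraph ($k$ nodes and $k(k-1)/2$ edges) and B-type otherwise. The sketch below assumes an unweighted, symmetric adjacency dict without self-loops; `summarize` reports the same per-class figures quoted above (number of subnetworks, total nodes, mean size). Names and toy data are illustrative.

```python
from collections import deque


def classify_subnetworks(adj):
    """Label each connected component 'A' (isolated or complete) or 'B' (otherwise)."""
    seen, labels = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search collects one component
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        k = len(comp)
        edges = sum(len(adj[v]) for v in comp) // 2
        labels.append(("A" if edges == k * (k - 1) // 2 else "B", comp))
    return labels


def summarize(labels):
    """Per class: number of subnetworks, total nodes, and mean subnetwork size."""
    stats = {}
    for cls, comp in labels:
        count, nodes = stats.get(cls, (0, 0))
        stats[cls] = (count + 1, nodes + len(comp))
    return {cls: {"subnetworks": c, "nodes": n, "mean_size": n / c}
            for cls, (c, n) in stats.items()}


# Toy graph: a pair, a sole author, a triangle (all A-type) and a 3-node path (B-type).
toy = {1: {2}, 2: {1}, 3: set(), 4: {5, 6}, 5: {4, 6}, 6: {4, 5},
       7: {8}, 8: {7, 9}, 9: {8}}
print(summarize(classify_subnetworks(toy)))
```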
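Model overview:
The dynamic growth model considered here allows edges between existing nodes to be reconnected; when an edge is reconnected, the weight between its two nodes increases. In a scientific collaboration network, for example, the edge weight is the number of papers the collaborators have published together, and an increase in weight represents growth in that number, which introduces a new internal evolution mechanism into the network. Nodes attach preferentially according to strength, which is more reasonable than attaching preferentially by degree alone, because a node's strength better reflects a scientist's ability and achievements. The algorithm of the model consists of the following steps: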
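1) Start from an initial network with $m_0$ nodes and $e_0$ edges. None of the initial $e_0$ edges is a reconnection, and each is assigned an initial weight of 1.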
2) At each time step, randomly select $M$ ($M < m_0$) nodes as a local world, and then perform one of the following three operations:
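(1) With probability $p_1$, add $m$ edges between the local world and the outside, connecting the local world with the rest of the network: select a node in the local world preferentially by strength, select another node outside the local world, also preferentially by strength, and connect the two. Reconnection is allowed: if the edge already exists, its weight $w_{ij}$ increases by $a$; otherwise the new edge is given weight $a$. Add $m$ ($m < M$) edges in this way. The probability $p_1$ is fairly small, because links between a local world and the outside are much rarer than links inside it. Here $a$ is a positive constant whose value can be set according to how important links between the local world and the outside are.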
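(2) With probability $p_2$, add $m$ edges inside the local world, realizing its internal evolution: select a node in the local world preferentially by strength, select another node in the local world uniformly at random, and connect the two. Reconnection is allowed: if the edge already exists, its weight $w_{ij}$ increases by 1; otherwise the new edge is given weight 1. Add $m$ edges in this way.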
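(3) With probability $p_3$, add a new node with $m$ edges to the local world, realizing the growth of the local world. The new edges attach preferentially by strength, and each is given weight 1. These $m$ edges may be neither reconnections nor self-loops.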
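Note: $p_1 + p_2 + p_3 = 1$. Repeating step 2) for $t$ time steps yields, on average, $m_0 + p_3 t$ nodes, $e_0 + m t$ edges, and a total node strength of $\sum_i s_i = 2e_0 + 2(a p_1 + p_2 + p_3) m t$. Writing $N$ for the total number of nodes after time $t$, we have $N = m_0 + p_3 t$.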
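The steps above can be sketched as a small simulation, under the reading that each time step performs exactly one of the three operations. Several details are assumptions not fixed by the description: the initial network is a ring (so $e_0 = m_0$), operation (2) never links a node to itself, and the parameter values ($m_0 = 10$, $M = 5$, $m = 2$, $a = 2$, $p = (0.1, 0.6, 0.3)$, 200 steps) are arbitrary. Because each step performs one operation, the realized counts $n_1, n_2, n_3$ take the place of $p_1 t, p_2 t, p_3 t$, and the node-count and total-strength formulas then hold exactly.

```python
import random
from collections import defaultdict


def simulate_local_world(m0=10, M=5, m=2, a=2.0, p=(0.1, 0.6, 0.3), steps=200, seed=0):
    """Sketch of the strength-driven local-world model with edge reconnection.

    Assumptions: ring initial network (e0 = m0); no self-loops in operation (2).
    """
    assert m < M < m0
    rng = random.Random(seed)
    w = defaultdict(float)  # w[(i, j)], i < j: edge weight
    s = defaultdict(float)  # s[i]: node strength

    def add_weight(i, j, dw):
        # New edge -> weight dw; reconnecting an existing edge -> weight += dw.
        w[(min(i, j), max(i, j))] += dw
        s[i] += dw
        s[j] += dw

    def pick(pool):
        # Strength-preferential choice: P(v) proportional to s_v.
        return rng.choices(pool, weights=[s[v] for v in pool])[0]

    nodes = list(range(m0))
    for i in range(m0):
        add_weight(i, (i + 1) % m0, 1.0)

    counts = [0, 0, 0]  # how often operations (1), (2), (3) were chosen
    for _ in range(steps):
        local = rng.sample(nodes, M)
        op = rng.choices([0, 1, 2], weights=p)[0]
        counts[op] += 1
        if op == 0:    # (1) m links between local world and outside, weight a
            inside = set(local)
            outside = [v for v in nodes if v not in inside]
            for _ in range(m):
                add_weight(pick(local), pick(outside), a)
        elif op == 1:  # (2) m links inside the local world, weight 1
            for _ in range(m):
                i = pick(local)
                add_weight(i, rng.choice([v for v in local if v != i]), 1.0)
        else:          # (3) new node with m distinct edges into the local world
            pool, targets = list(local), []
            for _ in range(m):
                t = pick(pool)
                pool.remove(t)
                targets.append(t)
            new = len(nodes)
            nodes.append(new)
            for t in targets:
                add_weight(new, t, 1.0)
    return nodes, w, s, counts


nodes, w, s, (n1, n2, n3) = simulate_local_world()
# With e0 = m0 = 10, m = 2, a = 2: N = m0 + n3 and sum(s) = 2*e0 + 2*(a*n1 + n2 + n3)*m.
print(len(nodes) == 10 + n3)                                # True
print(sum(s.values()) == 2 * 10 + 2 * (2.0 * n1 + n2 + n3) * 2)  # True
```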