Community detection (2)

I've pulled some material together and am posting it as is; I'll expand it over time.

Famous Researchers

(Incomplete; just a few names I run into often.)

M.E.J. Newman

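2004 Finding and evaluating community structure in networks

2006 Modularity and community structure in networks

Introduces the well-known modularity measure, which scores communities by how dense the links are within groups and how sparse they are between groups.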
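To make the idea concrete, here is a minimal Python sketch of the usual modularity formula, Q = sum over communities of (l_c / m - (d_c / 2m)^2), where l_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. The function name, the edge-list input, and the toy graph are my own assumptions for illustration, not code from the paper.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    # Q = sum_c [ l_c / m - (d_c / 2m)^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by one bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_GROUPS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_GROUP = {n: "a" for n in range(6)}

print(round(modularity(EDGES, TWO_GROUPS), 4))  # 0.3571
print(modularity(EDGES, ONE_GROUP))             # 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the "dense inside, sparse between" intuition.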

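2006 Mixture models and exploratory analysis in networks

Finds communities without knowing in advance what kind of community structure is present, which is a bit mind-bending; the objective is really to group nodes that share the same linking pattern.

2008 Hierarchical structure and the prediction of missing links in networks

This one made it into Nature. Hierarchical structure can describe the structure of complex networks and can therefore be used to predict edges. Still judging a hierarchy by community accuracy? That's weak; the experts use the hierarchy to reconstruct the network directly!

2011 Stochastic blockmodels and community structure in networks

The degree-corrected stochastic block model. Hail the block model.

2012 Communities, modules and large-scale structure

An introductory read on community detection, published in Nature Physics. There is a Chinese translation (I couldn't find the URL right away; leave an email address if you want it).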

2010 Networks: An Introduction

A book by Newman. The website has the table of contents; the coverage is fairly basic.

Steve Gregory

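2007 An algorithm to find overlapping community structure in networks

Extends the GN algorithm to overlapping communities, roughly by allowing nodes to split as well.

He seems to enjoy extending existing methods: one paper shows how to turn any non-overlapping algorithm into an overlapping one, roughly by first splitting nodes with the method here and then running the non-overlapping algorithm.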

2010 Finding overlapping communities in networks by label propagation

A label propagation method.
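For readers who haven't seen label propagation, here is a minimal Python sketch of the basic, non-overlapping version: every node starts with its own label and repeatedly adopts the label most common among its neighbors. This is deliberately not the overlapping variant from the paper above; the function name, parameters, and toy graph are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict

def label_propagation(edges, seed=0, max_sweeps=100):
    """Basic asynchronous label propagation on an undirected graph.

    edges: list of (u, v) pairs; node ids must be sortable.
    Returns a dict mapping node -> final label.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    labels = {node: node for node in neighbors}  # unique starting labels
    nodes = sorted(neighbors)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in random order each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in neighbors[node])
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            # Keep the current label when it is already a majority label;
            # otherwise pick one of the majority labels at random.
            if labels[node] not in best:
                labels[node] = rng.choice(best)
                changed = True
        if not changed:  # every node agrees with its neighborhood
            break
    return labels

# Toy graph (hypothetical): two disconnected triangles.
TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
labels = label_propagation(TRIANGLES)
print(len(set(labels.values())))  # 2
```

On graphs with bridges between groups the result depends on the visiting order and tie-breaking, so different seeds can give different partitions.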

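2011 Fuzzy overlapping communities in networks

Argues that overlap itself comes in two kinds, crisp and fuzzy (roughly hard overlap and soft overlap), and evaluates how well current methods detect each kind.

Y.-Y. Ahn

2010 Link communities reveal multiscale complexity in networks

It feels like link communities took off the moment this paper appeared in Nature.

The method is simple: define a similarity between edges, then run hierarchical clustering.

The experiments are very thorough!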
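A rough Python sketch of the edge-similarity step described above. I am assuming the Jaccard index over "inclusive" neighborhoods (a node plus its neighbors) of the two endpoints that the edges do not share, which is the form usually associated with this paper; check the paper before relying on it. The function name and toy graph are hypothetical, and the graph is assumed to be simple (no self-loops or duplicate edges).

```python
def link_similarity(adj, edge_a, edge_b):
    """Similarity of two distinct edges that share exactly one node.

    adj: dict mapping node -> set of neighbor nodes.
    Edges that do not share exactly one endpoint get similarity 0.
    """
    shared = set(edge_a) & set(edge_b)
    if len(shared) != 1:
        return 0.0
    (i,) = set(edge_a) - shared  # endpoint unique to edge_a
    (j,) = set(edge_b) - shared  # endpoint unique to edge_b
    ni = adj[i] | {i}            # inclusive neighborhood of i
    nj = adj[j] | {j}            # inclusive neighborhood of j
    return len(ni & nj) / len(ni | nj)

# Toy graph (hypothetical): triangle 0-1-2 with a pendant node 3 on node 2.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(link_similarity(ADJ, (0, 2), (1, 2)))  # 1.0
print(link_similarity(ADJ, (0, 2), (2, 3)))  # 0.25
print(link_similarity(ADJ, (0, 1), (2, 3)))  # 0.0
```

Feeding these pairwise similarities into any standard hierarchical clustering routine then groups edges, rather than nodes, into communities.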

Tim S. Evans

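2009 Line graphs, link partitions, and overlapping communities

Any mention of link communities has to include Evans's line graph. He maps edges to nodes, so that traditional node-based methods can then produce link communities.

Evans and Ahn also put out statements that they did their work independently and that both simply happened to be about link communities.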
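The edge-to-node mapping can be sketched in a few lines of Python. This builds the plain, unweighted line graph only; any weighting used in the paper is not reproduced here, and the function name and example are my own assumptions.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of an undirected graph.

    Each original edge becomes a node of the line graph; two such nodes
    are linked when the original edges share an endpoint.
    Returns a dict mapping edge -> set of adjacent edges.
    """
    # Normalize (u, v) and (v, u) to one tuple and drop duplicates.
    edge_nodes = sorted({tuple(sorted(e)) for e in edges})
    lg = {e: set() for e in edge_nodes}
    for a, b in combinations(edge_nodes, 2):
        if set(a) & set(b):
            lg[a].add(b)
            lg[b].add(a)
    return lg

# Toy graph (hypothetical): the path 0 - 1 - 2 - 3.
PATH = [(0, 1), (1, 2), (2, 3)]
lg = line_graph(PATH)
print(sorted(lg[(1, 2)]))  # [(0, 1), (2, 3)]
```

Running any node-partitioning method on `lg` yields a partition of the original edges; a node of the original graph then belongs to every community that one of its edges falls into, which is where the overlap comes from.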

Peter J. Mucha

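2010 Community Structure in Time-Dependent, Multiscale, and Multiplex Networks

This one made it into Science. It deals with multislice networks, for example networks that depend on time, have several edge types, or are viewed at several resolutions. The trick is neat: each node is linked to its own copies in the other networks, which ties all the networks into one.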
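Here is a small Python sketch of the "link each node to its own copy" construction. It only couples consecutive slices and uses a single coupling weight `omega`; both are simplifying assumptions of mine, as are the function name and the toy data. It builds the joined graph only and does not perform the community detection itself.

```python
def couple_slices(slices, omega=1.0):
    """Join several network slices into one weighted edge list.

    slices: list of edge lists, one per slice (e.g. one per time step).
    Nodes of the joined graph are (node, slice_index) pairs.
    Returns a list of (source, target, weight) triples.
    """
    coupled = []
    nodes_per_slice = []
    for s, edges in enumerate(slices):
        present = set()
        for u, v in edges:
            coupled.append(((u, s), (v, s), 1.0))  # ordinary edge inside slice s
            present.update((u, v))
        nodes_per_slice.append(present)
    # Link each node to its own copy in the next slice, when it exists there.
    for s in range(len(slices) - 1):
        for n in sorted(nodes_per_slice[s] & nodes_per_slice[s + 1]):
            coupled.append(((n, s), (n, s + 1), omega))
    return coupled

# Toy data (hypothetical): two time steps of a tiny network.
SLICES = [[("a", "b")], [("a", "b"), ("b", "c")]]
for edge in couple_slices(SLICES, omega=0.5):
    print(edge)
```

Any community detection method that accepts weighted edges can then be run on the joined graph, so a node's copies in different slices can end up in the same or in different communities.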

Vincent Blondel

2008 Fast unfolding of communities in large networks

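BGLL, an (unrivaled) fast method for non-overlapping community detection. Its objective function is modularity. I went through its code carefully; it is written in C++, and my own code has ended up in the same style ever since.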

Gergely Palla

2005 Uncovering the overlapping community structure of complex networks in nature and society

2007 Quantifying social group evolution

Two Nature papers. The 2005 one presents the classic clique method; the 2007 one is about how communities evolve.

2006 CFinder: locating cliques and overlapping modules in biological networks

CFinder is the tool for the classic clique method; it is free to use after filling in a form.
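The clique method mentioned above can be sketched as: list all k-cliques, treat two k-cliques as adjacent when they share k-1 nodes, and take each connected group of cliques as one community. The Python below is a brute-force illustration for tiny graphs only; it is my own sketch with hypothetical names, not how CFinder is implemented.

```python
from itertools import combinations

def k_clique_communities(adj, k=3):
    """Brute-force k-clique percolation.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    Returns a list of node sets; a node may appear in several sets.
    """
    nodes = sorted(adj)
    # Every k-subset of nodes whose members are all pairwise linked.
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge cliques that share k-1 nodes.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Toy graph (hypothetical): two triangles that share only node 2.
BOWTIE = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(k_clique_communities(BOWTIE, k=3))  # [{0, 1, 2}, {2, 3, 4}]
```

Node 2 ends up in both communities, which is the kind of overlap the method is known for.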

Renaud Lambiotte

Worked with Evans on the line graph approach and with Blondel on BGLL.

Liu Huan

 

Conferences and Journals

 

Keywords: community detection, social network, social network analysis, complex network, cluster, graph partition

 

Nature

Science

 

AAAI

WWW

 

ICDM

SIGKDD

SIGMOD

PKDD

PAKDD

TKDD

SDM

CIKM

 

Proceedings of the National Academy of Sciences 9.681

New Journal of Physics     4.177

Physical Review E    2.255

Journal of Statistical Mechanics: Theory and Experiment      1.7

Journal of Physics A: Mathematical and Theoretical      1.540

The European Physical Journal B     1.534

Physica A: Statistical Mechanics and its Applications    1.373

EPL (Europhysics Letters)      

 

PLOS One 4.096

Complex networks 

Social networks        2.931

Network Science     

 

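The numbers in the right-hand column are impact factors. They change every year, and I no longer remember which year these are from.

The venues above are again just the ones I see often. Besides the data mining ones, a large share are physics venues. Yes, a whole crowd of physicists works on this, Mark Newman for example. In fact people from biology, sociology, physics, mathematics, and computer science all work on it; it is an interdisciplinary field, after all.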

 

Research Topics (Knowledge Graph)

Related wiki pages

http://en.wikipedia.org/wiki/Community_structure

http://en.wikipedia.org/wiki/Cluster_analysis

 

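Diagram of related disciplines

[Figure 1: diagram of related disciplines]

A paper's research focus can be roughly described along the following dimensions (my own summary; corrections welcome).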

 

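Flat cluster: the result is a partition of the network; most results take this form.

Hierarchical cluster: hierarchical clustering; the result is a tree diagram (dendrogram) showing how communities contain one another.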

 

Overlapping (fuzzy/crisp assignment): a member can belong to several communities.

Non-overlapping (hard assignment): a member can belong to only one community.

 

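Static network: the network is fixed and does not change over time; this is the usual case.

Dynamic network: the network changes over time.

Multiplex network: the edges in the network come in several types.

Bipartite network: the nodes in the network come in two types (and, by extension, there can be more).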

 

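Density community: the target is communities with dense internal links.

Bipartite community: the target is communities with sparse internal links, usually by dividing the network into a bipartite or multipartite graph.

Mixture community: the target is communities whose members have similar linking patterns, a mix of the two above.

That said, most definitions of a community depend on the algorithm: whatever the algorithm detects is what gets defined as a community.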

 

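Global: uses global information to detect a community partition of the whole network.

Local: uses local information, for example looking only at a node's neighbors when processing that node. This can detect communities in one part of the network, for example the community structure around a given node. It is a very practical use case, especially when the data is very large.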

 

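Incremental (online): the algorithm supports online updates. When some nodes or edges are added or removed, it makes a small adjustment instead of rerunning from scratch, which suits large networks that change in real time.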

 

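Further research topics include:

Node properties (hub, periphery): studies the roles of nodes, such as whether a node is a key node, a central node, a peripheral node, a leader, or a follower.

Spread process: studies how information spreads, for example the spread of public opinion or of viruses.

Link prediction: predicts missing edges, which is essentially recommendation.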
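To show why link prediction reads like recommendation, here is a tiny Python sketch that ranks unconnected node pairs by their number of common neighbors. Common-neighbor counting is a simple baseline I picked for illustration; it is not a method taken from the papers listed above, and the names and toy graph are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Rank non-adjacent node pairs by their number of shared neighbors.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    Returns a list of ((u, v), score), highest score first.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only score pairs that are not already linked
            shared = len(adj[u] & adj[v])
            if shared:
                scores[(u, v)] = shared
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy graph (hypothetical): triangle 0-1-2 with a pendant node 3 on node 2.
GRAPH = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(common_neighbor_scores(GRAPH))  # [((0, 3), 1), ((1, 3), 1)]
```

The top-ranked pairs are the "you may know" suggestions: candidate edges that are missing from the observed network.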

 

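Evaluation: judging whether a detection result is good requires evaluation metrics, and there is currently no generally accepted good metric. Comparing directly against real networks with ground-truth labels is one option, but small networks are not convincing, and for large datasets the definition of a community is not even necessarily the same. Some good papers build their own datasets and measure them with their own metrics. As a result, some people have run dedicated series of experiments to assess current algorithms from a more objective angle, which is a research direction in its own right.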
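As one concrete example of comparing against ground-truth labels, here is a small Python sketch of normalized mutual information (NMI) for non-overlapping partitions, using the 2·I / (H_a + H_b) normalization. NMI is my choice of illustration, not a metric the notes above single out, and the function name and inputs are assumptions.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat partitions.

    labels_a, labels_b: equal-length sequences; position i holds the
    community label of node i in each partition.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information I(A; B) from the joint label counts.
    mi = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    # Entropies H(A) and H(B).
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:  # both partitions put everything in one community
        return 1.0
    return 2 * mi / (h_a + h_b)

truth = [0, 0, 1, 1]
print(round(nmi(truth, ["x", "x", "y", "y"]), 6))  # 1.0 (same split, renamed)
print(round(nmi(truth, [0, 1, 0, 1]), 6))          # 0.0 (unrelated split)
```

A score near 1 means the detected partition matches the labels up to renaming; a score near 0 means the two partitions carry no information about each other.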

 

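Visualization: evaluation metrics give a quantitative analysis, but that is still just a pile of numbers. People like to see pictures, so how to display community structure visually is also an open problem.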

 

Overview of Methods

[Figure 2: overview of methods]

From http://blog.sciencenet.cn/blog-798640-677758.html

 

http://blog.sina.com.cn/s/blog_63891e610101722t.html

 

(Placeholder: I'll write my own summary here.)

 

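Surveys

2010 Community detection in graphs

A survey that reads like a reference handbook.

2012 Communities, modules and large-scale structure

An introductory read on community detection, published in Nature Physics. There is a Chinese translation.

2012 Temporal networks

Summarizes methods for analyzing network structure that changes over time.

2013 Overlapping community detection in networks: The state-of-the-art and comparative study

A survey of overlapping algorithms.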

Tools

Gephi

Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.

Runs on Windows, Linux and Mac OS X. Gephi is open-source and free.

http://gephi.org/users/download/

NetLogo

NetLogo is a multi-agent programmable modeling environment. It is used by tens of thousands of students, teachers and researchers worldwide. It also powers HubNet participatory simulations. It is authored by Uri Wilensky and developed at the CCL. You can download it free of charge.

http://ccl.northwestern.edu/netlogo/download.shtml

Pajek

Pajek (Slovene word for Spider) is a program, for Windows, for analysis and visualization of large networks. It is freely available, for noncommercial use, at its download page.

http://pajek.imfm.si/doku.php?id=download

iGraph

igraph is a free software package for creating and manipulating undirected and directed graphs. It includes implementations for classic graph theory problems like minimum spanning trees and network flow, and also implements algorithms for some recent network analysis methods, like community structure search.

http://igraph.sourceforge.net/download.html

Cytoscape

Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. A lot of Apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.

http://www.cytoscape.org/download.html

 

Other Code

http://code.google.com/p/community-detection/  C++

http://code.google.com/p/linloglayout/  Java

From <http://blog.sina.com.cn/s/blog_67532f7c0100qakz.html>

http://blog.sciencenet.cn/blog-404069-297233.html  (tools)

MatlabBGL is a Matlab package for working with graphs. It uses the Boost Graph Library to efficiently implement the graph algorithms. MatlabBGL is designed to work with large sparse graphs with hundreds of thousands of nodes.

From <https://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/>

 

Datasets

http://www.cs.cmu.edu/~enron/

http://www.informatik.uni-trier.de/~ley/db/

http://socialnetworks.mpi-sws.org/data-imc2007.html

http://www.cs.bris.ac.uk/~steve/networks/

 http://www.cs.bris.ac.uk/~steve/networks/peacockpaper/

http://cran.r-project.org/web/packages/timeordered/index.html

http://www.facebook.com/press/info.php?statistics

http://www.cs.cornell.edu/projects/kddcup/datasets.html

http://www-personal.umich.edu/~mejn/netdata/

http://www.cise.ufl.edu/research/sparse/mat/Pajek/

http://arnetminer.org/download

http://yeast-complexes.russelllab.org/complexview.pl?rm=complex_list

http://thebiogrid.org/

http://mips.helmholtz-muenchen.de/genre/proj/yeast/

http://www.yeastgenome.org/

http://vlado.fmf.uni-lj.si/pub/networks/data/

http://archive.routeviews.org/

http://blog.sciencenet.cn/blog-40109-279160.html

 http://deim.urv.cat/~aarenas/data/welcome.htm

Open Courses

https://www.coursera.org/course/sna

This course will use social network analysis, both its theory and computational tools, to make sense of the social and information networks that have been fueled and rendered accessible by the internet.

http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml

