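Social Network Analysis is a Coursera course taught by Professor Lada Adamic of the University of Michigan. It introduces basic social network concepts such as centrality, betweenness, and modularity, shows how to analyze social networks with tools such as Gephi and NetLogo, and covers how to detect communities in a social network. No specialized background is required. For students with a programming background, the course also shows how to analyze social networks with R.
Week 1
This week covers only the basic concepts of a social network graph: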
Node
Edge(directed edge, undirected edge)
Indegree
Outdegree
These concepts are fairly simple, so I won't repeat them here.
There are three main ways to represent a social network graph:
1. Adjacency matrix
2. Edge list
3. Adjacency list
For the simple graph below, the three representations are:
1. Adjacency matrix
2. Edge list
2,3
2,4
3,2
3,4
4,5
5,1
5,2
3. Adjacency list
1:
2: 3 4
3: 2 4
4: 5
5: 1 2
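The three representations carry the same information, so any one can be derived from another. The following is a minimal Python sketch that builds the adjacency matrix and adjacency list from the edge list above. It assumes each `a,b` pair is a directed edge from `a` to `b` and that row i, column j of the matrix marks an edge from node i to node j; the function names are illustrative, not from the course.

```python
# Directed edge list from the example above: (source, target).
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2)]
nodes = [1, 2, 3, 4, 5]


def to_adjacency_matrix(nodes, edges):
    """Row i, column j is 1 if there is an edge from node i to node j."""
    index = {node: k for k, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def to_adjacency_list(nodes, edges):
    """Map each node to the list of nodes it points to."""
    adj = {node: [] for node in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    return adj


matrix = to_adjacency_matrix(nodes, edges)
adj_list = to_adjacency_list(nodes, edges)

# Outdegree is a row sum; indegree is a column sum.
outdegree = {node: sum(matrix[k]) for k, node in enumerate(nodes)}
indegree = {node: sum(row[k] for row in matrix) for k, node in enumerate(nodes)}

for row in matrix:
    print(row)
print(adj_list)
print("indegree:", indegree, "outdegree:", outdegree)
```

Under this convention, node 1 has an empty row (no outgoing edges), which matches the empty `1:` entry in the adjacency list.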
Next come the most important concepts: strongly connected component, weakly connected component, and giant component.
Strongly connected component: Each node within the component can be reached from every other node in the component by following directed links.
In other words, every pair of nodes must be mutually reachable.
Weakly connected component: Each node within the component can be reached from every other node by following links in either direction.
In other words, every pair of nodes only needs to be connected once edge direction is ignored.
Giant component: If the largest connected component encompasses a significant fraction of the graph, it is called the giant component.
How large a fraction counts as significant? Maybe 5%, maybe 10%...
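To make the definitions concrete, here is a small Python sketch that finds the strongly and weakly connected components of the five-node example graph from the edge list above. It uses plain breadth-first reachability rather than an optimized algorithm such as Tarjan's, and all function names are illustrative assumptions.

```python
from collections import deque

edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 1), (5, 2)]
nodes = [1, 2, 3, 4, 5]


def reachable(start, adj):
    """Breadth-first search: the set of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def build_maps(nodes, edges):
    """Forward, reversed, and direction-free neighbor maps."""
    fwd = {n: set() for n in nodes}
    rev = {n: set() for n in nodes}
    for src, dst in edges:
        fwd[src].add(dst)
        rev[dst].add(src)
    und = {n: fwd[n] | rev[n] for n in nodes}
    return fwd, rev, und


def strongly_connected_components(nodes, edges):
    fwd, rev, _ = build_maps(nodes, edges)
    comps, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            # Nodes that n can reach AND that can reach n.
            comp = reachable(n, fwd) & reachable(n, rev)
            comps.append(comp)
            assigned |= comp
    return comps


def weakly_connected_components(nodes, edges):
    _, _, und = build_maps(nodes, edges)
    comps, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            comp = reachable(n, und)  # edge direction ignored
            comps.append(comp)
            assigned |= comp
    return comps


scc = strongly_connected_components(nodes, edges)
wcc = weakly_connected_components(nodes, edges)
largest_fraction = max(len(c) for c in wcc) / len(nodes)
print(scc, wcc, largest_fraction)
```

Node 1 has no outgoing edges, so it forms a strongly connected component on its own, while nodes 2 to 5 form the other; ignoring direction, all five nodes fall into a single weakly connected component.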
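Week 2
This week covers two random network models: the Erdos-Renyi model and the Barabasi-Albert model. The concepts involved are connected components (strongly and weakly connected), the largest connected component, average shortest path, diameter, and breadth-first search.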
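As a rough illustration of how these concepts fit together, the sketch below generates an Erdos-Renyi random graph and uses breadth-first search to compute the average shortest path and the diameter. It assumes the G(n, p) variant of the model (each pair of nodes is linked independently with probability p) and averages over reachable pairs only; both choices, and all function names, are assumptions made for this example.

```python
import random
from collections import deque
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Undirected G(n, p): link each pair of nodes with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(adj):
    """Return (average shortest path, diameter) over reachable pairs."""
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else 0.0
    return average, diameter


graph = erdos_renyi(100, 0.05, seed=42)
print(path_stats(graph))
```

Running one BFS per node is fine for graphs of a few hundred nodes; larger networks usually call for sampling or a dedicated library.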
Week 5
Question 1
Download Lada's Facebook network. Load it in Gephi (as undirected). Note the number of nodes and edges present. Calculate the clustering coefficient and average shortest path (this is OK to take as-is even though the network is not connected). Next close the project, and after you have blank slate, generate an Erdos Renyi random graph (File > Generate > Random graph...) with the same number of nodes and edges (you'll have to figure out the corresponding wiring probability to achieve this). It will produce a directed network. Calculate the clustering coefficient and average shortest path for this random network, making sure to treat the network as undirected. Which of the following observations is true?
the random graph has fewer connected components than Lada's actual network
the average shortest path is longer than 3 hops
the average clustering coefficient in Lada's egonetwork is lower than 0.4
the topology of Lada's network does not satisfy the definition of a small world network
Here is the result:
| | Lada's Facebook network | Erdos-Renyi random graph (wiring probability = 0.049) |
|---|---|---|
| Nodes | 388 | 388 |
| Edges | 3598 | 3620 |
| Network Diameter | 8 | 4 |
| Connected Components | 20 | 1 |
| Average Clustering Coefficient | 0.534 | 0.049 |
| Average Shortest Path | 2.781 | 2.34 |
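The question leaves the wiring probability to be worked out. One way to estimate it, assuming an undirected graph with no self-loops, is to divide the target number of edges by the number of possible node pairs, N(N-1)/2. The helper name below is illustrative.

```python
def wiring_probability(num_nodes, num_edges):
    """Probability p at which an undirected G(n, p) graph has
    num_edges edges in expectation."""
    possible_pairs = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_pairs


p = wiring_probability(388, 3598)
print(round(p, 4))  # roughly 0.048
```

This lands close to the 0.049 used in the table above. In a G(n, p) graph the expected clustering coefficient is simply p, which is consistent with the random graph's measured value of 0.049 and far below the 0.534 of the real network.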
Question 2
Download a snapshot of the Gnutella peer-to-peer filesharing network (now over a decade old). Go through the same procedure: Load it in Gephi (as undirected). Note the number of nodes and edges present. Calculate the clustering coefficient and average shortest path (this is OK to take as-is even though the network is not connected). Next close the project, and after you have blank slate, generate an Erdos Renyi random graph with the same number of nodes and edges (you'll have to figure out the corresponding wiring probability to achieve this). Calculate the clustering coefficient and average shortest path for this random network, making sure to treat the network as undirected. Which of the following observations is true?
The clustering coefficient in the gnutella network is lower than that of the equivalent random graph.
The average shortest path of the gnutella network is shorter than that of the equivalent random graph.
The average shortest path of the gnutella network is longer than that of the equivalent random graph.
The gnutella network is well modeled by the Watts-Strogatz small world model.
Here is the result:
| | Gnutella peer-to-peer filesharing network | Erdos-Renyi random graph (wiring probability = 0.0027) |
|---|---|---|
| Nodes | 795 | 795 |
| Edges | 852 | 842 |
| Network Diameter | 4 | 22 |
| Connected Components | 204 | 111 |
| Average Clustering Coefficient | 0.029 | 0.002 |
| Average Shortest Path | 4.82 | 8.472 |
Question 3
Now consider Lada's clustering coefficient (the clustering coefficient of her node, where she is connected to each of her friends). What is it, approximately?
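For reference, the clustering coefficient of a single node is the fraction of pairs of its neighbors that are themselves connected. The sketch below computes it on a tiny made-up ego network; the node names and edges are hypothetical and are not taken from Lada's data.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


# Hypothetical ego network: "ego" knows a, b, c, d; a-b and b-c are friends.
adj = {
    "ego": {"a", "b", "c", "d"},
    "a": {"ego", "b"},
    "b": {"ego", "a", "c"},
    "c": {"ego", "b"},
    "d": {"ego"},
}

ego_cc = local_clustering(adj, "ego")
print(ego_cc)  # 2 linked pairs out of 6 possible
```

Because the ego is connected to every other node in an ego network, her clustering coefficient equals the density of the network that remains among her friends once she is removed.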
Question 4
Run a rewiring algorithm on a ring lattice. Do not apply the layout algorithm until the very end, because the position of the nodes on the circle is needed to compute the cost of wire. Read the description of the model, and experiment with different wire costs as well as trying to find the optimum (minimum energy) configuration for each wire cost setting. The model works best if you give it a long time to run. The 'find optimum' button will automatically first increase and then decrease the temperature to try to find a global optimal solution. You may want to follow this up with just a long period of 'rewire' to continue the process. Also, try increasing the speed with the slider to have the search occur faster.
Now for the question: As the cost of wire increases which of the following is true?
the average shortest path decreases
edges become more localized
hubs become more prevalent
clustering decreases
Question 5
Skim through the paper by Liben-Nowell et al. on geographic routing in social networks inferred from LiveJournal. Based on the paper, which of the following is true:
Applying a simple greedy strategy (pass the message to your friend who is closest to the target geographically) would result in extremely long path lengths.
What matters in navigation is that the probability of being acquainted with individual X depends on how many others live closer to you than X.
Empirically, the probability that two individuals are acquainted falls off as 1/(distance)^2 based on LiveJournal data.
Population density has no bearing on the probability that two people living within a certain distance know one another.
Week 6
Question 1
Using the NetLogo simulation of a diffusion process on a small world topology, answer the following. If the infection rate is 0.25, and the recovery rate is 0.30, what is true of the diffusion processes in the regular lattice vs. rewired case:
| Your Answer | | Score | Explanation |
|---|---|---|---|
| shortcuts allow all nodes to be simultaneously in the infected state | | | |
| a pure lattice topology (no shortcuts) allows the infection to become established more quickly | | | |
| shortcuts prolong the amount of time an established infection can stay in the network | Correct | 2.00 | you should be able to see the infection persisting pretty much indefinitely in the network |
| Total | | 2.00 / 2.00 | |
Question Explanation
Look for the number of infected individuals in the long run. You may need to reinfect the network a few times to start to see the difference in behavior.
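For readers without NetLogo at hand, the experiment can be approximated with a short simulation. The sketch below is not the course's NetLogo model: it assumes a simple SIS process (infected nodes recover with probability `recovery_rate` per step, and each infected neighbor independently infects a susceptible node with probability `infection_rate`), and it adds random shortcut edges to a ring lattice instead of rewiring existing ones. All names are illustrative.

```python
import random


def ring_lattice(n, k):
    """Link each node to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, count, rng):
    """Add `count` new random edges (a simplification of rewiring)."""
    nodes = list(adj)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1


def sis_step(adj, infected, infection_rate, recovery_rate, rng):
    """One synchronous update of the assumed SIS process."""
    nxt = set()
    for node in adj:
        if node in infected:
            if rng.random() >= recovery_rate:
                nxt.add(node)  # did not recover this step
        else:
            for nb in adj[node]:
                if nb in infected and rng.random() < infection_rate:
                    nxt.add(node)
                    break
    return nxt


def simulate(adj, steps, infection_rate=0.25, recovery_rate=0.30, seed=0):
    """Return the number of infected nodes at each step."""
    rng = random.Random(seed)
    infected = {rng.choice(list(adj))}
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, infection_rate, recovery_rate, rng)
        history.append(len(infected))
    return history


lattice = ring_lattice(200, 2)
shortcut_net = ring_lattice(200, 2)
add_shortcuts(shortcut_net, 20, random.Random(1))
print(simulate(lattice, 300)[-1], simulate(shortcut_net, 300)[-1])
```

A single run says little because the process is random; as the explanation above suggests, it is worth reinfecting and repeating (here, varying `seed`) before comparing how long the infection survives on each topology.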
Question 2
Using the NetLogo model of graph coloring on a network grown randomly or preferentially, answer the following (setting m = 1). What is true about a time it takes for the network to find a solution:
| Your Answer | | Score | Explanation |
|---|---|---|---|
| the average time to solution is unaffected by whether the growth is random or preferential | | | |
| preferential attachment generates a topology that is solved more quickly than one generated with random attachment | | | |
| preferential attachment generates a topology that is solved more slowly than one generated with random attachment. | Correct | 2.00 | |
| Total | | 2.00 / 2.00 | |
Question Explanation
If you're not sure about the answer here (though VARY-BA-TOPOLOGY should allow you to answer this), check Kearns et al. paper (listed in the syllabus) where they used human subjects to run this experiment and generally get similar results.
Question 3
Use the NetLogo cascade model with the 19_4 network (setup19_4 button, a = 3, b = 2, bilingual = off) and allocate opinion at random (or you can manually set the nodes' opinions using the select-blue and select-red buttons and clicking on individual nodes). Then allow the nodes to update their opinions until everyone is settled into their opinions. With these payoffs, how many distinct communities do you observe that can have a separate opinion from neighboring communities?
| Your Answer | | Score | Explanation |
|---|---|---|---|
| 3 | Correct | 2.00 | there are 3 distinct communities, as shown here |
| 1 | | | |
| 4 | | | |
| 2 | | | |
| Total | | 2.00 / 2.00 | |
Question Explanation
With these payoffs you should be able to observe different opinions persisting in different parts of the network. Try setting individual nodes or re-randomizing to see a range of behaviors depending on the initial allocation of choices.
Question 4
Using the same NetLogo cascade model with the 19_4 network (setup19_4 button, b = 2, bilingual = off), how high does a, the payoff for choosing blue at the same time as a friend, need to be such that the whole network will adopt blue every time if at least one node adopts blue?
| Your Answer | | Score | Explanation |
|---|---|---|---|
| 7 | Correct | 2.00 | |
| 5 | | | |
| 1 | | | |
| 3 | | | |
| Total | | 2.00 / 2.00 | |
Question Explanation
To figure this one out, you may want to alloc-opinion with init-prob-blue 0. That way the whole network will have initially chosen red. Then use select-blue to set just a single node's opinion to blue.
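The answer can also be reasoned about without running the model. The sketch below assumes the standard networked coordination game rule: a node earns a for each blue neighbor if it plays blue and b for each red neighbor if it plays red, so it switches to blue once the fraction of blue neighbors reaches b / (a + b), with ties going to blue. Whether the NetLogo model breaks ties this way, and the degree of 4 used in the check, are assumptions made for illustration.

```python
from fractions import Fraction


def blue_threshold(a, b):
    """Smallest fraction of blue neighbors at which blue pays at
    least as much as red."""
    return Fraction(b, a + b)


def one_blue_neighbor_suffices(a, b, degree):
    """True if a node with `degree` neighbors flips to blue when
    exactly one of them is blue."""
    return Fraction(1, degree) >= blue_threshold(a, b)


# Hypothetical worst case: the best-connected node has 4 neighbors.
for a in (1, 3, 5, 7):
    print(a, blue_threshold(a, 2), one_blue_neighbor_suffices(a, 2, 4))
```

Under these assumptions, a = 5 gives a threshold of 2/7, which a single blue neighbor out of four (1/4) does not reach, while a = 7 gives 2/9, which it does.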
Question 5
Use the NetLogo model of innovation on a network to answer the following. Relative to the average maximum solution achieved on a randomly grown topology, a network grown with preferential attachment will
| Your Answer | | Score | Explanation |
|---|---|---|---|
| at least 10% lower max final fitness and 50% faster convergence to solution | | | |
| at least 10% higher max final fitness and 50% faster convergence to solution | | | |
| have roughly the same time to solution and max fitness | Correct | 2.00 | if you run the model several times you should see similar convergence times (maybe the preferential attachment model is ever so slightly faster, but not by 50%) and similar max-fitness achieved. |
| Total | | 2.00 / 2.00 | |
Question Explanation
Run the model repeatedly at the two extremes (prob-pref = 0 and prob-pref = 1) and note how quickly the model converges and what is shown under agent-max.