python聚类系数_NetworkX 计算聚类系数的Python实现

关于聚类系数的原汁原味的介绍,可以参考小世界网络这篇论文 [1]:

The clustering coefficient C(p) is defined as follows. Suppose that a vertex v has kv neighbours; then at most kvðkv 21Þ=2 edges can exist between them (this occurs when every neighbourof v is connected to everyother neighbour of v). Let Cv denote the fraction of these allowable edges that actually exist.

Define C as the average of Cv over all v. For friendship networks, these statistics have intuitive meanings: L is the average number of friendships in the shortest chain connecting two people;

Cv reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical

friendship circle. The data shown in the figure are averages over 20 random realizations of the rewiring process described in Fig.1, and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n ¼ 1;000 vertices and an average degree of k ¼ 10 edges per vertex. We note that a logarithmic horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of the small-world phenomenon.

During this drop, C(p) remains almost constant at its value for the regular lattice, indicating that the transition to a small

world is almost undetectable at the local level .

用白话解释一下:

聚类系数的定义:网路中所有长度为2的路径中闭合路径所占的比例 [2]。

节点的聚类系数是度量某节点的两个邻居节点也互为邻居的平均概率 [2]。也就是以该节点为中心,其周围的邻居中相互之间存在边的个数 除以 该节点周围邻居节点对总数。分子要求必须有连边,分母不要求,也就是 该节点的 度*(度-1)度*(度-1)。

关于节点 聚类系数 [2, 3] 的定义:

\[C_i = \frac{节点 i 的邻居中直接相连的节点对的个数}{节点 i 的邻居节点对的总数}\]

其Python版计算方式如下:

方法1:

def cal_ci(G, u):

'''

@description: 计算节点的聚类系数

@param : G, u

@return:

'''

return sum([1 for x in G[u] for y in G[x] if y in G[u]]) / (G.degree[u] * (G.degree[u] - 1)) if G.degree[u] > 1 else 0

方法2:

def get_ci(G, z):

''' 获取节点聚集系数 Clustering Coefficient'''

return sum([1 for u in G[z] for v in G[z] if not u == v and (u, v) in G.edges()]) / (len(G[z]) * (len(G[z]) - 1)) if len(G[z]) != 0 and len(G[z]) != 1 else 0

注:这里可能会存在疑问,用 (u, v) in G.edges() 会不会出错?实际上 G.edges() 打印出来虽然是列表的形式,但其实是NetworkX的内置类型,可以通过以下代码验证:

G = nx.path_graph(5)

print(G.edges())

print(type(G.edges()))

a = (3, 2) in G.edges()

b = (3, 2) in [(0, 1), (1, 2), (2, 3), (3, 4)]

print(a)

print(b)

运行结果:

[(0, 1), (1, 2), (2, 3), (3, 4)]

True

False

方法3:NetworkX库中的方法:

nx.clustering(G,u)

对应源代码:

def clustering(G, nodes=None, weight=None):

r"""Compute the clustering coefficient for nodes.

For unweighted graphs, the clustering of a node :math:`u`

is the fraction of possible triangles through that node that exist,

.. math::

c_u = \frac{2 T(u)}{deg(u)(deg(u)-1)},

where :math:`T(u)` is the number of triangles through node :math:`u` and

:math:`deg(u)` is the degree of :math:`u`.

For weighted graphs, there are several ways to define clustering [1]_.

the one used here is defined

as the geometric average of the subgraph edge weights [2]_,

.. math::

c_u = \frac{1}{deg(u)(deg(u)-1))}

\sum_{vw} (\hat{w}_{uv} \hat{w}_{uw} \hat{w}_{vw})^{1/3}.

The edge weights :math:`\hat{w}_{uv}` are normalized by the maximum weight

in the network :math:`\hat{w}_{uv} = w_{uv}/\max(w)`.

The value of :math:`c_u` is assigned to 0 if :math:`deg(u) < 2`.

For directed graphs, the clustering is similarly defined as the fraction

of all possible directed triangles or geometric average of the subgraph

edge weights for unweighted and weighted directed graph respectively [3]_.

.. math::

c_u = \frac{1}{deg^{tot}(u)(deg^{tot}(u)-1) - 2deg^{\leftrightarrow}(u)}

T(u),

where :math:`T(u)` is the number of directed triangles through node

:math:`u`, :math:`deg^{tot}(u)` is the sum of in degree and out degree of

:math:`u` and :math:`deg^{\leftrightarrow}(u)` is the reciprocal degree of

:math:`u`.

Parameters

----------

G : graph

nodes : container of nodes, optional (default=all nodes in G)

Compute clustering for nodes in this container.

weight : string or None, optional (default=None)

The edge attribute that holds the numerical value used as a weight.

If None, then each edge has weight 1.

Returns

-------

out : float, or dictionary

Clustering coefficient at specified nodes

Examples

--------

>>> G=nx.complete_graph(5)

>>> print(nx.clustering(G,0))

1.0

>>> print(nx.clustering(G))

{0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

Notes

-----

Self loops are ignored.

References

----------

.. [1] Generalizations of the clustering coefficient to weighted

complex networks by J. Saramäki, M. Kivelä, J.-P. Onnela,

K. Kaski, and J. Kertész, Physical Review E, 75 027105 (2007).

http://jponnela.com/web_documents/a9.pdf

.. [2] Intensity and coherence of motifs in weighted complex

networks by J. P. Onnela, J. Saramäki, J. Kertész, and K. Kaski,

Physical Review E, 71(6), 065103 (2005).

.. [3] Clustering in complex directed networks by G. Fagiolo,

Physical Review E, 76(2), 026107 (2007).

"""

if G.is_directed():

if weight is not None:

td_iter = _directed_weighted_triangles_and_degree_iter(

G, nodes, weight)

clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)

for v, dt, db, t in td_iter}

else:

td_iter = _directed_triangles_and_degree_iter(G, nodes)

clusterc = {v: 0 if t == 0 else t / ((dt * (dt - 1) - 2 * db) * 2)

for v, dt, db, t in td_iter}

else:

if weight is not None:

td_iter = _weighted_triangles_and_degree_iter(G, nodes, weight)

clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for

v, d, t in td_iter}

else:

td_iter = _triangles_and_degree_iter(G, nodes)

clusterc = {v: 0 if t == 0 else t / (d * (d - 1)) for

v, d, t, _ in td_iter}

if nodes in G:

# Return the value of the sole entry in the dictionary.

return clusterc[nodes]

return clusterc

虽然,求网络的聚类系数定义为网络中存在的三角形个数 * 3 再除以 三元组个数,但是实际中大多采用所有节点聚类系数的平均值来计算。

参考文献:

D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, Jun. 1998.

M.E.J.Newman. 网络科学引论[M]. 2014.

汪小帆, 李翔, 陈关荣. 网络科学导论[J]. 高等教育出版社, 2012.

你可能感兴趣的:(python聚类系数)