let G ( V , E , ω ) G(V,E,\omega) G(V,E,ω) be a weighted graph, V V V is the set of nodes, E E E is the edge between nodes in V V V, ω \omega ω is the function ω : E → R n \omega:E \to \mathbb{R}^n ω:E→Rn, that measures the simularity between pairs of items(a higher value means more similar).
p i j = ω ( i , j ) d i p_{ij} = \frac{\omega(i,j)}{d_i} pij=diω(i,j)
d i = ∑ k = 1 n ω ( i , k ) d_i = \sum_{k=1}^n\omega(i,k) di=k=1∑nω(i,k)
M G ∈ R n × n M^G \in \mathbb{R}^{n \times n} MG∈Rn×n is the associated transition matrix,
M i j G = { p i j ⟨ i , j ⟩ ∈ E 0 otherwise M^G_{ij} = \begin{cases} p_{ij} & \langle i,j \rangle \in E \\ 0 & \textrm{otherwise} \end{cases} MijG={pij0⟨i,j⟩∈Eotherwise
Question
:
- ω \omega ω表示节点之间的相似性,实际上我们只有无向图,表示节点之间是否有连接,怎么通过已有的信息构建 ω \omega ω
answer: 这里的相似度可以认为是节点之间边的权值,所以 M i j G M^G_{ij} MijG可以认为是认为是以邻接矩阵操作后的数据。
这里的内容比较坑,我在论文中一直找不到关于 P visit k ( i ) P^{k}_{\textrm{visit}}(i) Pvisitk(i)是怎么计算的,在这里卡了好久好久。
在原文中的描述是这样的:
Now, denote by P v i s i t k ( i ) ∈ R n P^k_{visit}(i) \in \mathbb{R}^n Pvisitk(i)∈Rn the vector whose j-th component is the probability that a random walk originating at i will visit node j in its k-th step. Thus, P v i s i t k ( i ) P^k_{visit}(i) Pvisitk(i) is the i-th row in the matrix ( M G ) k (M^G)^k (MG)k, the k’th power of M G M^G MG.
现在我们知道 M G M^G MG是怎样计算的,但是 ( M G ) k (M^G)^k (MG)k呢,在原文中的描述是’'the k’th power of M G M^G MG", 我理解的应该是原有矩阵 M G M^G MG的k次方(矩阵的乘法)。
P v i s i t k ( i ) P^k_{visit}(i) Pvisitk(i) is the i-th row in the matrix ( M G ) k (M^G)^k (MG)k,
P v i s i t k ( i ) = ( M G ) i k P^k_{visit}(i) = (M^G)^k_i Pvisitk(i)=(MG)ik
( M G ) k = { P v i s i t k ( 1 ) T , P v i s i t k ( 2 ) T , … , P v i s i t k ( n ) T } (M^G)^k=\{P^k_{visit}(1)^{\mathbf{T}}, P^k_{visit}(2)^{\mathbf{T}}, \dots, P^k_{visit}(n)^{\mathbf{T}}\} (MG)k={Pvisitk(1)T,Pvisitk(2)T,…,Pvisitk(n)T}
Notice: 其实到这里,和马尔可夫聚类算法(MCL)是一样的。MCL是不断迭代,知道矩阵不再改变,这里作者考虑到计算复杂,采用前k次计算结果的和来作为替代。
We now offer two methods for performing the edge separation, both based on deterministic analysis of random walks.
边缘分离,锐化
NS: Separation by neighborhood similarity.
CE: Separation by circular escape.
the weighted neighborhood : 加权领域
bipartite subgraph
P visit ≤ k ( v ) = ∑ i = 1 k P visit i ( v ) P^{\leq k}_{\textrm{visit}}(v) = \sum_{i=1}^kP^{i}_{\textrm{visit}}(v) Pvisit≤k(v)=i=1∑kPvisiti(v)
Now, in order to estimate the closeness of the two node v v v and u u u , we fix some small k(eg. k = 3) and compare P visit ≤ k ( v ) P^{\leq k}_{\textrm{visit}}(v) Pvisit≤k(v) and P visit ≤ k ( u ) P^{\leq k}_{\textrm{visit}}(u) Pvisit≤k(u). The smaller the difference, the greater the intimacy between u u u and v v v.
N S ( G ) = d f n G s ( V , E , ω s ) NS(G) \xlongequal{dfn} G_s(V, E, \omega_s) NS(G)dfnGs(V,E,ωs),
where ∀ ⟨ v , u ⟩ ∈ E , ω s ( u , v ) = s i m k ( P v i s i t ≤ k ( v ) , P v i s i t ≤ k ( u ) ) \forall \langle v, u \rangle \in E, \omega_s(u, v) = sim^k(P^{\leq k}_{visit}(v),P^{\leq k}_{visit}(u)) ∀⟨v,u⟩∈E,ωs(u,v)=simk(Pvisit≤k(v),Pvisit≤k(u))
s i m k ( x , y ) sim^k(x,y) simk(x,y) is some similarity measure of the vectors x \mathrm{x} x and y \mathrm{y} y, whose value increases as x \mathrm{x} x and y \mathrm{y} y are more similar.
s i m k ( x , y ) sim^k(x,y) simk(x,y) the suitable choose:
f k ( x , y ) = d f n exp ( 2 k − ∥ x − y ∥ L 1 ) − 1 (1) f^k(x,y) \xlongequal{dfn} \exp(2k − \|x − y\|_{L_1}) − 1 \tag{1} fk(x,y)dfnexp(2k−∥x−y∥L1)−1(1)
∥ x − y ∥ L 1 = ∑ i = 1 n ∣ x i − y i ∣ \|x − y\|_{L_1} = \sum_{i=1}^n|x_i-y_i| ∥x−y∥L1=i=1∑n∣xi−yi∣
another choose is:
cos ( x , y ) = ( x , y ) ( x , x ) . ( y , y ) (2) \cos(x,y)= \frac{(x,y)}{\sqrt{(x,x)}.\sqrt{(y,y)}} \tag{2} cos(x,y)=(x,x).(y,y)(x,y)(2)
where (·,·) denotes inner-product.(内积)
import numpy as np
def markovCluster(adjacencyMat, dimension, numIter, power=2, inflation=2):
columnSum = np.sum(adjacencyMat, axis=0)
probabilityMat = adjacencyMat / columnSum
# Expand by taking the e^th power of the matrix.
def _expand(probabilityMat, power):
expandMat = probabilityMat
for i in range(power - 1):
expandMat = np.dot(expandMat, probabilityMat)
return expandMat
expandMat = _expand(probabilityMat, power)
# Inflate by taking inflation of the resulting
# matrix with parameter inflation.
def _inflate(expandMat, inflation):
powerMat = expandMat
for i in range(inflation - 1):
powerMat = powerMat * expandMat
inflateColumnSum = np.sum(powerMat, axis=0)
inflateMat = powerMat / inflateColumnSum
return inflateMat
inflateMat = _inflate(expandMat, inflation)
for i in range(numIter):
expand = _expand(inflateMat, power)
inflateMat = _inflate(expand, inflation)
print(inflateMat)
print(np.zeros((7, 7)) != inflateMat)
if __name__ == "__main__":
dimension = 4
numIter = 10
adjacencyMat = np.array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 1]])
# adjacencyMat = np.array([[1, 1, 1, 1, 0, 0, 0],
# [1, 1, 1, 1, 1, 0, 0],
# [1, 1, 1, 1, 0, 0, 0],
# [1, 1, 1, 1, 0, 0, 0],
# [0, 1, 0, 0, 1, 1, 1],
# [0, 0, 0, 0, 1, 1, 1],
# [0, 0, 0, 0, 1, 1, 1],
# ])
markovCluster(adjacencyMat, dimension, numIter)
[[1.00000000e+000 1.00000000e+000 1.00000000e+000 1.00000000e+000]
[5.23869755e-218 5.23869755e-218 5.23869755e-218 5.23869755e-218]
[0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]
[5.23869755e-218 5.23869755e-218 5.23869755e-218 5.23869755e-218]]
[[ True True True True]
[ True True True True]
[False False False False]
[ True True True True]]
可以从中得到聚类效果 { { 1 , 2 , 4 } , { 3 } } \{\{1,2,4\},\{3\}\} {{1,2,4},{3}}
谱聚类
MCL
MCL GitHub