CommunityGAN Community Detection with Generative Adversarial Nets

The main goal of this paper is to model a representation of graph vertices that expresses each vertex's community membership.

This way, the detected communities are allowed to overlap.

Introduction

The community detection problem: identifying and analyzing groups of vertices that share highly similar properties or functions.

Traditional formulations assume each vertex belongs to exactly one community. Relaxing this restriction, the overlapping community detection problem is of great research value.

Many current graph problems (e.g., link prediction, recommendation, node classification) are tackled via vertex representations: a first model embeds the whole graph, and a second model then makes the downstream predictions.

However, because communities overlap densely, applying such embeddings to the overlapping community detection problem still has many limitations.

Generally, the useful information of vertex embedding vectors is the relative distance between these vectors, while the specific values in the vectors have no meaning (e.g., node2vec, LINE).

If we need vertex vectors whose specific values are meaningful, we may have to consider other methods (e.g., logistic regression). For detecting communities from embeddings, one typically resorts to neighborhood-based algorithms such as NN and k-NN. This paper attempts to perform community detection and network embedding jointly in a unified framework, and to solve the dense community overlapping problem.

The paper focuses on motifs, i.e., small recurring graph structures (e.g., cycles, cliques).

Thus in this paper, unlike most previous works considering the relationship between only two vertices (the relationship between a center vertex and one of the other vertices in a window), we try to generate and discriminate motifs.

RELATED WORK

Community Detection

Community detection algorithms from several different angles:

  • One direction is to design some measure of community quality, such as modularity; community structure can then be uncovered by optimizing such measures.

  • Another direction is to adopt generative models to describe the generation of graphs; the communities can then be inferred by fitting graphs to such models.

  • Moreover, some models focus on the graph adjacency matrix and output the relationship between vertices and communities by applying matrix factorization algorithms to it.

  • These models often consider the dense community overlapping problem and can detect overlapping communities.

    However, the performance of these methods is restricted by performing pair reconstruction with bi-linear models.

Graph Representation Learning

Representation learning for general graphs:

  • DeepWalk: shows that random walks in a graph are similar to text sequences in natural language.
  • node2vec: extends the idea of DeepWalk with a biased random walk algorithm, which provides more flexibility when generating the sampled vertex sequences.
  • LINE: learns vertex representations preserving both first- and second-order proximities.
  • GraRep: applies different loss functions defined on graphs to capture different k-order proximities and the global structural properties of the graph.
  • GraphGAN: proposes a unified adversarial learning framework, which naturally captures structural information from graphs to learn the graph representation.
  • ANE: utilizes GAN as a regularizer for learning a stable and robust feature extractor.

To use such embeddings for community detection, one has to run an additional clustering algorithm on the vertex embeddings, which cannot handle the dense community overlapping problem.

CommunityGAN, in contrast, directly outputs the membership strength of each vertex to each community.

Unified framework for graph representation learning and community detection

The earliest ideas here started from matrix and tensor factorization. However, such approaches cannot handle large-scale data.

EMPIRICAL OBSERVATION

Some real-world networks are analyzed to extract guiding observations.

Data: online social networks, collaboration networks, and product networks.

Empirical Observations

Two key questions:

  • How do the communities contribute to the generation of motifs?
  • What is the change in motif generation when communities overlap?

The first question

How do the communities contribute to the generation of motifs?

Method:

  • Randomly select one community.

  • Sample 2/3/4 vertices from this community and judge whether they compose a motif or not.

    • Why sample only 2/3/4 vertices?

      Because the paper mostly focuses on a particular kind of motif (cliques), it only demonstrates the occurrence probability of 2/3/4-vertex cliques.

      The data show that the average occurrence probabilities of cliques for vertices in one community are much higher than those for vertices randomly selected from the whole network.

      (Table 2: the probability of clique occurrence for vertices sampled from all vertices or from one community. R: sampled from all vertices. C: sampled from one community.)

  • Repeating this many times yields the occurrence probability of a given motif within one community (a minimal sampling sketch follows below).
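
To make the sampling procedure concrete, here is a minimal Python sketch (all names hypothetical, not from the paper) that estimates the k-clique occurrence probability for vertices drawn from one community versus from the whole network:

```python
import itertools
import random

def clique_occurrence_prob(adj, vertices, k, trials=100_000):
    """Estimate the probability that k vertices sampled from `vertices`
    form a k-clique. `adj` maps each vertex to its set of neighbors."""
    hits = 0
    for _ in range(trials):
        sample = random.sample(vertices, k)
        # A k-clique requires every pair of sampled vertices to be connected.
        if all(v in adj[u] for u, v in itertools.combinations(sample, 2)):
            hits += 1
    return hits / trials

# Toy graph: triangle {0, 1, 2} plus a pendant vertex 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clique_occurrence_prob(adj, [0, 1, 2], 3))     # community sample: ~1.0
print(clique_occurrence_prob(adj, [0, 1, 2, 3], 3))  # whole graph: noticeably lower
```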

The second question

What is the change in motif generation when communities overlap?

Prior work studied the relationship between 2-clique (edge) probability and community overlap, showing that the more communities two vertices share, the higher their probability of forming a 2-clique.

The paper further examines 3-cliques and 4-cliques: the probability curve increases in the overall trend as the number of shared communities increases (the figure below gives an intuitive view).

(Figure: clique occurrence probability versus the number of shared communities.)

Such observation accords with the basic assumption of the AGM framework: vertices residing in communities' overlaps are more densely connected to each other than vertices in a single community.

METHODOLOGY

(Math warning ahead!!!)

CommunityGAN Framework

This paper focuses only on a particular kind of motif: cliques.

By the definitions above, $M(v_c)$ can be viewed as a set of observed motifs drawn from $p_{true}(m|v_c)$.

The generator G and discriminator D play a minimax game with the objective:

$$\min_{\theta_G}\max_{\theta_D}V(G,D)=\sum_{c=1}^{V}\Big(\mathbb{E}_{m\sim p_{true}(\cdot|v_c)}[\log D(m;\theta_D)]+\mathbb{E}_{s\sim G(s|v_c;\theta_G)}[\log(1-D(s;\theta_D))]\Big)$$
The CommunityGAN framework:

Discriminator D: draws positive samples from $p_{true}(\cdot|v_c)$ and negative samples from $G(\cdot|v_c;\theta_G)$.

CommunityGAN Optimization

Gradient for the discriminator:

$$\nabla_{\theta_D}V(G,D)=\sum_{c=1}^{V}\Big(\mathbb{E}_{m\sim p_{true}(\cdot|v_c)}[\nabla_{\theta_D}\log D(m;\theta_D)]+\mathbb{E}_{s\sim G(s|v_c;\theta_G)}[\nabla_{\theta_D}\log(1-D(s;\theta_D))]\Big)$$

For the generator, $s$ is discrete, so the gradient is computed via policy gradient:

$$\nabla_{\theta_G}V(G,D)=\nabla_{\theta_G}\sum_{c=1}^{V}\mathbb{E}_{s\sim G(\cdot|v_c)}[\log(1-D(s))]=\sum^{V}_{c=1}\mathbb{E}_{s\sim G(\cdot|v_c)}\big[\nabla_{\theta_G}\log G(s|v_c)\log(1-D(s))\big]$$
(How does this transformation work? It is the log-derivative trick from policy gradients: $\nabla_\theta\,\mathbb{E}_{s\sim G_\theta}[f(s)]=\mathbb{E}_{s\sim G_\theta}[f(s)\,\nabla_\theta\log G_\theta(s)]$, which lets the gradient be estimated from discrete samples of $s$ without backpropagating through the sampling step.)
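
As a sanity check, here is a toy numeric verification of that log-derivative trick for a small categorical generator with logits `theta` (everything here is illustrative; `f` stands in for $\log(1-D(s))$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Log-derivative trick: grad_theta E_{s~G}[f(s)] = E_{s~G}[f(s) * grad_theta log G(s)].
theta = np.array([0.5, -0.2, 0.1])   # logits of a categorical G over 3 "subsets"
f = np.array([1.0, 2.0, 3.0])        # stands in for log(1 - D(s))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p = softmax(theta)

# Exact gradient of E[f] under softmax: d/d theta_j = p_j * (f_j - p.f).
exact = p * (f - p @ f)

# Monte-Carlo estimate via the trick: sample s, weight grad log G(s) by f(s).
n = 200_000
samples = rng.choice(3, size=n, p=p)
onehot = np.eye(3)[samples]          # grad_theta log softmax(theta)[s] = e_s - p
est = (f[samples][:, None] * (onehot - p)).mean(axis=0)

print(exact, est)   # the two should agree closely
```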

A Naive Implementation of D and G

The simplest idea is to plug in standard functions directly (sigmoid, softmax).

For the discriminator D, intuitively we can define it as the product of the sigmoid of the inner product of every two vertices in the input vertex subset $s$:
$$D(s)=\prod_{(u,v)\in s,\,u\neq v}\sigma(d_u^{\top} d_v)$$
where $d_u, d_v \in \mathbb{R}^k$ are the $k$-dimensional representation vectors of vertices $u$ and $v$ in the discriminator, and $\theta_D$ denotes the set of all vertex representation vectors of the discriminator.

For the generator G:

To generate a vertex subset $s$ covering vertex $v_c$, we can regard the subset as a sequence of vertices $(v_{s_1},v_{s_2},\dots,v_{s_m})$ with $v_{s_1}=v_c$. G is defined as:
$$G(s|v_c)=G_v(v_{s_2}|v_{s_1})\,G_v(v_{s_3}|v_{s_1},v_{s_2})\cdots G_v(v_{s_m}|v_{s_1},\dots,v_{s_{m-1}})$$
**We generate $v_{s_m}$ conditioned on all of $v_{s_1}$ through $v_{s_{m-1}}$, not only on $v_{s_{m-1}}$. If we conditioned only on $v_{s_{m-1}}$, then $v_{s_m}$ would very likely not belong to the same community as the other vertices.** For example, if $v_{s_1}$ through $v_{s_{m-1}}$ are all students at the same university, a $v_{s_m}$ generated only from $v_{s_{m-1}}$ might well be $v_{s_{m-1}}$'s parent.

In that case, the probability of the vertex subset $s$ being a clique would be very low.

For the implementation of the vertex generator $G_v$, straightforwardly, we can define it as a softmax function over all other vertices:
$$G_v(v_{s_m}|v_{s_1},v_{s_2},\dots,v_{s_{m-1}})=\frac{\exp\big(\sum^{m-1}_{i=1}g_{v_{s_m}}^{\top}g_{v_{s_i}}\big)}{\sum_{v\notin(v_{s_1},v_{s_2},\dots,v_{s_{m-1}})}\exp\big(\sum_{i=1}^{m-1}g_v^{\top}g_{v_{s_i}}\big)}$$
where $g_v\in\mathbb{R}^k$ is the representation vector of vertex $v$ in the generator, and $\theta_G$ denotes the set of all vertex representation vectors of the generator.
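
A minimal sketch of this naive $D$ and $G_v$ on toy embeddings (all names hypothetical; the product in `naive_D` runs over unordered pairs):

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_D(s, d):
    """Naive discriminator: product of sigmoid(d_u . d_v) over vertex pairs in s."""
    return float(np.prod([sigmoid(d[u] @ d[v])
                          for u, v in itertools.combinations(s, 2)]))

def naive_Gv(prefix, g):
    """Naive generator: softmax over all vertices outside the current prefix,
    scored by the summed inner products with every prefix vertex."""
    candidates = [v for v in range(len(g)) if v not in prefix]
    scores = np.array([sum(g[v] @ g[u] for u in prefix) for v in candidates])
    probs = np.exp(scores - scores.max())
    return dict(zip(candidates, probs / probs.sum()))

rng = np.random.default_rng(1)
d = rng.normal(size=(5, 4))   # toy discriminator embeddings, k = 4
g = rng.normal(size=(5, 4))   # toy generator embeddings
print(naive_D([0, 1, 2], d))
print(naive_Gv([0, 1], g))    # distribution over the next vertex to append
```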

Graph AGM

Limitations of the naive approach

(What problems does the naive approach above have?)

Sigmoid and softmax provide concise and intuitive definitions for motif discrimination in $D$ and vertex generation in $G_v$, but they have three limitations in the community detection task:

  • To detect communities, after learning the vertex representation vectors based on Eq. (4) and (6), we still need to run some clustering algorithm on them.
    • Overlap is a significant feature of many real-world social networks.
      • In some real-world datasets, one vertex may belong to tens of communities simultaneously.
      • General clustering algorithms cannot handle such dense overlapping.
  • The sigmoid and softmax functions are computationally inefficient.
  • The graph structure encodes rich proximity information among vertices, but the softmax in Eq. (6) (the generator's definition) completely ignores it.

AGM

Basic framework


AGM (Affiliation Graph Model) is based on the idea that communities arise from shared group affiliations, and views the whole network as the result generated by a community-affiliation graph model.

The framework of AGM is illustrated in Figure 4 of the paper: it can be seen either as a bipartite network between vertices and communities, or as a nonnegative affiliation weight matrix (each vertex $v$ has an affiliation strength $F_{vc}$ to each community $c$). In AGM, every vertex may be affiliated with zero, one, or multiple communities.

For any community $c\in C$, it connects its member vertices $u,v$ with probability $1-\exp(-F_{uc}\cdot F_{vc})$.

Moreover, each community $c$ creates edges independently. (Hence, when two vertices are connected through multiple communities, the contributions combine additively in the exponent.)

If the pair of vertices $u,v$ is connected multiple times through different communities, the overall probability is $1-\exp(-\sum_c F_{uc}\cdot F_{vc})$.

So the probability that vertices $u,v$ are connected (through any possible community) is $p(u,v)=1-\exp(-F_u\cdot F_v)$, where $F_u$ and $F_v$ are the nonnegative $C$-dimensional affiliation vectors of vertices $u$ and $v$ respectively.
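
A one-function sketch of this edge probability, assuming nonnegative affiliation vectors (names are illustrative):

```python
import numpy as np

def agm_edge_prob(F_u, F_v):
    """AGM: p(u, v) = 1 - exp(-F_u . F_v) for nonnegative affiliation vectors."""
    return 1.0 - np.exp(-(F_u @ F_v))

# C = 3 communities; u and v share community 0 strongly and community 2 weakly.
F_u = np.array([1.5, 0.0, 0.2])
F_v = np.array([2.0, 0.0, 0.1])
print(agm_edge_prob(F_u, F_v))                        # ~0.95: strong shared affiliation
print(agm_edge_prob(F_u, np.array([0.0, 1.0, 0.0])))  # 0.0: no shared community
```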

Extending from edge generation to motif generation

For any $m$ vertices $v_1$ to $v_m$, the probability of them forming a clique through community $c$ is assumed to be $p_c(v_1,v_2,\dots,v_m)=1-\exp\big(-\prod_{i=1}^m F_{v_i c}\big)$.

Then the probability that these $m$ vertices compose a clique through any possible community can be calculated via:
$$p(v_1,v_2,\dots,v_m)=1-\prod_c\big(1-p_c(v_1,v_2,\dots,v_m)\big)=1-\exp\big(-\odot(F_{v_1},F_{v_2},\dots,F_{v_m})\big)$$
Here $\odot$ denotes the sum over communities of the entrywise products: $\odot(F_{v_1},F_{v_2},\dots,F_{v_m})=\sum_{c=1}^C\prod_{i=1}^m F_{v_i c}$

Then the discriminator, which was previously defined in a straightforward way as a product of sigmoids, can be redefined as: $D(s)=1-\exp\big(-\odot(d_{v_1},d_{v_2},\dots,d_{v_m})\big)$

Moreover, the generator $G_v$ can be redefined as a softmax-like function over all other possible vertices for completing a clique with the $m-1$ already chosen vertices:

$$G_v(v_{s_m}|v_{s_1},v_{s_2},\dots,v_{s_{m-1}})=\frac{1-\exp\big(-\odot(g_{v_{s_1}},g_{v_{s_2}},\dots,g_{v_{s_{m-1}}},g_{v_{s_m}})\big)}{\sum_{v\notin(v_{s_1},v_{s_2},\dots,v_{s_{m-1}})}\Big(1-\exp\big(-\odot(g_{v_{s_1}},g_{v_{s_2}},\dots,g_{v_{s_{m-1}}},g_v)\big)\Big)}$$

With this setting, the learned vertex representation vectors $g_v$ directly represent the affiliation weights between vertex $v$ and the communities, so no additional clustering algorithm is needed to find communities, and the first limitation above disappears.
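
A minimal sketch of the $\odot$ operator and the redefined $D$ and $G_v$, on small nonnegative toy affiliations (all names hypothetical):

```python
import numpy as np

def odot(vectors):
    """The ⊙ operator: sum over communities of the entrywise product."""
    return np.prod(np.stack(vectors), axis=0).sum()

def agm_D(s, d):
    """Redefined discriminator: probability that the vertices in s form a clique."""
    return 1.0 - np.exp(-odot([d[v] for v in s]))

def agm_Gv(prefix, g):
    """Redefined generator: clique probability with the chosen prefix,
    normalized over all remaining vertices."""
    candidates = [v for v in range(len(g)) if v not in prefix]
    scores = np.array([1.0 - np.exp(-odot([g[u] for u in prefix] + [g[v]]))
                       for v in candidates])
    return dict(zip(candidates, scores / scores.sum()))

# 4 vertices, C = 2 communities (nonnegative affiliations).
g = np.array([[1.0, 0.0], [0.8, 0.1], [0.9, 0.0], [0.0, 1.2]])
print(agm_D([0, 1, 2], g))  # high: all three share community 0
print(agm_Gv([0, 1], g))    # vertex 2 is by far the most likely completion
```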

Next: overcoming the other two limitations

The computational inefficiency of sigmoid and softmax

We first assume a virtual vertex $v_v$ which is connected to all vertices in the union of the neighborhoods of $v_{s_1}$ through $v_{s_{m-1}}$: $N(v_v)=N(v_{s_1})\cup\cdots\cup N(v_{s_{m-1}})$.

What is the representation vector of this virtual vertex? $g_{v_v}=g_{v_{s_1}}\circ g_{v_{s_2}}\circ\cdots\circ g_{v_{s_{m-1}}}$, i.e., the entrywise product of the representation vectors of vertices $v_{s_1}$ through $v_{s_{m-1}}$.

This simplifies $G_v(v_{s_m}|v_{s_1},v_{s_2},\dots,v_{s_{m-1}})$ to $G_v(v_{s_m}|v_v)$.

Then a random walk is performed from $v_v$ according to a transition probability: during the walk, if the currently visited vertex is $v$ and the generator $G$ decides to visit $v$'s previous vertex, then $v$ is chosen as the generated vertex and the random walk stops. (The paper illustrates this with a figure of the generation process of a vertex subset $s$: blue arrows mark the random-walk path; at the blue vertices the generator decides to revisit the previous vertex, so the walk ends and the blue vertex is selected.)


(So how is the random-walk transition probability computed?)

Moreover, during the random walk we want the walk path to stay relevant to the root vertex $v_v$, to maximize the probability that the generated vertex subset forms a motif.

So for a given vertex $v_c$ and one of its neighbors $v_i\in N(v_c)$, the relevance probability of $v_i$ given $v_c$ is defined as:
$$p_{v_v}(v_i|v_c)=\frac{1-\exp\big(-\odot(g_{v_i},g_{v_c},g_{v_v})\big)}{\sum_{v_j\in N(v_c)}\Big(1-\exp\big(-\odot(g_{v_j},g_{v_c},g_{v_v})\big)\Big)}$$
In essence, this is a softmax-like normalization over the neighbors $N(v_c)$: each candidate $v_i$ is scored by the probability that it forms a clique with $v_c$ and the virtual vertex $v_v$, and the scores are normalized over $N(v_c)$. Restricting the normalization to neighbors rather than all vertices removes the efficiency bottleneck of the full softmax.
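
A sketch of this relevance probability in the same toy style (hypothetical names; `vv_vec` is the precomputed entrywise product representing the virtual vertex):

```python
import numpy as np

def odot(vectors):
    return np.prod(np.stack(vectors), axis=0).sum()

def relevance_prob(vc, vv_vec, g, adj):
    """p_{v_v}(v_i | v_c): each neighbor v_i of v_c is scored by the probability
    of forming a clique with v_c and the virtual vertex, normalized over N(v_c)."""
    neighbors = sorted(adj[vc])
    scores = np.array([1.0 - np.exp(-odot([g[vi], g[vc], vv_vec]))
                       for vi in neighbors])
    return dict(zip(neighbors, scores / scores.sum()))

# Chain 0-1-2 plus edge 1-3; the walk is rooted at the virtual vertex of {0}.
g = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.0, 1.0]])
adj = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
vv_vec = g[0]   # entrywise product over a single chosen vertex is the vertex itself
print(relevance_prob(1, vv_vec, g, adj))  # vertex 2 beats vertex 3 (shared community 0)
```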

Addressing the failure to reflect graph structure

(Again, random-walk sampling is what ties the model to the graph structure.)

If we denote the random-walk path as $P_r=(v_{r_1},v_{r_2},\dots,v_{r_n})$ with $v_{r_1}=v_v$, the probability of selecting this path is $p_{v_v}(v_{r_{n-1}}|v_{r_n})\cdot\prod_{i=1}^{n-1}p_{v_v}(v_{r_{i+1}}|v_{r_i})$.

In the policy gradient, the selection of this path is regarded as an action whose target is to maximize its reward from D. Thus, although there may be multiple paths between $v_{r_1}$ and $v_{r_n}$, once path $P_r$ has been selected, we optimize the policy gradient on it and neglect the other paths.

In other words, if we select the path $P_r$, we assign $G_v(v_{s_m}|v_v)=p_{v_v}(v_{r_{n-1}}|v_{r_n})\cdot\prod_{i=1}^{n-1}p_{v_v}(v_{r_{i+1}}|v_{r_i})$.

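
A runnable sketch of this walk-based generation, with one simplification: the walk starts from an ordinary chosen vertex rather than from the virtual vertex itself (all names hypothetical):

```python
import numpy as np

def odot(vectors):
    return np.prod(np.stack(vectors), axis=0).sum()

def generate_vertex(vv_vec, start, g, adj, rng):
    """Random-walk generator: at each step, sample the next vertex among the
    current vertex's neighbors with the relevance probability p_{v_v}(. | current).
    When the walk decides to revisit the vertex it just came from, it stops and
    the current vertex is the generated one."""
    prev, cur = None, start
    while True:
        neighbors = sorted(adj[cur])
        scores = np.array([1.0 - np.exp(-odot([g[v], g[cur], vv_vec]))
                           for v in neighbors])
        nxt = rng.choice(neighbors, p=scores / scores.sum())
        if nxt == prev:        # generator chose to step back: emit current vertex
            return cur
        prev, cur = cur, nxt

rng = np.random.default_rng(2)
g = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.0, 1.0]])
adj = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
print(generate_vertex(g[0], 0, g, adj, rng))  # tends to land inside community 0
```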

Other Issues

Model Initialization

Two approaches:

  • Deploy the AGM model on the graph to learn a community affiliation vector $F_i$ for each vertex $v_i$, then set $g_{v_i}=d_{v_i}=F_i$ directly.

  • Use locally minimal neighborhoods to initialize $\theta_G$ and $\theta_D$.

    • What does "using locally minimal neighborhoods" mean?
      • We can regard each vertex $v_i$ along with its neighbors $N(v_i)$, denoted $C(v_i)$, as a candidate community.
      • Community $C(v_i)$ is called locally minimal if $C(v_i)$ has lower conductance than all the $C(v_j)$ for vertices $v_j$ connected to $v_i$.
      • (How should conductance be understood here? It is the standard graph notion: the fraction of edges leaving the set $C(v_i)$ relative to the set's total degree; lower conductance means a more community-like set.)
    • For a vertex $v_i$ that belongs to a locally minimal neighborhood $c$, we initialize $F_{v_i c}=1$; otherwise $F_{v_i c}=0$.
    • This approach is faster, but its performance is lower than the first (a minimal sketch follows below).
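
A sketch of the locally-minimal-neighborhood initialization, using the simplified conductance described above (cut edges divided by the volume of the set); conductance ties are broken by vertex id so the toy example still yields seeds (a sketch choice, not from the paper):

```python
import numpy as np

def conductance(S, adj):
    """Simplified conductance of vertex set S: edges leaving S divided by the
    total degree (volume) of S. Lower means more community-like."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    return cut / vol if vol else 1.0

def locally_minimal_init(adj, n_vertices):
    """One candidate community C(v) = {v} U N(v) per vertex; keep it as a seed
    if its conductance is lower than every neighbor's candidate (ties -> lower id)."""
    cond = {v: conductance({v} | adj[v], adj) for v in adj}
    seeds = [v for v in adj
             if all(cond[v] < cond[u] or (cond[v] == cond[u] and v < u)
                    for u in adj[v])]
    F = np.zeros((n_vertices, len(seeds)))
    for c, v in enumerate(seeds):
        for u in {v} | adj[v]:
            F[u, c] = 1.0   # member of locally minimal neighborhood c
    return F

# A triangle {0, 1, 2} bridged to a 4-clique {3, 4, 5, 6}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5}}
print(locally_minimal_init(adj, 7))  # two seed communities, one per dense group
```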

Determining community membership

For each vertex, we want to determine the communities it is most strongly affiliated with (its "hard" community memberships).

This is achieved by thresholding the affiliation values.

The basic intuition is that if two vertices belong to the same community $c$, then the probability of a link between them through community $c$ should be larger than the background edge probability $\epsilon=\frac{2E}{V(V-1)}$. (How is this constant set? It is simply the empirical edge density of the graph: the probability that a uniformly random pair of vertices is connected.)

Accordingly, the threshold is set to $\delta=\sqrt{-\log(1-\epsilon)}$. (This comes from solving $1-\exp(-\delta\cdot\delta)=\epsilon$: if both endpoints have affiliation exactly $\delta$ to community $c$, their connection probability through $c$ equals the background probability $\epsilon$.)

When $g_{v_i c}\geq\delta$ or $d_{v_i c}\geq\delta$, vertex $v_i$ is considered to belong to community $c$ (see the sketch below).
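
A short sketch of this thresholding step, assuming learned affiliation matrices `G_emb` and `D_emb` (hypothetical names):

```python
import numpy as np

def hard_memberships(G_emb, D_emb, n_edges, n_vertices):
    """Assign vertex i to community c when either affiliation value reaches the
    threshold delta derived from the background edge probability epsilon."""
    eps = 2.0 * n_edges / (n_vertices * (n_vertices - 1))  # empirical edge density
    delta = np.sqrt(-np.log(1.0 - eps))
    member = (G_emb >= delta) | (D_emb >= delta)
    return [set(np.flatnonzero(member[i]).tolist()) for i in range(n_vertices)]

# Toy learned affiliations: 4 vertices, 2 communities, 3 edges in the graph.
G_emb = np.array([[0.9, 0.0], [0.7, 0.1], [0.2, 0.8], [0.0, 0.0]])
D_emb = np.array([[0.8, 0.0], [0.6, 0.0], [0.1, 0.9], [0.3, 0.0]])
print(hard_memberships(G_emb, D_emb, n_edges=3, n_vertices=4))
# e.g. [{0}, set(), {1}, set()] with eps = 0.5, delta ~ 0.83
```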

Choosing the number of communities

We follow the method proposed in [1] to choose the number of communities C. Specifically, we reserve 20% of the links for validation and learn the model parameters on the remaining 80% for different values of C. We then use the learned parameters to predict the links in the validation set and select the C with the maximum prediction score as the number of communities.
