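In social network analysis, the igraph package offers interfaces for three languages: C, Python, and R. Most people use the Python or R bindings. Recently, for business reasons, we wanted to speed up computation on relational data, so we explored the C library, using the random-walk community detection algorithm (walktrap) as the example. We record the experiment here in the hope that it is useful to colleagues working in related areas.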
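1. Building the network: call igraph_read_graph_ncol to read the edges directly from a file, instead of adding nodes one by one.
2. Network attributes: igraph_cattribute_list reports the names and types of the graph, vertex, and edge attributes in separate vectors. This part tends to confuse newcomers, because inside the program every node is represented by a number from 0 to N-1. To get back to the business-level node ID, read the "name" vertex attribute (igraph_cattribute_VAS in the code below); there is no need to build the mapping by hand.
3. The random-walk algorithm is implemented by igraph_community_walktrap; igraph also provides many other community detection algorithms.
4. Every igraph data structure must be initialized before use and destroyed at the end. Using a structure that was never initialized typically ends in a segmentation fault, and skipping the destroy call leaks memory.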
#include <stdio.h>
#include <stdlib.h>
#include <igraph.h> /* compile with -I/usr/local/igraph-0.7.1/include */
#include <string.h>
#define WALKTRAP_STEPS 0x4
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    FILE *edgeListFile;
    igraph_t wbNetwork;
    igraph_matrix_t merges;
    igraph_vector_t modularity;
    igraph_vector_t membership;
    long int i;
    long int no_of_nodes;
    long int no_of_edges;
    int rstCode;
    igraph_vector_t gtypes, vtypes, etypes;
    igraph_strvector_t gnames, vnames, enames;
    /* turn on attribute handling */
    igraph_i_set_attribute_table(&igraph_cattribute_table);
    /* initialize the result objects */
    igraph_vector_init(&modularity, 0);
    igraph_vector_init(&membership, 0);
    igraph_matrix_init(&merges, 0, 0);
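    if (argc < 2) {
        printf("Usage: %s <inputRelationFile> \n", argv[0]);
        exit(1);
    }
    /* file holding the edges, one space-separated pair per line */
    edgeListFile = fopen(argv[1], "r");
    if (edgeListFile == NULL) {
        perror(argv[1]);
        exit(1);
    }
    /* read the graph from the file */
    igraph_read_graph_ncol(&wbNetwork,
                           edgeListFile,
                           0, /* predefined vertex names (none) */
                           1, /* read the vertex names */
                           IGRAPH_ADD_WEIGHTS_NO, /* whether to read edge weights as well */
                           0 /* 0 means undirected graph */
                          );
    fclose(edgeListFile);
    /* remove multiple edges and self-loops */
    igraph_simplify(&wbNetwork, 1, 1, 0);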
    igraph_vector_init(&gtypes, 0);
    igraph_vector_init(&vtypes, 0);
    igraph_vector_init(&etypes, 0);
    igraph_strvector_init(&gnames, 0);
    igraph_strvector_init(&vnames, 0);
    igraph_strvector_init(&enames, 0);
    igraph_cattribute_list(&wbNetwork, &gnames, &gtypes, &vnames, &vtypes,
                           &enames, &etypes);
    no_of_nodes = igraph_vcount(&wbNetwork);
    no_of_edges = igraph_ecount(&wbNetwork);
    printf("Graph node numbers: %ld \n", no_of_nodes);
    printf("Graph edge numbers: %ld \n", no_of_edges);
    if (igraph_cattribute_has_attr(&wbNetwork, IGRAPH_ATTRIBUTE_VERTEX, "name")) {
        printf("Vertex names:\n");
        for (i = 0; i < no_of_nodes; i++) {
            printf("Vertex ID: %ld -> Vertex Name: %s \n", i, igraph_cattribute_VAS(&wbNetwork, "name", i));
        }
    } else {
        printf("The Graph does not have attribute of name \n");
    }
    /* partition the network into communities with the random-walk (walktrap) algorithm */
    rstCode = igraph_community_walktrap(&wbNetwork,
                                        /*edge weights*/ 0,
                                        WALKTRAP_STEPS, /* number of random-walk steps */
                                        &merges,
                                        &modularity,
                                        &membership /* community number each node belongs to */);
    if (rstCode != 0) {
        printf("Error dealing with finding communities\n");
        return 1;
    } else {
        printf("Finding communities success! \n");
    }
    /* print the merge sequence and how the modularity evolves */
    printf("Merges:\n");
    for (i = 0; i < igraph_matrix_nrow(&merges); i++) {
        printf("%2.1li + %2.li -> %2.li (modularity %4.2f)\n",
               (long int)MATRIX(merges, i, 0),
               (long int)MATRIX(merges, i, 1),
               no_of_nodes + i,
               VECTOR(modularity)[i]);
    }
    for (i = 0; i < igraph_vector_size(&membership); i++) {
        printf("Node: %ld -> Community: %g \n", i, VECTOR(membership)[i]);
    }
    /* destroy everything that was initialized */
    igraph_vector_destroy(&modularity);
    igraph_vector_destroy(&membership);
    igraph_matrix_destroy(&merges);
    igraph_vector_destroy(&gtypes);
    igraph_vector_destroy(&vtypes);
    igraph_vector_destroy(&etypes);
    igraph_strvector_destroy(&gnames);
    igraph_strvector_destroy(&vnames);
    igraph_strvector_destroy(&enames);
    igraph_destroy(&wbNetwork);
    return 0;
}
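After the igraph_community_walktrap computation above, every node in the network has been assigned a community number, which the program prints as one node ID and community number per line.
Note that the node ID here is the one the algorithm assigned to each node during the computation (from 0 to N-1, where N is the number of nodes). To convert this ID back to the node identifier used in our original relation pairs, read the mapping directly from the attributes that igraph itself stores. The relevant excerpt from the code above is: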
    if (igraph_cattribute_has_attr(&wbNetwork, IGRAPH_ATTRIBUTE_VERTEX, "name")) {
        printf("Vertex names:\n");
        for (i = 0; i < no_of_nodes; i++) {
            printf("Vertex ID: %ld -> Vertex Name: %s \n", i, igraph_cattribute_VAS(&wbNetwork, "name", i));
        }
    } else {
        printf("The Graph does not have attribute of name \n");
    }
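This was written in a hurry; if you spot errors or shortcomings, corrections and criticism are very welcome.
In addition, we have a long-standing focus on using social network mining to solve business problems in information security. If you share that interest, we would be glad to exchange ideas.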