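The previous article, "A First Look at Network Mining (2): NEO4J Graph Visualization", presented data as a graph. Graph visualization has strong visual impact and makes relationship information easy to read.
Visualization is only the most obvious advantage of graphs, though. There are other, less visible ones, so let's uncover them one by one.
The graph database NEO4J provides a set of specialized analysis algorithms.
Common algorithms: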
Centrality algorithms (Centralities):
(1) PageRank (algo.pageRank)
(2) ArticleRank (algo.articleRank)
(3) Betweenness Centrality (algo.betweenness)
(4) Closeness Centrality (algo.closeness)
(5) Degree Centrality (algo.degree)
(6) Eigenvector Centrality (algo.eigenvector)
(7) Harmonic Centrality (algo.closeness.harmonic)
Community detection algorithms (Community detection):
(1) Louvain (algo.louvain)
(2) Label Propagation (algo.labelPropagation)
(3) Weakly Connected Components (WCC, algo.unionFind)
(4) Strongly Connected Components (algo.scc)
(5) Triangle Counting / Clustering Coefficient (algo.triangleCount)
(6) Balanced Triads (algo.balancedTriads)
Path finding algorithms (Path Finding & Search):
(1) Minimum Weight Spanning Tree (algo.mst)
(2) Shortest Path (algo.shortestPath)
(3) Single Source Shortest Path (algo.shortestPath.deltaStepping)
(4) All Pairs Shortest Path (algo.allShortestPaths)
(5) A* (A* search, algo.shortestPath.astar)
(6) Yen's K-shortest paths (algo.kShortestPaths)
(7) Random Walk (algo.randomWalk)
(8) Path Expanding (path expansion procedure from APOC, apoc.path.expand)
Similarity algorithms (Similarity):
(1) Jaccard Similarity (algo.similarity.jaccard)
(2) Cosine Similarity (algo.similarity.cosine)
(3) Pearson Similarity (algo.similarity.pearson)
(4) Euclidean Distance (algo.similarity.euclidean)
(5) Overlap Similarity (algo.similarity.overlap)
(6) Approximate Nearest Neighbors (ANN, algo.labs.ml.ann)
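To make the similarity measures above concrete, here is a minimal Python sketch of three of them, assuming their textbook definitions. The function names jaccard, cosine and overlap are illustrative helpers, not the Neo4j procedures, and the library's handling of edge cases such as empty inputs may differ.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """Overlap similarity: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(jaccard({1, 2, 3}, {2, 3, 4}))   # 2 shared items out of 4 distinct
    print(overlap({1, 2}, {1, 2, 3}))      # the smaller set is fully contained
    print(cosine([1, 2], [2, 4]))          # parallel vectors
```

Jaccard and overlap work on sets of items (for example, the things two users liked), while cosine works on numeric vectors such as ratings.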
Link prediction (Link Prediction):
(1) Adamic Adar (algo.linkprediction.adamicAdar)
(2) Common Neighbors (algo.linkprediction.commonNeighbors)
(3) Preferential Attachment (algo.linkprediction.preferentialAttachment)
(4) Resource Allocation (algo.linkprediction.resourceAllocation)
(5) Same Community (algo.linkprediction.sameCommunity)
(6) Total Neighbors (algo.linkprediction.totalNeighbors)
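The link prediction scores listed above are all simple functions of the neighborhoods of two nodes. The sketch below assumes an undirected graph stored as a dict of neighbor sets and uses the standard textbook formulas. The function names and the sample graph are illustrative, not the Neo4j functions themselves.

```python
import math


def common_neighbors(adj, a, b):
    """Number of neighbors shared by a and b."""
    return len(adj[a] & adj[b])


def total_neighbors(adj, a, b):
    """Number of distinct neighbors of a and b combined."""
    return len(adj[a] | adj[b])


def preferential_attachment(adj, a, b):
    """Product of the two node degrees."""
    return len(adj[a]) * len(adj[b])


def adamic_adar(adj, a, b):
    """Sum of 1 / log(degree) over shared neighbors (degree-1 nodes skipped)."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[a] & adj[b] if len(adj[z]) > 1)


def resource_allocation(adj, a, b):
    """Sum of 1 / degree over shared neighbors."""
    return sum(1.0 / len(adj[z]) for z in adj[a] & adj[b])


# Hypothetical undirected graph: A and B are not linked but share C and D.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}

if __name__ == "__main__":
    for score in (common_neighbors, total_neighbors, preferential_attachment,
                  adamic_adar, resource_allocation):
        print(score.__name__, score(adj, "A", "B"))
```

A higher score suggests that the pair is more likely to become connected. Adamic Adar and Resource Allocation give less weight to shared neighbors that are themselves highly connected.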
Preprocessing algorithms (Preprocessing):
(1) One Hot Encoding (algo.ml.oneHotEncoding)
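One-hot encoding turns a categorical value into a 0/1 vector with one position per possible value. A minimal Python sketch of the idea follows. The helper one_hot and the sample values are hypothetical and only illustrate the encoding, not the exact behavior of the Neo4j function.

```python
def one_hot(available_values, selected_values):
    """Return a 0/1 list with one entry per available value."""
    selected = set(selected_values)
    return [1 if value in selected else 0 for value in available_values]


if __name__ == "__main__":
    cuisines = ["Italian", "Indian", "Chinese"]   # hypothetical categories
    print(one_hot(cuisines, ["Italian"]))
    print(one_hot(cuisines, ["Indian", "Chinese"]))
```

The resulting vectors can then be fed to numeric measures such as cosine similarity.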
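Path Finding & Search is generally used to find the shortest path between nodes. Common algorithms include:
- Dijkstra: edge weights must not be negative
- Floyd: edge weights may be negative (as long as there is no negative cycle); works on directed and undirected graphs
- Bellman-Ford
- SPFA
Centrality is generally used to compute how central each node in the graph is, in order to find the more important nodes. There are many kinds of centrality, for example:
- Degree Centrality
- Weighted Degree Centrality
- Betweenness Centrality
- Closeness Centrality
Community Detection is used to find groups of nodes that are tightly connected locally, much like the clustering algorithms we have studied (see also the article on interpreting "Game of Thrones" with community detection and graph analysis in Neo4j). Common algorithms include:
- Strongly Connected Components
- Weakly Connected Components (Union Find)
- Label Propagation
- Louvain Modularity
- Triangle Count and Average Clustering Coefficient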
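The list above pairs Weakly Connected Components with Union Find. As a rough illustration of why, here is a minimal Python sketch that groups nodes into components with a union-find (disjoint set) structure, ignoring edge direction. The function name and sample data are illustrative assumptions.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u   # merge the two components

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E", "F"]
    edges = [("A", "B"), ("B", "C"), ("D", "E")]
    print(weakly_connected_components(nodes, edges))
```

Each edge triggers at most one merge, so the components are found in a single pass over the relationships.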
This chapter uses NEO4J's algorithm package directly (I only carry code over from elsewhere and never write original code).
1. Install the algorithm package
- Download the algorithm package:
Download the jar for the matching version (e.g. graph-algorithms-algo-3.5.4.0) from https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases and put it in the
C:\Users\Administrator\.Neo4jDesktop\neo4jDatabases\database-<database ID>\installation-3.5.6\plugins directory.
- Configuration file:
Add the following line to C:\Users\Administrator\.Neo4jDesktop\neo4jDatabases\database-<database ID>\installation-3.5.6\conf\neo4j.conf:
dbms.security.procedures.unrestricted=algo.*
- Restart neo4j
- Check that the installation succeeded
Run the command: CALL algo.list()
Centrality
// Closeness Centrality (algo.closeness)
CALL algo.closeness.stream("Node", "LINK")
YIELD nodeId, centrality
MATCH (n:Node) WHERE id(n) = nodeId
RETURN n.id AS node, centrality
ORDER BY centrality DESC
LIMIT 20;
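To show what the closeness query above measures, here is a minimal Python sketch for an unweighted graph: the score of a node is the number of other reachable nodes divided by the sum of its shortest-path distances to them. Treating unreachable nodes this way is an assumption of the sketch. The library's normalization for disconnected graphs may differ.

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source`: reachable nodes / sum of BFS distances to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical path graph A - B - C - D, stored as neighbor lists.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

if __name__ == "__main__":
    for node in adj:
        print(node, closeness(adj, node))
```

The middle nodes B and C score higher than the end nodes A and D because they are, on average, fewer hops away from everyone else.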
// Betweenness Centrality (algo.betweenness)
CALL algo.betweenness.stream("User", "MANAGES", {direction:"out"})
YIELD nodeId, centrality
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, centrality
ORDER BY centrality DESC;
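Betweenness counts how often a node lies on shortest paths between other nodes. The sketch below is a compact Python version of Brandes' algorithm for a directed, unweighted graph, mirroring the outgoing direction used in the query above. The graph data is made up for illustration, and every node must appear as a key in the adjacency dict.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm on a directed, unweighted graph (dict of successor lists)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                          # nodes in BFS visiting order
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}       # dependency of s on each node
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: A reaches D through B or C, and D leads to E.
adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

if __name__ == "__main__":
    print(betweenness(adj))
```

D gets the highest score because every path into E passes through it, while B and C each carry only half of the paths that start at A.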
// PageRank (algo.pageRank)
CALL algo.pageRank.stream("Page", "LINKS", {iterations:20})
YIELD nodeId, score
MATCH (node) WHERE id(node) = nodeId
RETURN node.name AS page, score
ORDER BY score DESC;
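As a rough picture of what the PageRank call above computes, here is a minimal Python sketch of the textbook power iteration with a damping factor of 0.85 and 20 iterations. This is the normalized textbook form, so the absolute scores are not expected to match the library's output. The page names are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Textbook PageRank on a dict mapping each page to the pages it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in links[u]:
                new_rank[v] += damping * rank[u] / len(links[u])
        rank = new_rank
    return rank


# Hypothetical site: every page links to Home, and nothing links to Links.
links = {
    "Home": ["About", "Product"],
    "About": ["Home"],
    "Product": ["Home"],
    "Links": ["Home", "About", "Product"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Home ends up with the highest score because every other page links to it, while Links, which nobody links to, keeps only the baseline share.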
Community detection
// Louvain (algo.louvain)
// Syntax
CALL algo.beta.louvain(label: STRING, relationship: STRING, {
write: BOOLEAN,
writeProperty: STRING
// additional configuration
})
YIELD nodes, communities, modularity, loadMillis, computeMillis, writeMillis
// Example
CALL algo.louvain.stream("User", "FRIEND", {})
YIELD nodeId, community
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, community
ORDER BY community;
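Louvain searches for the partition that maximizes modularity, the value reported in the modularity column of the syntax above. The Python sketch below only computes modularity for a given partition of an undirected, unweighted graph, using the standard formula. It does not implement the Louvain search itself, and the graph is made up.

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of L_c / m - (d_c / 2m)^2."""
    m = len(edges)
    internal = {}   # L_c: edges with both ends inside community c
    degree = {}     # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)


# Hypothetical friendship graph: two triangles joined by the edge C - D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in "ABCDEF"}

if __name__ == "__main__":
    print(modularity(edges, two_groups))
    print(modularity(edges, one_group))
```

Splitting the graph into its two triangles scores higher than lumping everything into one community, which is the kind of comparison Louvain makes repeatedly while moving nodes between communities.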
// Label Propagation (algo.labelPropagation)
CALL algo.labelPropagation.stream("User", "FOLLOWS",
{direction: "OUTGOING", iterations: 10});
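The idea behind label propagation is that every node repeatedly adopts the label that is most common among its neighbors. The Python sketch below is a simplified, deterministic variant under two assumptions that are mine rather than the library's: the graph is treated as undirected, and ties are broken by taking the largest label instead of choosing at random.

```python
from collections import Counter


def label_propagation(adj, iterations=10):
    """Each node adopts its neighbors' most frequent label, up to `iterations` rounds."""
    labels = {node: node for node in adj}      # every node starts in its own community
    for _ in range(iterations):
        changed = False
        for node in sorted(adj):               # fixed visiting order for reproducibility
            if not adj[node]:
                continue
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            # Assumption: ties go to the largest label (real runs pick randomly).
            new_label = max(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:                        # converged early
            break
    return labels


# Hypothetical undirected graph: triangles 1-2-3 and 4-5-6 joined by the edge 3-4.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}

if __name__ == "__main__":
    print(label_propagation(adj))
```

Nodes are updated in place, so a label chosen early in a round can already influence the nodes visited after it.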
Paths
// Shortest Path (algo.shortestPath)
MATCH (start:Loc{name:"A"}), (end:Loc{name:"F"})
CALL algo.shortestPath.stream(start, end, "cost")
YIELD nodeId, cost
MATCH (other:Loc)
WHERE id(other) = nodeId
RETURN other.name AS name, cost;
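The query above returns each node on the cheapest path together with the cost accumulated so far. A minimal Python sketch of the same idea, using Dijkstra's algorithm with a binary heap and assuming non-negative costs, looks like this. The road network is invented for the example.

```python
import heapq


def shortest_path(graph, start, end):
    """Return [(node, cumulative_cost), ...] along the cheapest path, or [] if none."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        if u == end:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if end not in dist:
        return []
    path = [end]
    while path[-1] != start:              # walk predecessors back to the start
        path.append(prev[path[-1]])
    path.reverse()
    return [(node, dist[node]) for node in path]


# Hypothetical directed road network: node -> [(neighbor, cost), ...]
graph = {
    "A": [("B", 50), ("C", 50), ("D", 100)],
    "B": [("D", 40)],
    "C": [("D", 45), ("E", 80)],
    "D": [("E", 30), ("F", 80)],
    "E": [("F", 40)],
    "F": [],
}

if __name__ == "__main__":
    for name, cost in shortest_path(graph, "A", "F"):
        print(name, cost)
```

The direct edge from A to D costs 100, but going through B reaches D for 90, so the cheapest route to F is A, B, D, E, F.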
// Single Source Shortest Path (algo.shortestPath.deltaStepping)
MATCH (n:Loc {name:"A"})
CALL algo.shortestPath.deltaStepping.stream(n, "cost", 3.0)
YIELD nodeId, distance
MATCH (destination) WHERE id(destination) = nodeId
RETURN destination.name AS destination, distance;
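The third argument in the call above (3.0) is the delta, the width of the distance buckets used by delta-stepping. The sketch below is a simplified, sequential Python version of the algorithm under the assumption of non-negative weights. The real procedure is designed to process each bucket in parallel, which this sketch does not attempt. The graph is invented for illustration, and every node must appear as a key.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Single-source shortest distances using buckets of width `delta`."""
    dist = {node: math.inf for node in graph}
    buckets = defaultdict(set)

    def relax(node, d):
        if d < dist[node]:
            if math.isfinite(dist[node]):
                buckets[int(dist[node] // delta)].discard(node)
            dist[node] = d
            buckets[int(d // delta)].add(node)

    relax(source, 0.0)
    while True:
        nonempty = [i for i, bucket in buckets.items() if bucket]
        if not nonempty:
            break
        i = min(nonempty)
        settled = set()
        while buckets[i]:                       # light edges may refill bucket i
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            for u in frontier:
                for v, w in graph[u]:
                    if w <= delta:              # light edge
                        relax(v, dist[u] + w)
        for u in settled:                       # heavy edges, once per settled node
            for v, w in graph[u]:
                if w > delta:
                    relax(v, dist[u] + w)
    return dist


# Hypothetical directed graph: node -> [(neighbor, cost), ...]
graph = {
    "A": [("B", 1), ("C", 5)],
    "B": [("C", 2), ("D", 6)],
    "C": [("D", 1)],
    "D": [("E", 3)],
    "E": [],
}

if __name__ == "__main__":
    print(delta_stepping(graph, "A", 3.0))
```

A small delta behaves much like Dijkstra's algorithm, one distance band at a time, while a large delta behaves more like Bellman-Ford, so the value trades ordering work against repeated relaxations.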