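For convenience, the algorithms in this post are demonstrated with networkx. The posts that follow also build on networkx to show how graph algorithms are applied.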
Page Rank is a well-known algorithm developed by Larry Page and Sergey Brin in 1996.
Disclaimer: because the original article is well written, this post is essentially a translation of the article in ref [1].
Jump to other nodes (with probability 1-p), for example the set F formed by a and e in the figure below.
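The jump step described above can be sketched as a plain power iteration. This is a minimal illustration and not the networkx implementation: the function name `personalized_pagerank`, the path-shaped toy graph, and the choice to send the rank of isolated nodes back to F are all assumptions made for the example. With probability p the walker follows an edge, and with probability 1-p it jumps to a node in the seed set F.

```python
def personalized_pagerank(adj, seeds, p=0.85, iters=100):
    """Power iteration on an undirected graph given as node -> list of neighbours."""
    nodes = list(adj)
    # Jump distribution: uniform over the seed set F, zero everywhere else.
    jump = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iters):
        # With probability 1-p the walker jumps into F.
        new = {n: (1.0 - p) * jump[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                # With probability p the walker moves to a random neighbour.
                share = p * rank[n] / len(adj[n])
                for m in adj[n]:
                    new[m] += share
            else:
                # Assumption: a node without neighbours sends its rank back to F.
                for m in nodes:
                    new[m] += p * rank[n] * jump[m]
        rank = new
    return rank


# Hypothetical toy graph: a path a-b-c-d-e-f-g-h with F = {a, e}.
names = ["a", "b", "c", "d", "e", "f", "g", "h"]
adj = {n: [] for n in names}
for left, right in zip(names, names[1:]):
    adj[left].append(right)
    adj[right].append(left)

rank = personalized_pagerank(adj, seeds={"a", "e"})
for node in names:
    print(node, round(rank[node], 4))
```

Nodes in F, and nodes close to them, collect most of the probability mass, while a node far from F, such as h here, stays low.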
Running Page Rank on the graph above gives the importance of each node, as follows:
import itertools
import pprint
import random
import networkx as nx
import pandas as pd
from matplotlib import pyplot as plt
fraud = pd.DataFrame({
    'individual': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
    'fraudster': [1, 0, 0, 0, 1, 0, 0, 0]
})
# Generate Networkx Graph
G = nx.Graph()
G.add_nodes_from(fraud['individual'])
# Randomly determine edges: each pair of nodes is connected with probability 0.5
for (node1, node2) in itertools.combinations(fraud['individual'], 2):
    if random.random() < 0.5:
        G.add_edge(node1, node2)
# Draw generated graph
nx.draw_networkx(G, pos=nx.circular_layout(G), with_labels=True)
# Compute Personalized Page Rank
personalization = fraud.set_index('individual')['fraudster'].to_dict()
ppr = nx.pagerank(G, alpha=0.85, personalization=personalization)
pprint.pprint(ppr)
plt.show()
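However the randomly generated edges (and therefore the transition matrix) change, the Page Rank of these "at-risk" individuals stays relatively high, so they can be detected.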
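One way to turn the scores into a list of "at-risk" individuals is to drop the known fraudsters and sort the rest by score. The sketch below is only an illustration: the helper `rank_at_risk`, its `top_k` parameter, and the numbers in `example_scores` are made up for the example and are not output from the code above. In practice the `ppr` dict returned by `nx.pagerank` would be passed in.

```python
def rank_at_risk(scores, fraudsters, top_k=3):
    """Return the top_k non-fraudster nodes, highest score first."""
    candidates = [(node, score) for node, score in scores.items()
                  if node not in fraudsters]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]


# Made-up scores for illustration only (not a real result).
example_scores = {
    "a": 0.30, "b": 0.12, "c": 0.05, "d": 0.10,
    "e": 0.28, "f": 0.08, "g": 0.04, "h": 0.03,
}
at_risk = rank_at_risk(example_scores, fraudsters={"a", "e"})
print(at_risk)
```

Because the graph is random, the individuals that come out on top differ between runs. Calling `random.seed` with a fixed value before building the graph makes a run repeatable.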
ref:
[1] https://blog.sicara.com/fraud-detection-personalized-page-rank-networkx-15bd52ba2bf6