A Simple Application of PageRank

This article uses the networkx module [1] to implement the PageRank algorithm.
networkx provides three ways to compute PageRank:

  1. networkx.pagerank(): a pure-Python implementation of the power method to compute the largest eigenvalue/eigenvector of the Google matrix. It has two parameters that control the accuracy: tol and max_iter.
  2. networkx.pagerank_scipy(): a SciPy sparse-matrix implementation of the power-method. It has the same two accuracy parameters.
  3. networkx.pagerank_numpy(): a NumPy (full) matrix implementation that calls the numpy.linalg.eig() function to compute the largest eigenvalue and eigenvector. That function is an interface to the LAPACK dgeev function, which uses a matrix decomposition (direct) method with no tunable parameters.
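
If tol is small enough and max_iter is large enough, all three methods give the same answer (within numerical rounding) for well-behaved graphs. Which one is faster depends on the size of the graph and on how well the power method works on it [2].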
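
To make the roles of tol and max_iter concrete, here is a minimal sketch of the power method on the Google matrix. It is not the networkx implementation: the function name pagerank_power, the 3-page link matrix M, and the L1 stopping rule are assumptions chosen for illustration, and the sketch assumes every page has at least one outgoing link.

```python
import numpy as np

def pagerank_power(M, alpha=0.85, tol=1e-6, max_iter=100):
    """Power iteration on the Google matrix built from a column-stochastic M.

    Hypothetical helper for illustration; assumes M has no all-zero column.
    """
    n = M.shape[0]
    # Google matrix: follow a link with probability alpha, otherwise jump uniformly
    google = alpha * M + (1.0 - alpha) / n * np.ones((n, n))
    rank = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = google @ rank
        # stop once the L1 change between iterations drops below tol
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    raise RuntimeError("power iteration did not converge within max_iter steps")

# Hypothetical 3-page web: A->B, A->C, B->C, C->A (column j = links leaving page j)
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
ranks = pagerank_power(M)
print(ranks)
```

A smaller tol or a larger max_iter trades running time for accuracy, which is why the direct eigenvalue method has no such parameters.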

The code below implements PageRank with networkx [3]:

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

# Build an empty directed graph
G = nx.DiGraph()

# Add nodes to the graph
pages = ["1", "2", "3", "4"]
G.add_nodes_from(pages)

# Add edges; adding an edge also creates any node that does not exist yet
G.add_edges_from([('1','2'), ('1','4'), ('1','3'), ('4','1'), ('2','3'), ('2','4'), ('3','1'), ('4','3')])

# Draw the graph
nx.draw(G, with_labels=True)
plt.show()  # display

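# One way to compute PageRank: take the dominant eigenvector of the link matrix
def findPageRank(linkmatrix, pages):
    eigval, eigvector = np.linalg.eig(linkmatrix)  # eigenvalues and eigenvectors
    dominant = np.argmax(np.abs(eigval))  # index of the dominant eigenvalue
    scores = np.real(eigvector[:, dominant])  # its eigenvector holds the PageRank values
    scores = scores / scores.sum()  # normalize so the scores sum to 1
    print("The most important node is %s" % pages[int(np.argmax(scores))])
    return scores

# Column j lists the pages that page j+1 links to, each weighted by 1/out-degree
linkmatrix = np.array([[0, 0, 1, 0.5],
                       [1.0/3, 0, 0, 0],
                       [1.0/3, 0.5, 0, 0.5],
                       [1.0/3, 0.5, 0, 0]])
findPageRank(linkmatrix, pages)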
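# Another way: let networkx compute it. pagerank_scipy and pagerank_numpy exist only in
# older networkx releases (they were removed in 3.0), so nx.pagerank is used here.
result = nx.pagerank(G, alpha=1, personalization=None, max_iter=100, tol=1e-06, weight='weight', dangling=None)
print(result)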
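
Typing the link matrix by hand is error-prone, so the following sketch derives it from the same edge list used to build the graph and then ranks the pages by the dominant eigenvector. The helper name build_link_matrix is hypothetical, and the sketch assumes every page has at least one outgoing link (no dangling nodes) and applies no damping.

```python
import numpy as np

pages = ["1", "2", "3", "4"]
edges = [("1", "2"), ("1", "4"), ("1", "3"), ("4", "1"),
         ("2", "3"), ("2", "4"), ("3", "1"), ("4", "3")]

def build_link_matrix(pages, edges):
    """Column-stochastic matrix: entry [i, j] is 1/out-degree(j) if page j links to page i."""
    index = {page: i for i, page in enumerate(pages)}
    M = np.zeros((len(pages), len(pages)))
    for src, dst in edges:
        M[index[dst], index[src]] += 1.0
    out_degree = M.sum(axis=0)  # column sums = number of outgoing links per page
    return M / out_degree       # assumes no page has zero outgoing links

link_matrix = build_link_matrix(pages, edges)

# The PageRank vector is the eigenvector of the dominant eigenvalue, scaled to sum to 1
eigvals, eigvecs = np.linalg.eig(link_matrix)
dominant = np.argmax(np.abs(eigvals))
scores = np.real(eigvecs[:, dominant])
scores = scores / scores.sum()

# Sort pages from most to least important
ranking = sorted(zip(pages, scores), key=lambda item: item[1], reverse=True)
for page, score in ranking:
    print(page, round(float(score), 4))
```

Because no damping is applied here, these scores correspond to alpha=1 in the networkx call.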
