CS224W: Machine Learning with Graphs - 04 Graph as Matrix: PageRank, Random Walks and Embeddings

0. Graph as Matrix

Investigate graph analysis and learning from a matrix perspective to

  • Determine node importance via random walk (PageRank)
  • Obtain node embeddings via matrix factorization (MF)
  • View other node embeddings (e.g., Node2Vec) as MF

1. PageRank: Google Algorithm

0). Example: the Web as a Graph

  • Web as a graph: nodes → web pages, edges → hyperlinks
  • In the early days of the Web, links were navigational; today many links are transactional (post, comment, like, buy, …)
  • Web as a directed graph

1). Link Analysis Algorithms

  • PageRank
  • Personalized PageRank (PPR)
  • Random Walk with Restarts

2). PageRank: the “Flow” Model

Idea: links as votes (a page is more important if it has more in-links)
Links from important pages count more, which makes the definition recursive:
a vote from an important page is worth more

  • Each link’s vote is proportional to the importance of its source page
  • If page $i$ with importance $r_i$ has $d_i$ out-links, each link gets $r_i/d_i$ votes
  • Page $j$’s own importance $r_j$ is the sum of the votes on its in-links:
    $r_j=\sum_{i\to j} \frac{r_i}{d_i}$
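
A minimal worked example (not from the lecture): suppose page 1 links to itself and to page 2 ($d_1 = 2$), while page 2 links only to page 1 ($d_2 = 1$). The flow equations are
$r_1 = \frac{r_1}{2} + r_2, \quad r_2 = \frac{r_1}{2}$
With the normalization $r_1 + r_2 = 1$, this gives $r_1 = 2/3$ and $r_2 = 1/3$: the page receiving more in-link weight ends up more important.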

3). PageRank: Matrix Formulation

  • Stochastic adjacency matrix $M$:
    if $j \to i$, then $M_{ij}=\frac{1}{d_j}$
    $M$ is a column stochastic matrix (columns sum to 1)
  • Rank vector $r$: one entry per page
    $r_i$ is the importance score of page $i$ ($\sum_i r_i = 1$)
  • The flow equation can be written as
    $r = M \cdot r$
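
A minimal numpy sketch (not from the lecture) of building $M$; the adjacency convention $A_{ij}=1$ for an edge $j \to i$ is an assumption chosen to match the definition above:

```python
import numpy as np

# Toy directed graph with the convention A[i, j] = 1 for an edge j -> i.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

out_deg = A.sum(axis=0)                  # d_j: out-degree of source page j
M = A / out_deg                          # M[i, j] = 1/d_j whenever j -> i
assert np.allclose(M.sum(axis=0), 1.0)   # columns sum to 1 (no dead ends here)
print(M)
```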

4). Connection to Random Walk

Imagine a random web surfer
a) At any time $t$, the surfer is on some page $i$
b) At time $t+1$, the surfer follows an out-link from $i$ uniformly at random
c) Ends up on some page $j$ linked from $i$
d) The process repeats indefinitely
Let $p(t)$ denote the vector whose $i^{th}$ coordinate is the probability that the surfer is at page $i$ at time $t$, so $p(t)$ is a probability distribution over pages

5). The Stationary Distribution

Follow a link uniformly at random:
$p(t+1) = M \cdot p(t)$
Suppose the random walk reaches a state where
$p(t+1) = M \cdot p(t) = p(t)$
Then $p(t)$ is a stationary distribution of the random walk.
Since $r = M \cdot r$, the rank vector $r$ is a stationary distribution of this random walk.

6). Eigenvector Formulation

The flow equation can be written as $1 \cdot r = M \cdot r$, so the rank vector $r$ is an eigenvector of the stochastic adjacency matrix $M$ with eigenvalue 1.
PageRank = limiting distribution = principal eigenvector of $M$; that is, $r$ is the principal eigenvector of $M$ with eigenvalue 1.

2. PageRank: How to Solve

Given a graph with n n n nodes, we use an iterative procedure:

  • Assign each node an initial PageRank
  • Repeat until convergence ($\sum_i |r_i^{t+1} - r_i^t| < \epsilon$), where $r_j^{t+1}=\sum_{i\to j} \frac{r_i^t}{d_i}$

1). Power Iteration Method

Given a web graph with N N N nodes, where the nodes are pages and edges are hyperlinks

  • Initialize: $r^0=[1/N, \dots, 1/N]^T$
  • Iterate: $r^{t+1}=M\cdot r^t$
  • Stop when $|r^{t+1} - r^t|_1 < \epsilon$ (using the $L_1$ norm)

About 50 iterations are sufficient to estimate the limiting solution
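
A minimal numpy sketch of power iteration (my own illustration; the 3-page matrix reuses the toy graph built above), with a cross-check against the eigenvector formulation:

```python
import numpy as np

def power_iteration(M, eps=1e-8, max_iter=100):
    N = M.shape[0]
    r = np.full(N, 1.0 / N)                  # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        r_next = M @ r                       # r^{t+1} = M r^t
        if np.abs(r_next - r).sum() < eps:   # L1 convergence test
            return r_next
        r = r_next
    return r

M = np.array([[0.5, 0.5, 0.0],               # column-stochastic matrix of the
              [0.5, 0.0, 1.0],               # 3-page toy graph built above
              [0.0, 0.5, 0.0]])
r = power_iteration(M)
print(r, r.sum())                            # converges to [0.4, 0.4, 0.2]

# Cross-check: r is the principal eigenvector of M with eigenvalue 1.
w, V = np.linalg.eig(M)
lead = V[:, np.argmax(w.real)].real
print(lead / lead.sum())                     # matches r up to numerical error
```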

2). Problems

a). Dead ends

Some pages have no out-links → importance "leaks out"
Solution: from dead ends, follow random teleport links with total probability 1.0

  • Adjust the matrix accordingly (replace the all-zero column of a dead end with entries $1/N$)
b). Spider traps

All out-links from a group of pages stay within the group → the group eventually absorbs all importance
Solutions: at each time step, the random surfer has two options

  • With probability $\beta$, follow a link at random
  • With probability $1-\beta$, jump to a random page
  • Common values for $\beta$ are in [0.8, 0.9]

The surfer will teleport out of the spider trap within a few time steps

c). Why teleports solve the problems

Spider traps are not a mathematical problem, but with traps the PageRank scores are not what we want
Solution: never get stuck in a spider trap by teleporting out of it in a finite number of steps
Dead ends are a problem: the matrix is not column stochastic, so our initial assumptions are not met
Solution: make the matrix column stochastic by always teleporting when there is nowhere else to go

3). Solution: Random Teleports

PageRank equation (assuming $M$ has no dead ends):
$r_j = \sum_{i\to j}\beta \frac{r_i}{d_i}+(1-\beta)\frac{1}{N}$
The Google Matrix $G$:
$G=\beta M+(1-\beta)\left[\frac{1}{N}\right]_{N\times N}$
We again have a recursive equation $r=G\cdot r$, and the power method still works
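
A minimal sketch of power iteration with the Google matrix ($\beta = 0.85$ is a common choice; the toy $M$ is reused from above):

```python
import numpy as np

def pagerank(M, beta=0.85, eps=1e-8, max_iter=100):
    N = M.shape[0]
    G = beta * M + (1 - beta) * np.full((N, N), 1.0 / N)   # Google matrix
    r = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        r_next = G @ r
        if np.abs(r_next - r).sum() < eps:
            return r_next
        r = r_next
    return r

M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
print(pagerank(M))      # teleports pull the scores toward the uniform vector
```

Note that $G$ is dense even when $M$ is sparse, so practical implementations keep $M$ sparse and add the teleport term $(1-\beta)/N$ analytically at each iteration instead of materializing $G$.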

3. Random Walk with Restarts and Personalized PageRank

1). Proximity on Graphs

  • PageRank: teleports with uniform probability to any node in the network
  • Personalized PageRank (PPR): ranks nodes by their proximity to a set of teleport nodes $S$
  • Proximity on graphs: Random Walk with Restarts, where teleports always go back to the starting node

2). Random Walks

Idea

  • Every node has some importance
  • Importance gets evenly split among all edges and pushed to the neighbors

Given a set of QUERY_NODES, we simulate a random walk (see the simulation sketch after this list):

  • Make a step to a random neighbor and record the visit (visit count)
  • With probability $\alpha$, restart the walk at one of the QUERY_NODES
  • The nodes with the highest visit counts have the highest proximity to the QUERY_NODES
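
A minimal Python sketch of this procedure; the toy adjacency list, $\alpha$, and step count are illustrative assumptions:

```python
import random
from collections import Counter

# Toy undirected graph as an adjacency list (an assumption for illustration).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
QUERY_NODES = [0]

def rwr(graph, query_nodes, alpha=0.15, num_steps=100_000, seed=0):
    rng = random.Random(seed)
    visits = Counter()
    node = rng.choice(query_nodes)
    for _ in range(num_steps):
        if rng.random() < alpha:          # restart at one of the query nodes
            node = rng.choice(query_nodes)
        else:                             # step to a random neighbor
            node = rng.choice(graph[node])
        visits[node] += 1                 # record the visit
    return visits

print(rwr(graph, QUERY_NODES).most_common())  # high count = high proximity
```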

Benefits: the “similarity” considers

  • Multiple connections
  • Multiple paths
  • Direct and indirect connections
  • Degree of the node

3). PageRank Variants

PageRank: teleports to any node; every node has the same probability of the surfer landing there
$S=[0.2, 0.2, 0.2, 0.2, 0.2]$
Topic-Specific PageRank, aka Personalized PageRank: teleports to a specific set of nodes; nodes can have different landing probabilities
$S=[0.3, 0, 0.5, 0.2, 0]$
Random Walks with Restarts: Topic-Specific PageRank where the teleport is always to the same single node
$S=[0, 0, 0, 1, 0]$
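
The three variants can share one implementation that differs only in the teleport vector $S$; a minimal sketch (the uniform placeholder matrix is an assumption for illustration):

```python
import numpy as np

# The iteration r <- beta * M r + (1 - beta) * S covers all three variants.
def pagerank_with_teleport(M, S, beta=0.85, iters=100):
    r = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * S
    return r

N = 5
M = np.full((N, N), 1.0 / N)                  # placeholder stochastic matrix
uniform = np.full(N, 0.2)                     # PageRank
topic   = np.array([0.3, 0, 0.5, 0.2, 0])     # Personalized PageRank
restart = np.array([0, 0, 0, 1.0, 0])         # Random Walk with Restarts
for S in (uniform, topic, restart):
    print(pagerank_with_teleport(M, S))
```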

4. Matrix Factorization and Node Embeddings

0). Relationship between Node Embeddings and Matrix Factorization

Node embeddings
Objective: maximize $z_v^T z_u$ for node pairs $(u, v)$ that are similar
Matrix factorization
Simplest node similarity: nodes $u, v$ are similar if they are connected by an edge ($z_v^T z_u = A_{uv}$, and therefore $Z^T Z = A$)

1). Matrix Factorization

  • The embedding dimension (number of rows in $Z$) is much smaller than the number of nodes $n$
  • Exact factorization $A=Z^TZ$ is generally not possible
  • However, we can learn $Z$ approximately
  • Objective: $\min_Z \|A-Z^TZ\|_2$
  • Conclusion: an inner-product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of $A$
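
A minimal gradient-descent sketch of this objective (my own illustration; the lecture does not prescribe a solver, and the toy graph, learning rate, and iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],                # toy undirected adjacency matrix
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n, d, lr = A.shape[0], 2, 0.01
Z = rng.normal(scale=0.1, size=(d, n))     # one d-dim embedding per node

for _ in range(2000):
    err = Z.T @ Z - A                      # reconstruction error
    Z -= lr * 4 * (Z @ err)                # grad of ||A - Z^T Z||_F^2 is 4 Z (Z^T Z - A)
print(np.round(Z.T @ Z, 2))                # approximate reconstruction of A
```

Since $d \ll n$, the reconstruction is only approximate, which is exactly the point of the objective above.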

2). Random Walk-based Similarity

DeepWalk and node2vec have a more complex node similarity definition based on random walks

  • DeepWalk is equivalent to matrix factorization of the following matrix expression:
    $\log\left(\mathrm{vol}(G)\left(\frac{1}{T}\sum_{r=1}^T(D^{-1}A)^r\right)D^{-1}\right)-\log b$
    where $\mathrm{vol}(G)$ is the sum of all edge weights, $D$ is the diagonal degree matrix, $T$ is the context window size, and $b$ is the number of negative samples
  • Node2vec can also be formulated as a more complex matrix factorization
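
A minimal numpy sketch of forming the DeepWalk matrix above; $T$ and $b$ are illustrative assumptions, and clamping before the elementwise log is a simplification:

```python
import numpy as np

A = np.array([[0, 1, 1],                     # toy undirected adjacency matrix
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))         # D^{-1}, inverse degree matrix
vol = A.sum()                                # vol(G): total edge weight
T, b = 3, 1                                  # window size, negative samples

P = D_inv @ A                                # one-step transition matrix D^{-1}A
S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) / T
Mdw = vol * S @ D_inv
Mdw = np.log(np.maximum(Mdw, 1e-12)) - np.log(b)   # elementwise log
print(np.round(Mdw, 2))                      # factorizing this matrix ≈ DeepWalk
```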

3). Limitations

  • Cannot obtain embeddings for nodes not in the training set
    If some new nodes are added at test time (e.g., new user in a social network), we need to recompute all node embeddings
  • Cannot capture structural similarity
    If two nodes are far apart, they will have very different embeddings even if their local structures are similar, because a random walk from one is unlikely to reach the other
  • Cannot utilize node, edge, and graph features

Solutions: Deep Representation Learning and Graph Neural Networks
