Graph as Matrix: PageRank, Random Walks and Embeddings
0. Graph as Matrix
Investigate graph analysis and learning from a matrix perspective to
- Determine node importance via random walk (PageRank)
- Obtain node embeddings via matrix factorization (MF)
- View other node embeddings (e.g., Node2Vec) as MF
1. PageRank: Google Algorithm
0). Example: the Web as a Graph
- Web as a graph: nodes $\to$ web pages, edges $\to$ hyperlinks
- In the early days of the Web, links were navigational; today many links are transactional (post, comment, like, buy, …)
- Web as a directed graph
1). Link Analysis Algorithms
- PageRank
- Personalized PageRank (PPR)
- Random Walk with Restarts
2). PageRank: the “Flow” Model
Idea: links as votes (a page is more important if it has more in-links)
Links from important pages count more
Recursive question: a vote from an important page is worth more
- Each link’s vote is proportional to the importance of its source page
- If page $i$ with importance $r_i$ has $d_i$ out-links, each link gets $r_i/d_i$ votes
- Page $j$'s own importance $r_j$ is the sum of the votes on its in-links
$$r_j=\sum_{i\to j} \frac{r_i}{d_i}$$
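As a quick worked example (a hypothetical three-page graph, not from the original notes): take pages $a, b, c$ with links $a \to b$, $a \to c$, $b \to c$, $c \to a$, so $d_a = 2$ and $d_b = d_c = 1$. The flow equations are
$$r_a = \frac{r_c}{1}, \quad r_b = \frac{r_a}{2}, \quad r_c = \frac{r_a}{2} + \frac{r_b}{1}$$
and together with the normalization $\sum_i r_i = 1$ they give $r = (0.4, 0.2, 0.4)$.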
3). PageRank: Matrix Formulation
- Stochastic adjacency matrix $M$
If $j \to i$, then $M_{ij}=\frac{1}{d_j}$
$M$ is a column stochastic matrix (columns sum to 1)
- Rank vector $r$: an entry per page
$r_i$ is the importance score of page $i$ ($\sum_i r_i = 1$)
- The flow equation can be written as
$$r = M \cdot r$$
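A minimal NumPy sketch of this formulation (the three-page graph and all variable names are my own, continuing the worked example above):

```python
import numpy as np

# Hypothetical graph: a -> b, a -> c, b -> c, c -> a.
# Column j lists the out-links of page j; row i lists the in-links of page i.
A = np.array([[0, 0, 1],   # in-links of a: from c
              [1, 0, 0],   # in-links of b: from a
              [1, 1, 0]],  # in-links of c: from a and b
             dtype=float)

out_deg = A.sum(axis=0)    # d_j: number of out-links of page j
M = A / out_deg            # M[i, j] = 1/d_j if j -> i, so columns sum to 1

r = np.array([0.4, 0.2, 0.4])   # rank vector from the worked example
print(np.allclose(M @ r, r))    # True: r satisfies the flow equation r = M r
```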
4). Connection to Random Walk
Imagine a random web surfer
a) At any time $t$, the surfer is on some page $i$
b) At time $t+1$, the surfer follows an out-link from $i$ uniformly at random
c) Ends up on some page $j$ linked from $i$
d) Process repeats indefinitely
Let $p(t)$ denote the vector whose $i^{th}$ coordinate is the probability that the surfer is at page $i$ at time $t$. So $p(t)$ is a probability distribution over pages
5). The Stationary Distribution
The surfer follows a link uniformly at random, so
$$p(t+1) = M \cdot p(t)$$
Suppose the random walk reaches a state where
$$p(t+1) = M \cdot p(t) = p(t)$$
then $p(t)$ is a stationary distribution of the random walk
Since $r = M \cdot r$, the rank vector $r$ is a stationary distribution for the random walk
6). Eigenvector Formulation
The flow equation can be written as $1 \cdot r = M \cdot r$, so the rank vector $r$ is an eigenvector of the stochastic adjacency matrix $M$ with eigenvalue 1
PageRank = limiting distribution = principal eigenvector of $M$: $r$ is the principal eigenvector of $M$, with eigenvalue 1
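To check the eigenvector claim numerically, a short sketch (reusing the hypothetical $M$ from above): extract the eigenvector with eigenvalue 1 and normalize it to sum to 1.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

vals, vecs = np.linalg.eig(M)
k = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue closest to 1
r = np.real(vecs[:, k])
r = r / r.sum()                     # normalize so sum_i r_i = 1
print(r)                            # ~ [0.4, 0.2, 0.4]
```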
2. PageRank: How to Solve
Given a graph with n n n nodes, we use an iterative procedure:
- Assign each node an initial page rank
- Repeat until convergence ($\sum_i |r_i^{t+1} - r_i^t| < \epsilon$), where $r_j^{t+1}=\sum_{i\to j} \frac{r_i^t}{d_i}$
1). Power Iteration Method
Given a web graph with $N$ nodes, where the nodes are pages and the edges are hyperlinks
- Initialize: $r^0=[1/N, \dots, 1/N]^T$
- Iterate: $r^{t+1}=M \cdot r^t$
- Stop when $|r^{t+1} - r^t| < \epsilon$
About 50 iterations is sufficient to estimate the limiting solution
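A sketch of the power iteration (the function and variable names are my own; the example matrix is the hypothetical one from above):

```python
import numpy as np

def pagerank_power_iteration(M, eps=1e-9, max_iter=100):
    """Power iteration on a column-stochastic matrix M (assumes no dead ends)."""
    N = M.shape[0]
    r = np.full(N, 1.0 / N)                 # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        r_next = M @ r                      # r^{t+1} = M r^t
        if np.abs(r_next - r).sum() < eps:  # stop when |r^{t+1} - r^t|_1 < eps
            return r_next
        r = r_next
    return r

M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(pagerank_power_iteration(M))          # ~ [0.4, 0.2, 0.4]
```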
2). Problems
a). Dead ends
Some pages have no out-links $\to$ importance 'leaks out'
Solution: from dead ends, follow random teleport links with total probability 1.0 (i.e., always teleport to a random page)
- Adjust matrix accordingly
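One way to implement the adjustment (a sketch under the column-stochastic convention above): replace each all-zero column, i.e., each dead end, with a uniform teleport column.

```python
import numpy as np

def fix_dead_ends(M):
    """Make M column stochastic by teleporting uniformly from dead ends."""
    M = M.copy()
    N = M.shape[0]
    dead = M.sum(axis=0) == 0   # dead ends: columns with no out-links
    M[:, dead] = 1.0 / N        # from a dead end, jump anywhere with prob 1/N
    return M
```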
b). Spider traps
All out-links of some pages stay within the group $\to$ the group eventually absorbs all importance
Solution: at each time step, the random surfer has two options
- With probability $\beta$, follow a link at random
- With probability $1-\beta$, jump to a random page
- Common values for $\beta$ are in [0.8, 0.9]
The surfer will teleport out of a spider trap within a few time steps
c). Why teleports solve the problems
Spider traps are not a problem mathematically, but with traps the PageRank scores are not what we want
Solution: never get stuck in a spider trap by teleporting out of it in a finite number of steps
Dead-ends are a problem: the matrix is not column stochastic so our initial assumptions are not met
Solution: make matrix column stochastic by always teleporting when there is nowhere else to go
3). Solution: Random Teleports
PageRank equation
$$r_j^{t+1}=\sum_{i\to j}\beta \frac{r_i^t}{d_i}+(1-\beta)\frac{1}{N}$$
The Google Matrix $G$:
$$G=\beta M+(1-\beta)\left[\frac{1}{N}\right]_{N\times N}$$
We have a recursive problem, $r = G \cdot r$, and the power method still works
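A sketch of the Google matrix and its power iteration ($\beta = 0.85$ and the 3×3 matrix are placeholder choices, continuing the toy example):

```python
import numpy as np

def google_matrix(M, beta=0.85):
    """G = beta * M + (1 - beta) * [1/N]_{NxN}; assumes M has no dead ends."""
    N = M.shape[0]
    return beta * M + (1 - beta) * np.full((N, N), 1.0 / N)

M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
G = google_matrix(M)

r = np.full(3, 1 / 3)       # power iteration on G still converges
for _ in range(100):
    r = G @ r
print(r)
```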
3. Random Walk with Restarts and Personalized PageRank
1). Proximity on Graphs
- PageRank: teleports with uniform probability to any node in the network
- Personalized PageRank (PPR): ranks proximity of nodes to the teleport nodes $S$
- Proximity on Graphs: random walks with restarts, i.e., teleport back to the starting node
2). Random Walks
Idea
- Every node has some importance
- Importance gets evenly split among all edges and pushed to the neighbors
Given a set of QUERY_NODES, we simulate a random walk (a simulation sketch follows this list)
- Make a step to a random neighbor and record the visit (visit count)
- With probability $\alpha$, restart the walk at one of the QUERY_NODES
- The nodes with the highest visit counts have the highest proximity to the QUERY_NODES
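A minimal simulation sketch of this procedure (the graph, restart probability, and all names are my own illustration):

```python
import random
from collections import Counter

def random_walk_with_restarts(neighbors, query_nodes, alpha=0.15, steps=100_000):
    """Estimate proximity to QUERY_NODES via a simulated random walk.

    neighbors: dict mapping each node to the list of its neighbors.
    alpha: probability of restarting at one of the QUERY_NODES.
    """
    visits = Counter()
    node = random.choice(query_nodes)
    for _ in range(steps):
        if random.random() < alpha:
            node = random.choice(query_nodes)      # restart the walk
        else:
            node = random.choice(neighbors[node])  # step to a random neighbor
        visits[node] += 1                          # record the visit
    return visits

# Toy undirected graph given as adjacency lists.
neighbors = {"q": ["a", "b"], "a": ["q", "b", "c"],
             "b": ["q", "a"], "c": ["a", "d"], "d": ["c"]}
print(random_walk_with_restarts(neighbors, ["q"]).most_common())
```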
Benefits: the “similarity” considers
- Multiple connections
- Multiple paths
- Direct and indirect connections
- Degree of the node
3). PageRank Variants
PageRank: teleports to any node; all nodes have the same probability of the surfer landing there, e.g.
$$S=[0.2, 0.2, 0.2, 0.2, 0.2]$$
Topic-specific PageRank, aka Personalized PageRank: teleports to a specific set of nodes; nodes can have different probabilities of the surfer landing there, e.g.
$$S=[0.3, 0, 0.5, 0.2, 0]$$
Random walks with restarts: topic-specific PageRank where the teleport is always to the same node, e.g.
$$S=[0, 0, 0, 1, 0]$$
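All three variants can share one implementation: only the teleport vector $S$ changes. A sketch (reusing the toy matrix from earlier; the one-hot example restarts at the third node):

```python
import numpy as np

def pagerank_with_teleport(M, S, beta=0.85, iters=100):
    """Power iteration r^{t+1} = beta * M r^t + (1 - beta) * S
    for an arbitrary teleport distribution S."""
    S = np.asarray(S, dtype=float)
    r = S.copy()
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * S
    return r

M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
print(pagerank_with_teleport(M, S=[1/3, 1/3, 1/3]))  # standard PageRank
print(pagerank_with_teleport(M, S=[0, 0, 1]))        # random walk with restarts
```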
4. Matrix Factorization and Node Embeddings
0). Relationship between Node Embeddings and Matrix Factorization
Node embeddings
Objective: maximize $z_v^T z_u$ for node pairs $(u, v)$ that are similar
Matrix factorization
Simplest node similarity: nodes $u, v$ are similar if they are connected by an edge ($z_v^T z_u = A_{uv}$, and therefore $Z^T Z = A$)
1). Matrix Factorization
- The embedding dimension (number of rows in $Z$) is much smaller than the number of nodes $n$
- Exact factorization $A = Z^T Z$ is generally not possible
- However, we can learn $Z$ approximately
- Objective: $\min_Z \|A - Z^T Z\|_2$
- Conclusion: the inner product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of $A$
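A sketch of learning $Z$ by gradient descent on $\|A - Z^T Z\|_F^2$ (the dimension, learning rate, and toy graph are arbitrary choices of mine):

```python
import numpy as np

def learn_embeddings(A, dim=2, lr=0.01, iters=2000):
    """Gradient descent on ||A - Z^T Z||_F^2 for a symmetric adjacency matrix A.
    Z has shape (dim, n): one embedding column z_v per node."""
    n = A.shape[0]
    rng = np.random.default_rng(0)
    Z = rng.normal(scale=0.1, size=(dim, n))
    for _ in range(iters):
        R = A - Z.T @ Z            # residual; symmetric when A is symmetric
        Z += 4 * lr * (Z @ R)      # gradient of the objective is -4 Z R
    return Z

# Toy undirected graph: a triangle plus one pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = learn_embeddings(A)
print(np.round(Z.T @ Z, 2))        # Z^T Z approximates A
```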
2). Random Walk-based Similarity
DeepWalk and node2vec have a more complex node similarity definition based on random walks
- DeepWalk is equivalent to matrix factorization of the following matrix expression
$$\log\left(\mathrm{vol}(G)\left(\frac{1}{T}\sum_{r=1}^{T}(D^{-1}A)^r\right)D^{-1}\right)-\log b$$
where $\mathrm{vol}(G)$ is the sum of all node degrees, $T$ is the context window size, $D$ is the diagonal degree matrix, and $b$ is the number of negative samples
- Node2vec can also be formulated as a more complex matrix factorization
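A sketch of the DeepWalk matrix and an SVD-based factorization, following the NetMF formulation of this equivalence (the window size $T$, negative-sample count $b$, and clipping entries at 1 before the log are assumptions taken from that formulation, not from these notes):

```python
import numpy as np

def deepwalk_matrix(A, T=10, b=1):
    """log(vol(G) * (1/T * sum_{r=1..T} (D^{-1}A)^r) * D^{-1}) - log b,
    with entries clipped at 1 before the log; assumes no isolated nodes."""
    vol = A.sum()                        # vol(G): sum of all node degrees
    D_inv = np.diag(1.0 / A.sum(axis=1))
    P = D_inv @ A                        # random-walk transition matrix
    S, P_r = np.zeros_like(A), np.eye(A.shape[0])
    for _ in range(T):
        P_r = P_r @ P                    # (D^{-1} A)^r
        S += P_r
    return np.log(np.maximum(vol * (S / T) @ D_inv / b, 1.0))

def netmf_embeddings(A, dim=2, T=10, b=1):
    U, s, _ = np.linalg.svd(deepwalk_matrix(A, T, b))
    return (U[:, :dim] * np.sqrt(s[:dim])).T   # dim x n embeddings
```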
3). Limitations
- Cannot obtain embeddings for nodes not in the training set
If some new nodes are added at test time (e.g., new user in a social network), we need to recompute all node embeddings
- Cannot capture structural similarity
If two nodes are far from each other, they will have very different embeddings because it is unlikely that a random walk will reach one node from the other one.
- Cannot utilize node, edge, and graph features
Solutions: Deep Representation Learning and Graph Neural Networks