CS224W Summary 08. Applications of Graph Neural Networks


  • Graph Feature augmentation
  • Graph Structure augmentation
    • Augment sparse graphs
      • Add virtual edges
      • Add virtual nodes
    • Augment dense graphs
  • Prediction with GNNs
    • GNN Prediction Heads
      • Node-level prediction
      • Edge-level prediction
      • Graph-level prediction
  • Training Graph Neural Networks
    • Supervised vs Unsupervised
    • Loss
      • Classification loss
      • Regression loss
    • Evaluation metrics
  • Setting-up GNN Prediction Tasks
    • Fixed/random split
    • Transductive/Inductive setting
    • Transductive/Inductive split example: node classification
    • Transductive/Inductive split example: link prediction
      • Inductive link prediction split
      • Transductive link prediction split

CS224W: Machine Learning with Graphs
So far we have assumed that the raw input graph = the computational graph. Reasons for breaking this assumption:

  1. Features:
    § The input graph lacks features
  2. Graph structure:
    § The graph is too sparse → inefficient message passing
    § The graph is too dense → message passing is too costly (e.g., some Weibo accounts have millions of followers, so aggregating over all of their neighbors is too expensive)
    § The graph is too large → cannot fit the computational graph into a GPU

Graph Feature augmentation
Graph Structure augmentation
§ The graph is too sparse → Add virtual nodes / edges
§ The graph is too dense → Sample neighbors when doing message passing
§ The graph is too large → Sample subgraphs to compute embeddings

Graph Feature augmentation

a) Assign constant values to nodes

b) Assign unique IDs to nodes, usually converted into one-hot vectors

| | Option (a): constant node feature | Option (b): one-hot node feature |
|---|---|---|
| Expressive power | Medium. All the nodes are identical, but the GNN can still learn from the graph structure | High. Each node has a unique ID, so node-specific information can be stored |
| Inductive learning (generalize to unseen nodes) | High. Simple to generalize to new nodes: we assign the constant feature to them, then apply our GNN | Low. Cannot generalize to new nodes: new nodes introduce new IDs, and the GNN doesn’t know how to embed unseen IDs |
| Computational cost | Low. Only a 1-dimensional feature | High. O(V)-dimensional feature, cannot apply to large graphs |
| Use cases | Any graph, inductive settings (generalize to new nodes) | Small graphs, transductive settings (no new nodes) |
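As a minimal sketch of the two options compared above, the snippet below builds both kinds of node features for a toy graph. The function names and the 4-node graph size are assumptions made for illustration; they do not come from the lecture.

```python
def constant_features(num_nodes, value=1.0):
    """Option (a): every node gets the same 1-dimensional feature."""
    return [[value] for _ in range(num_nodes)]


def one_hot_features(num_nodes):
    """Option (b): node i gets a |V|-dimensional one-hot vector (1 at index i)."""
    return [[1.0 if j == i else 0.0 for j in range(num_nodes)]
            for i in range(num_nodes)]


# Hypothetical 4-node graph, used only for illustration.
const_x = constant_features(4)
onehot_x = one_hot_features(4)

# Feature width is 1 for option (a) and |V| for option (b).
print(len(const_x[0]), len(onehot_x[0]))
```

The one-hot width grows with the number of nodes, which is why option (b) is listed as costly and as unable to handle unseen nodes.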

2. Certain structures are hard for a GNN to learn, for example the cycle count feature
In both graphs, node $v_1$ has degree 2, so the computational graph rooted at $v_1$ is the same binary tree in both cases.
The solution is to add the cycle count directly to the node features.
Other commonly used augmented features:
§ Node degree
§ Clustering coefficient
§ PageRank
§ Centrality
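To illustrate why such structural features help, here is a small sketch that computes two of the features listed above (node degree and clustering coefficient) for a node $v_1$ that sits in a 3-cycle versus a 4-cycle. The two toy graphs and the helper names are assumptions chosen for this example.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def structural_features(adj, v):
    """Augmented feature vector: [degree, clustering coefficient]."""
    return [float(len(adj[v])), clustering_coefficient(adj, v)]


# Hypothetical toy graphs as adjacency sets: v1 lies on a 3-cycle vs. a 4-cycle.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

feat_triangle = structural_features(triangle, 1)
feat_square = structural_features(square, 1)
print(feat_triangle, feat_square)
```

The degree is the same in both graphs, while the clustering coefficient differs, so the augmented feature separates two nodes whose computational graphs are otherwise identical.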

Graph Structure augmentation

Augment sparse graphs

Add virtual edges

Connect 2-hop neighbors via virtual edges
This has the same effect as using $A + A^2$ as the adjacency matrix for GNN computation.
Example: in an author-to-paper bipartite graph,
2-hop virtual edges make an author-author collaboration graph.
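A minimal sketch of the $A + A^2$ idea on a tiny author-to-paper graph follows. The node ordering (two authors, then one paper) is a made-up example, and binarizing the result and dropping the diagonal is a choice made here for readability, not something the notes prescribe.

```python
def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def add_two_hop_edges(A):
    """Return the binarized A + A^2 without self-loops."""
    A2 = matmul(A, A)
    n = len(A)
    return [[1 if i != j and (A[i][j] or A2[i][j]) else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical bipartite graph: nodes 0 and 1 are authors, node 2 is a paper
# that both of them wrote.
A = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
]
A_aug = add_two_hop_edges(A)
print(A_aug)
```

After augmentation the two authors are directly connected, which is the author-author collaboration edge.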

Add virtual nodes

The virtual node will connect to all the nodes in the graph.
§ Suppose in a sparse graph, two nodes have shortest path distance of 10.
§ After adding the virtual node, any two nodes are at most two hops apart.
Greatly improves message passing in sparse graphs.
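The distance argument above can be checked with a short sketch. It assumes a path graph with 11 nodes (so the two endpoints are 10 hops apart) and a virtual node labeled `"virtual"`; both are illustrative choices.

```python
from collections import deque


def bfs_distance(adj, source, target):
    """Shortest-path distance in an unweighted graph (None if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def add_virtual_node(adj, virtual="virtual"):
    """Copy the graph and connect one new node to every existing node."""
    new_adj = {v: set(nbrs) | {virtual} for v, nbrs in adj.items()}
    new_adj[virtual] = set(adj)
    return new_adj


# Hypothetical sparse graph: a path 0 - 1 - ... - 10.
path = {i: set() for i in range(11)}
for i in range(10):
    path[i].add(i + 1)
    path[i + 1].add(i)

before = bfs_distance(path, 0, 10)
after = bfs_distance(add_virtual_node(path), 0, 10)
print(before, after)
```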

Augment dense graphs

(Figures omitted.) When the graph is too dense, we sample a subset of a node's neighbors for message passing instead of aggregating over all of them.
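The neighbor-sampling idea for dense graphs ("sample neighbors when doing message passing") could look roughly like the sketch below. The hub node with 1000 neighbors, the mean aggregator, and the budget of 10 sampled neighbors are all assumptions for illustration.

```python
import random


def sample_neighbors(adj, v, max_neighbors, rng):
    """Return at most max_neighbors neighbors of v, sampled without replacement."""
    neighbors = sorted(adj[v])
    if len(neighbors) <= max_neighbors:
        return neighbors
    return rng.sample(neighbors, max_neighbors)


def mean_aggregate(features, neighbor_ids):
    """Average the feature vectors of the given (non-empty) neighbor list."""
    dim = len(features[neighbor_ids[0]])
    return [sum(features[u][d] for u in neighbor_ids) / len(neighbor_ids)
            for d in range(dim)]


# Hypothetical hub node 0 connected to nodes 1..1000, each with a 1-dim feature.
adj = {0: set(range(1, 1001))}
features = {u: [float(u)] for u in range(1, 1001)}

rng = random.Random(0)
sampled = sample_neighbors(adj, 0, max_neighbors=10, rng=rng)
message = mean_aggregate(features, sampled)
print(len(sampled), message)
```

Resampling the neighbors in every layer and every training iteration is a common variant; the sketch draws a single sample only.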

Prediction with GNNs


GNN Prediction Heads

Idea: Different task levels require different prediction heads

Node-level prediction

Suppose we want to make a $k$-way prediction
§ Classification: classify among $k$ categories
§ Regression: regress on $k$ targets

$$\hat{y}_v = \mathrm{Head}_{node}(h_v^{(L)}) = W^{(H)} h_v^{(L)}$$

where $W^{(H)} \in \mathbb{R}^{k \times d}$ maps the node embedding $h_v^{(L)} \in \mathbb{R}^d$ to $\hat{y}_v \in \mathbb{R}^k$.
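As a rough sketch of the node-level head, the snippet below applies a $k \times d$ weight matrix to a node embedding and, for classification, turns the result into probabilities with a softmax. The concrete numbers ($d = 2$, $k = 3$) and the softmax step are assumptions for illustration.

```python
import math


def linear_head(W, h):
    """Compute W h for a k x d matrix W and a d-dim embedding h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical final-layer embedding (d = 2) and head weights (k = 3).
h_v = [0.5, -1.0]
W_H = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

logits = linear_head(W_H, h_v)   # k-dim output, usable as regression targets
probs = softmax(logits)          # class probabilities for k-way classification
print(logits, probs)
```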

Edge-level prediction

$$\hat{y}_{uv} = \mathrm{Head}_{edge}(h_u^{(L)}, h_v^{(L)})$$
1. Concatenation + Linear
$$\hat{y}_{uv} = \mathrm{Linear}(\mathrm{Concat}(h_u^{(L)}, h_v^{(L)}))$$
Here the Linear layer maps the $2d$-dimensional concatenation to a $k$-dimensional embedding (i.e., a $k$-way prediction).
2. Dot product
$$\hat{y}_{uv} = (h_u^{(L)})^T h_v^{(L)}$$
Since the dot product yields a scalar, this approach only applies to 1-way prediction, typically predicting whether an edge exists.
To apply it to $k$-way prediction, we can, similar to multi-head attention, introduce $k$ trainable matrices $W^{(1)}, W^{(2)}, \cdots, W^{(k)}$:
$$\hat{y}_{uv}^{(1)} = (h_u^{(L)})^T W^{(1)} h_v^{(L)} \\ \cdots \\ \hat{y}_{uv}^{(k)} = (h_u^{(L)})^T W^{(k)} h_v^{(L)}$$
$$\hat{y}_{uv} = \mathrm{Concat}(\hat{y}_{uv}^{(1)}, \cdots, \hat{y}_{uv}^{(k)})$$
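The three edge-level heads described above can be sketched in a few lines. The embeddings and weight matrices below are made-up integers chosen so the results are easy to verify by hand; in a real model the weights would be trained.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concat_linear_head(W, h_u, h_v):
    """Linear(Concat(h_u, h_v)) with a k x 2d weight matrix W."""
    x = h_u + h_v  # list concatenation gives the 2d-dim vector
    return [dot(row, x) for row in W]


def dot_product_head(h_u, h_v):
    """1-way prediction: h_u^T h_v."""
    return dot(h_u, h_v)


def bilinear_head(Ws, h_u, h_v):
    """k-way prediction: one score h_u^T W^(i) h_v per d x d matrix W^(i)."""
    return [dot(h_u, [dot(row, h_v) for row in W]) for W in Ws]


# Hypothetical embeddings with d = 2.
h_u, h_v = [1, 2], [3, 4]

W_linear = [[1, 0, 0, 0], [0, 0, 0, 1]]          # k = 2, shape 2 x 4
W_bilinear = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]  # k = 2 matrices, each 2 x 2

y_concat = concat_linear_head(W_linear, h_u, h_v)
y_dot = dot_product_head(h_u, h_v)
y_bilinear = bilinear_head(W_bilinear, h_u, h_v)
print(y_concat, y_dot, y_bilinear)
```

With the identity matrix as $W^{(1)}$, the first bilinear score reduces to the plain dot product.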

Graph-level prediction

$$\hat{y}_G = \mathrm{Head}_{graph}(\{h_v^{(L)} \in \mathbb{R}^d, \forall v \in G\})$$
Toy example: we use 1-dim node embeddings
§ Node embeddings for $G_1$: $\{-1, -2, 0, 1, 2\}$
§ Node embeddings for $G_2$: $\{-10, -20, 0, 10, 20\}$

Global sum pooling gives 0 for both graphs, so it cannot tell $G_1$ and $G_2$ apart. Aggregating hierarchically with $\mathrm{ReLU}(\mathrm{Sum}(\cdot))$, first over the first two nodes ($a$) and the last three nodes ($b$), then over the two results, does distinguish them:

$$G_1: \hat{y}_a = \mathrm{ReLU}(\mathrm{Sum}(\{-1, -2\})) = 0, \quad \hat{y}_b = \mathrm{ReLU}(\mathrm{Sum}(\{0, 1, 2\})) = 3$$
$$G_2: \hat{y}_a = \mathrm{ReLU}(\mathrm{Sum}(\{-10, -20\})) = 0, \quad \hat{y}_b = \mathrm{ReLU}(\mathrm{Sum}(\{0, 10, 20\})) = 30$$
$$G_1: \hat{y}_{G_1} = \mathrm{ReLU}(\mathrm{Sum}(\{\hat{y}_a, \hat{y}_b\})) = 3$$
$$G_2: \hat{y}_{G_2} = \mathrm{ReLU}(\mathrm{Sum}(\{\hat{y}_a, \hat{y}_b\})) = 30$$
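The toy example can be reproduced numerically. The sketch below compares plain global sum pooling with the two-level ReLU(Sum(·)) aggregation; the grouping into the first two and last three nodes follows the example, and the function names are illustrative.

```python
def relu(x):
    return max(0.0, x)


def global_sum_pool(embeddings):
    """Flat pooling: sum all node embeddings at once."""
    return sum(embeddings)


def hierarchical_pool(embeddings):
    """Two-level pooling: ReLU(Sum) over two groups, then ReLU(Sum) again."""
    y_a = relu(sum(embeddings[:2]))   # first two nodes
    y_b = relu(sum(embeddings[2:]))   # last three nodes
    return relu(y_a + y_b)


g1 = [-1, -2, 0, 1, 2]
g2 = [-10, -20, 0, 10, 20]

flat = (global_sum_pool(g1), global_sum_pool(g2))
hier = (hierarchical_pool(g1), hierarchical_pool(g2))
print(flat, hier)
```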

Hierarchical pooling in practice (the DiffPool idea): hierarchically pool node embeddings.
Leverage 2 independent GNNs at each level
§ GNN A: Compute node embeddings
§ GNN B: Compute the cluster that a node belongs to

For each pooling layer
§ Use clustering assignments from GNN B to aggregate node embeddings generated by GNN A
§ Create a single new node for each cluster, maintaining edges between clusters to generate a new pooled network
Jointly train GNN A and GNN B

Training Graph Neural Networks

This section continues with how to compare predictions against labels (loss function) and how to evaluate them (evaluation metrics).

Supervised vs Unsupervised

Supervised learning on graphs: Labels come from external sources
E.g., predict drug likeness of a molecular graph
Unsupervised learning on graphs: Signals come from graphs themselves
E.g., link prediction: predict if two nodes are connected
Note: sometimes the differences are blurry

| Level | Unsupervised | Supervised |
|---|---|---|
| Node labels $y_v$ | Node statistics, such as clustering coefficient, PageRank, … | In a citation network, which subject area a node belongs to |
| Edge labels $y_{uv}$ | Link prediction: hide the edge between two nodes, predict if there should be a link | In a transaction network, whether an edge is fraudulent |
| Graph labels $y_G$ | Graph statistics: for example, predict if two graphs are isomorphic | Among molecular graphs, the drug likeness of graphs |


We will use prediction $\hat{y}^{(i)}$ and label $y^{(i)}$ to refer to predictions and labels at all levels (node/edge/graph).
Classification and regression differ in their loss functions and evaluation metrics.

Classification loss

Labels $y^{(i)}$ take discrete values
E.g., Node classification: which category does a node belong to
As discussed in lecture 6, cross entropy (CE) is a very common loss function in classification
$k$-way prediction for the $i$-th data point:
$$CE(y^{(i)}, \hat{y}^{(i)}) = -\sum_{j=1}^{k} y_j^{(i)} \log \hat{y}_j^{(i)}$$
where $y^{(i)}$ is the one-hot label and $\hat{y}^{(i)}$ is the predicted distribution after softmax. The total loss over all $N$ training examples is
$$Loss = \sum_{i=1}^{N} CE(y^{(i)}, \hat{y}^{(i)})$$
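A small sketch of the classification loss follows, assuming one-hot labels and predictions that are already probability distributions (for example, softmax outputs). The `eps` term guarding against `log(0)` and the sample values are additions for this example.

```python
import math


def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """CE for one data point: -sum_j y_j * log(y_hat_j)."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_onehot, y_prob))


def total_ce_loss(labels, preds):
    """Sum of per-example cross entropies over all N examples."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, preds))


# Hypothetical 3-way classification with N = 2 data points.
labels = [[0, 1, 0], [1, 0, 0]]
preds = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]

loss = total_ce_loss(labels, preds)
print(loss)
```

Only the predicted probability of the true class contributes to each term, so the loss here is `-log(0.8) - log(0.5)` up to the small `eps` correction.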

Regression loss

Labels $y^{(i)}$ take continuous values
E.g., predict the drug likeness of a molecular graph

For regression tasks we often use Mean Squared Error (MSE) a.k.a. L2 loss.
$k$-way regression for the $i$-th data point:
$$MSE(y^{(i)}, \hat{y}^{(i)}) = \sum_{j=1}^{k} (y_j^{(i)} - \hat{y}_j^{(i)})^2$$
The total loss over all $N$ training examples is
$$Loss = \sum_{i=1}^{N} MSE(y^{(i)}, \hat{y}^{(i)})$$
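The regression loss can be sketched the same way. The per-example term below sums squared errors over the $k$ target dimensions without dividing by $k$; that normalization choice, along with the sample values, is an assumption of this sketch.

```python
def mse(y, y_hat):
    """Squared error for one data point, summed over the k target dimensions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def total_mse_loss(labels, preds):
    """Sum of per-example errors over all N examples."""
    return sum(mse(y, p) for y, p in zip(labels, preds))


# Hypothetical 2-way regression with N = 2 data points.
labels = [[1.0, 2.0], [0.0, -1.0]]
preds = [[1.5, 2.0], [0.0, 1.0]]

loss = total_mse_loss(labels, preds)
print(loss)
```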

Evaluation metrics

Evaluate regression tasks on graphs:
Root mean square error (RMSE)
$$\sqrt{\sum_{i=1}^{N} \frac{(y^{(i)} - \hat{y}^{(i)})^2}{N}}$$
Mean absolute error (MAE)
$$\frac{\sum_{i=1}^{N} |y^{(i)} - \hat{y}^{(i)}|}{N}$$
Evaluate classification tasks on graphs:
§ Accuracy
§ Precision / Recall
§ If the range of prediction is [0,1], we will use 0.5 as threshold
§ Metrics agnostic to the classification threshold, e.g., ROC AUC
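The evaluation metrics above are simple enough to sketch directly. The snippet assumes scalar regression targets and binary classification scores in [0, 1]; treating a score strictly greater than 0.5 as positive is a convention chosen here, and the sample data is made up.

```python
import math


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def binarize(scores, threshold=0.5):
    """Turn scores in [0, 1] into hard 0/1 predictions."""
    return [1 if s > threshold else 0 for s in scores]


def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical regression and binary classification results.
reg_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
reg_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

y_true = [1, 0, 1, 1, 0]
y_pred = binarize([0.9, 0.6, 0.4, 0.8, 0.1])
metrics = classification_metrics(y_true, y_pred)
print(reg_rmse, reg_mae, metrics)
```

Accuracy, precision and recall all change when the threshold moves, which is why a threshold-agnostic metric is sometimes reported alongside them.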

Setting-up GNN Prediction Tasks



  1. Fixed split: We will split our dataset once
    § Training set: used for optimizing GNN parameters
    § Validation set: develop model/hyperparameters
    § Test set: held out until we report final performance
    A concern: sometimes we cannot guarantee that the test set will really be held out

  2. Random split: we will randomly split our dataset into training / validation / test
    § We report average performance over different random seeds
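A random split can be sketched as below. The 60/20/20 fractions, the seeds, and the use of node IDs as the items being split are illustrative assumptions; in practice one would train and evaluate once per seed and average the resulting scores.

```python
import random


def random_split(items, seed, train_frac=0.6, val_frac=0.2):
    """Shuffle with a fixed seed, then cut into train / validation / test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical dataset of 10 labeled nodes, split once per random seed.
node_ids = list(range(10))
splits = {seed: random_split(node_ids, seed) for seed in (0, 1, 2)}

for seed, (train, val, test) in splits.items():
    print(seed, train, val, test)
```

Fixing the seed makes each split reproducible, so reported averages can be recomputed later.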


Transductive/Inductive setting

| | Transductive | Inductive |
|---|---|---|
| Training | We compute embeddings using the entire graph, and train using node 1&2’s labels | We compute embeddings using the graph over node 1&2, and train using node 1&2’s labels |
| Validation | We compute embeddings using the entire graph, and evaluate on node 3&4’s labels | We compute embeddings using the graph over node 3&4, and evaluate on node 3&4’s labels |
| Principle | The input graph can be observed in all the dataset splits (training, validation and test set) | We break the edges between splits to get multiple graphs |
| Applicable tasks | Node / edge prediction tasks | Node / edge / graph tasks |
| Summary | Training / validation / test sets are on the same graph. The dataset consists of one graph. The entire graph can be observed in all dataset splits; we only split the labels. | Training / validation / test sets are on different graphs. The dataset consists of multiple graphs. Each split can only observe the graph(s) within the split. A successful model should generalize to unseen graphs. |

Transductive/Inductive split example: node classification

Transductive node classification
§ All the splits can observe the entire graph structure, but can only observe the labels of their respective nodes
Inductive node classification
§ Suppose we have a dataset of 3 graphs
§ Each split contains an independent graph


Note: graph classification is only possible in the inductive setting, because we have to test on unseen graphs.
Suppose we have a dataset of 5 graphs. Each split will contain independent graph(s).

Transductive/Inductive split example: link prediction

Link prediction is an unsupervised / self-supervised task. We need to create the labels and dataset splits on our own. That is, we hide some existing edges from the GNN and let the model predict whether they exist.
The edges that are kept are called message edges; the edges that are hidden are called supervision edges.

The above can be seen as step 1. Step 2 is to split the dataset, which can be done in two ways: a transductive split or an inductive split.

Inductive link prediction split

Suppose we have a dataset of 3 graphs. Each inductive split will contain an independent graph.
In train or val or test set, each graph will have 2 types of edges: message edges + supervision edges

Transductive link prediction split

This is the default setting when people talk about link prediction: the entire graph can be observed in all dataset splits.
Suppose we have a dataset of 1 graph

But since edges are both part of graph structure and the supervision, we need to hold out validation / test edges.
To train on the training set, we further need to hold out supervision edges for the training set.

(1) At training time: Use training message edges to predict training supervision edges

(2) At validation time: Use training message edges & training supervision edges to predict validation edges

(3) At test time: Use training message edges & training supervision edges & validation edges to predict test edges
After training, supervision edges are known to GNN. Therefore, an ideal model should use supervision edges in message passing at validation time. The same applies to the test time.

| Stage | Edges used for message passing | Edges to predict |
|---|---|---|
| Training | Training message edges | Training supervision edges |
| Validation | Training message edges + training supervision edges | Validation edges |
| Test | Training message edges + training supervision edges + validation edges | Test edges |
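The transductive link prediction split summarized above can be sketched as follows. The function name, the explicit edge counts per group, and the 10-edge toy graph are assumptions for illustration; sampling negative (non-existent) edges for training is left out.

```python
import random


def transductive_link_split(edges, n_train_sup, n_val, n_test, seed=0):
    """Split one graph's edges into 4 groups and list, per stage, which edges
    are used for message passing and which edges are predicted."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train_sup = shuffled[n_test + n_val:n_test + n_val + n_train_sup]
    train_msg = shuffled[n_test + n_val + n_train_sup:]
    return {
        "train": {"message": train_msg, "predict": train_sup},
        "val": {"message": train_msg + train_sup, "predict": val},
        "test": {"message": train_msg + train_sup + val, "predict": test},
    }


# Hypothetical graph with 10 edges (a path over 11 nodes).
edges = [(i, i + 1) for i in range(10)]
stages = transductive_link_split(edges, n_train_sup=2, n_val=2, n_test=2)

for name, stage in stages.items():
    print(name, len(stage["message"]), len(stage["predict"]))
```

Each later stage sees strictly more message edges than the previous one, and the edges being predicted never appear among the message edges of the same stage.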
