How powerful are GNNs?
We specifically consider the local neighborhood structure around each node in a graph
Key question: can GNN node embeddings distinguish different nodes’ local neighborhood structures?
Next: we need to understand how a GNN captures local neighborhood structures
Key observation: subtrees of the same depth can be recursively characterized from the leaf nodes to the root nodes
Observation: neighbor aggregation can be abstracted as a function over a multi-set (a set with repeating elements)
Theorem: any injective multi-set function can be expressed as
$$\Phi\left(\sum_{x\in S}f(x)\right)$$
where $\Phi$ and $f$ are some non-linear functions
Proof Intuition: $f$ produces one-hot encodings of colors. Summation of the one-hot encodings retains all the information about the input multi-set
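A minimal Python sketch of this intuition (the color vocabulary and function names are illustrative assumptions, not from the source): mapping each element to a one-hot vector and summing yields a per-color count vector, which uniquely identifies the multi-set.

```python
import numpy as np

# Hypothetical finite "color" vocabulary for the elements of the multi-set.
colors = ["red", "green", "blue"]

def f(x):
    """One-hot encode a single element (the role of f in the theorem)."""
    onehot = np.zeros(len(colors))
    onehot[colors.index(x)] = 1.0
    return onehot

def multiset_summary(S):
    """Sum of one-hot encodings = per-color counts, an injective multi-set function."""
    return sum(f(x) for x in S)

# Two different multi-sets yield different summaries ...
print(multiset_summary(["red", "red", "blue"]))   # [2. 0. 1.]
print(multiset_summary(["red", "blue", "blue"]))  # [1. 0. 2.]
# ... while reordering the same multi-set leaves the summary unchanged.
print(multiset_summary(["blue", "red", "red"]))   # [2. 0. 1.]
```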
A 1-hidden-layer MLP with sufficiently large hidden dimensionality and an appropriate non-linearity $\sigma(\cdot)$ can approximate any continuous function to arbitrary accuracy (universal approximation theorem).
We have arrived at an NN that can model any injective multi-set function
$$\text{MLP}_{\Phi}\left(\sum_{x\in S}\text{MLP}_f(x)\right)$$
In practice, MLP hidden dimensionality of 100 to 500 is sufficient.
Graph Isomorphism Network (GIN): apply an MLP, take the element-wise sum, and apply another MLP
$$\text{MLP}_{\Phi}\left(\sum_{x\in S}\text{MLP}_f(x)\right)$$
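A minimal PyTorch sketch of this aggregator, assuming illustrative module names and dimensions: $\text{MLP}_f$ is applied to every element of the multi-set, the results are summed element-wise, and $\text{MLP}_\Phi$ is applied to the sum.

```python
import torch
import torch.nn as nn

class SumMultisetAggregator(nn.Module):
    """MLP_Phi( sum_{x in S} MLP_f(x) ): an injective multi-set aggregator (sketch)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        # 1-hidden-layer MLPs; a hidden dimensionality of 100-500 is typically sufficient.
        self.mlp_f = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                   nn.Linear(hidden_dim, hidden_dim))
        self.mlp_phi = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, out_dim))

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_neighbors, in_dim), i.e. the multi-set S of neighbor embeddings
        summed = self.mlp_f(neighbor_feats).sum(dim=0)  # element-wise sum over the multi-set
        return self.mlp_phi(summed)

agg = SumMultisetAggregator(in_dim=16, hidden_dim=128, out_dim=16)
out = agg(torch.randn(5, 16))  # aggregate a multi-set of 5 neighbor embeddings
```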
Theorem: GIN’s neighbor aggregation function is injective
GIN is the most expressive GNN in the class of message-passing GNNs
Recall: color refinement algorithm in WL kernel
Given a graph $G$ with a set of nodes $V$, assign an initial color $c^0(v)$ to each node $v$. Then iteratively refine node colors by
$$c^{k+1}(v)=\text{HASH}\left(c^k(v),\ \{c^k(u)\}_{u\in N(v)}\right)$$
where HASH maps different inputs to different colors. After $K$ steps of color refinement, $c^K(v)$ summarizes the structure of the $K$-hop neighborhood
Process continues until a stable coloring is reached
Two graphs are considered isomorphic by the WL test if they end up with the same multiset of colors; if the color multisets differ, the graphs are not isomorphic
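A minimal Python sketch of color refinement, assuming a toy adjacency-list representation (function names and example graphs are illustrative, not from the source); a per-graph dictionary plays the role of the injective HASH.

```python
def wl_color_refinement(adj, num_steps):
    """adj: dict mapping each node to a list of its neighbors."""
    colors = {v: 0 for v in adj}  # uniform initial color c^0(v)
    for _ in range(num_steps):
        # HASH: give a fresh integer color to each distinct (own color, neighbor-color multiset)
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path     = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(wl_color_refinement(triangle, 3).values()))  # [0, 0, 0]
print(sorted(wl_color_refinement(path, 3).values()))      # [0, 0, 1] -> colorings differ
```

Here the two graphs are separated because they end up with different numbers of distinct colors; to compare colorings across graphs in general, the HASH table (the `palette` above) would be shared between the graphs.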
GIN uses an NN to model the injective HASH function
$$c^{k+1}(v)=\text{HASH}\left(c^k(v),\ \{c^k(u)\}_{u\in N(v)}\right)$$
Specifically, we will model the injective function over the tuple $\left(c^k(v),\ \{c^k(u)\}_{u\in N(v)}\right)$, where $c^k(v)$ is the root node's color and $\{c^k(u)\}_{u\in N(v)}$ are the neighboring node colors
Theorem: any injective function over the tuple $\left(c^k(v),\ \{c^k(u)\}_{u\in N(v)}\right)$ can be modeled as
$$\text{MLP}_{\Phi}\left((1+\epsilon)\cdot\text{MLP}_f\left(c^k(v)\right)+\sum_{u\in N(v)}\text{MLP}_f\left(c^k(u)\right)\right)$$
where $\epsilon$ is a learnable scalar.
If the input feature $c^0(v)$ is represented as a one-hot vector, direct summation is already injective, so we only need $\Phi$ to ensure injectivity. This gives the GIN update, GINConv:
$$\text{GINConv}\left(c^k(v),\ \{c^k(u)\}_{u\in N(v)}\right)=\text{MLP}_{\Phi}\left((1+\epsilon)\cdot c^k(v)+\sum_{u\in N(v)}c^k(u)\right)$$
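A minimal PyTorch sketch of this update using a dense adjacency matrix (the module structure, 2-layer MLP, and dimensions are illustrative assumptions, not a reference implementation):

```python
import torch
import torch.nn as nn

class GINConv(nn.Module):
    """One GIN layer: c^{k+1}(v) = MLP_Phi((1 + eps) * c^k(v) + sum_{u in N(v)} c^k(u))."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable scalar epsilon
        self.mlp_phi = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, adj):
        # x: (num_nodes, dim) node embeddings; adj: (num_nodes, num_nodes) adjacency matrix
        neighbor_sum = adj @ x  # sum of neighbor embeddings for every node at once
        return self.mlp_phi((1 + self.eps) * x + neighbor_sum)

# Toy usage on a 3-node path graph
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
layer = GINConv(dim=8)
h = layer(torch.randn(3, 8), adj)  # one round of neighbor aggregation
```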
GIN can be understood as a differentiable neural version of the WL graph kernel. The two have exactly the same expressive power, and both are powerful enough to distinguish most real-world graphs.
| | Update target | Update function |
|---|---|---|
| WL graph kernel | Node colors (one-hot) | HASH |
| GIN | Node embeddings (low-dim vectors) | GINConv |
Advantages of GIN over the WL graph kernel are: