A heterogeneous graph is defined as
$G=(V,E,R,T)$, where $V$ is the set of nodes, $E$ the set of edges, $R$ the set of relation types, and $T$ the set of node types
Example
For directed graphs with a single relation type, we only pass messages along the direction of the edges.
For directed graphs with multiple relation types, we use different neural network weights for different relation types.
$h^{l+1}_v=\sigma\left(\sum_{r\in R}\sum_{u\in N_v^r}\dfrac{1}{c_{v,r}}W^l_r h^l_u+W^l_0 h^l_v\right)$
Normalized by the node degree of the relation: $c_{v,r}=|N_v^r|$
Each relation has $L$ matrices: $W^1_r, W^2_r, W^3_r,\cdots, W^L_r$, so the size of each $W^l_r$ is $d^{l+1}\times d^l$
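A minimal NumPy sketch of this layer update (the per-relation neighbor-list encoding, names, and dense loops are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def rgcn_layer(H, neighbors, W_rel, W_self, sigma=np.tanh):
    """One layer of the update above.

    H: (num_nodes, d_l) node embeddings h^l.
    neighbors: dict mapping relation r -> list (per node v) of neighbors N_v^r.
    W_rel: dict mapping relation r -> (d_l1, d_l) matrix W_r^l.
    W_self: (d_l1, d_l) self-loop matrix W_0^l.
    """
    num_nodes, d_out = H.shape[0], W_self.shape[0]
    H_next = np.zeros((num_nodes, d_out))
    for v in range(num_nodes):
        msg = W_self @ H[v]                        # self-transform W_0^l h_v^l
        for r, nbr_lists in neighbors.items():
            N_vr = nbr_lists[v]
            c_vr = len(N_vr)                       # normalizer c_{v,r} = |N_v^r|
            for u in N_vr:
                msg += (W_rel[r] @ H[u]) / c_vr    # (1/c_{v,r}) W_r^l h_u^l
        H_next[v] = sigma(msg)                     # h_v^{l+1}
    return H_next
```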
Problems: (1) the number of parameters grows rapidly with the number of relations, and (2) the model is prone to overfitting.
Two methods to regularize the weights $W^l_r$:
Use block diagonal matrices
Key insight: make the weights sparse
If we use $B$ low-dimensional block matrices, then the number of parameters reduces from $d^{l+1}\times d^l$ to $B\times \dfrac{d^{l+1}}{B}\times\dfrac{d^{l}}{B}$
Limitation: only nearby neurons/dimensions can interact through $W$
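A short sketch of how such a block-diagonal $W$ could be assembled (shapes, names, and the random initialization are illustrative assumptions):

```python
import numpy as np

def block_diagonal_weight(B, d_out, d_in, rng=np.random.default_rng(0)):
    """Assemble W from B dense blocks of shape (d_out/B, d_in/B).

    Parameter count: B * (d_out/B) * (d_in/B) instead of d_out * d_in.
    Entries outside the blocks stay zero, so only dimensions within the
    same block can interact -- the limitation noted above.
    """
    assert d_out % B == 0 and d_in % B == 0
    bo, bi = d_out // B, d_in // B
    W = np.zeros((d_out, d_in))
    for b in range(B):
        W[b * bo:(b + 1) * bo, b * bi:(b + 1) * bi] = rng.standard_normal((bo, bi))
    return W
```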
Basis/Dictionary learning
Key insight: share weights across different relations
Represent the matrix of each relation as a linear combination of basis transformations: $W_r=\sum_{b=1}^B a_{rb}\cdot V_b$, where the $V_b$ are basis matrices shared across all relations and $a_{rb}$ is the learnable importance weight of matrix $V_b$
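A one-line NumPy sketch of this decomposition (the array shapes are illustrative assumptions):

```python
import numpy as np

def basis_weights(a, V):
    """W_r = sum_b a[r, b] * V[b] for every relation r.

    a: (num_relations, B) learnable importance weights a_{rb}.
    V: (B, d_out, d_in) basis matrices shared across all relations.
    Returns: (num_relations, d_out, d_in) stack of relation matrices W_r.
    """
    return np.einsum('rb,bij->rij', a, V)
```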
Knowledge in graph form:
FreeBase, Wikidata, DBpedia, YAGO, NELL, etc.
Common characteristics:
Relations in a heterogeneous KG have different properties (e.g., symmetric, inverse, composition, 1-to-N)
Edges in a KG are represented as triples $(h,r,t)$: head $h$ has relation $r$ with tail $t$
KG completion task: for a given (head, relation), we predict missing tails.
TransE
Key idea
For a triple $(h,r,t)$, $h,r,t\in R^d$: $h+r \approx t$ if the given fact is true, else $h+r \neq t$
Scoring function: $f_r(h, t) = -||h+r-t||$
Limitation: cannot model symmetric relations and 1-to-N relations
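A minimal sketch of the TransE score, assuming $h$, $r$, $t$ are already-learned NumPy vectors:

```python
import numpy as np

def transe_score(h, r, t):
    """f_r(h, t) = -||h + r - t||: scores near 0 indicate plausible facts."""
    return -np.linalg.norm(h + r - t)
```

For a symmetric relation we would need both $h+r\approx t$ and $t+r\approx h$, which forces $r\approx 0$ and hence $h\approx t$; this is why TransE cannot model symmetric relations.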
TransR
Model entities as vectors in the entity space $R^d$ and model each relation as a vector $r\in R^k$ in relation space, with $M_r\in R^{k\times d}$ as the projection matrix
$h_\bot=M_r h,\ t_\bot=M_r t$
Use $M_r$ to project from the entity space $R^d$ to the relation space $R^k$
Scoring function: $f_r(h, t) = -||h_\bot+r-t_\bot||$
Limitation: cannot model composition relations (each relation has a different space)
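A minimal sketch of the TransR score under the same assumptions, with the projection matrix $M_r$ passed in explicitly:

```python
import numpy as np

def transr_score(h, t, r, M_r):
    """Project h, t from entity space R^d into relation space R^k, then
    score as in TransE: f_r(h, t) = -||M_r h + r - M_r t||."""
    h_proj = M_r @ h    # h_perp = M_r h
    t_proj = M_r @ t    # t_perp = M_r t
    return -np.linalg.norm(h_proj + r - t_proj)
```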
DistMult
Model entities and relations as vectors in $R^k$
Scoring function: $f_r(h, t) = \langle h, r, t\rangle = \sum_i h_i \cdot r_i \cdot t_i$, where $h,r,t \in R^k$
It can be viewed as a cosine similarity between the element-wise product $h\cdot r$ and $t$
Limitation: cannot model antisymmetric relations, composition relations and inverse relations
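A minimal sketch of the DistMult score; note the expression is symmetric in $h$ and $t$, which is exactly why antisymmetric relations cannot be modeled:

```python
import numpy as np

def distmult_score(h, r, t):
    """f_r(h, t) = <h, r, t> = sum_i h_i * r_i * t_i.
    Swapping h and t leaves the score unchanged (symmetric in h and t)."""
    return np.sum(h * r * t)
```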
Based on DistMult, ComplEx embeds entities and relations in the complex vector space, using vectors in $C^k$
Scoring function: $f_r(h, t) = \text{Re}\left(\sum_i h_i \cdot r_i \cdot \bar{t}_i\right)$
Limitation: cannot model composition relations
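A minimal sketch of the ComplEx score, assuming complex-valued NumPy vectors; the conjugate on $t$ breaks the $h$/$t$ symmetry, so antisymmetric relations become modelable:

```python
import numpy as np

def complex_score(h, r, t):
    """f_r(h, t) = Re(sum_i h_i * r_i * conj(t_i)) for h, r, t in C^k."""
    return np.real(np.sum(h * r * np.conj(t)))
```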