CS224W: Machine Learning with Graphs - 10 Heterogeneous Graphs and Knowledge Graph Embeddings

Heterogeneous Graphs and Knowledge Graph Embeddings

1. Heterogeneous Graphs

A heterogeneous graph is defined as
$G = (V, E, R, T)$

  • Nodes with node types $v_i \in V$
  • Edges with relation types $(v_i, r, v_j) \in E$
  • Node type $T(v_i)$
  • Relation type $r \in R$

Example

  • Example nodes: SFO, EWR, UA689
  • Example edges: (UA689, origin, LAX)
  • Example node types: flight, airport, cause
  • Example edge types (relation): destination, origin, cancelled by, delayed by

1). Relational GCN

a). Definition

For directed graphs with a single relation, we only pass messages along the direction of edges.
For graphs with multiple relation types, we use different neural network weights for different relation types:
$h^{(l+1)}_v = \sigma\left(\sum_{r\in R}\sum_{u\in N_v^r}\dfrac{1}{c_{v,r}}W^{(l)}_r h^{(l)}_u + W^{(l)}_0 h^{(l)}_v\right)$
normalized by the node degree of the relation, $c_{v,r} = |N_v^r|$.
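The layer above can be sketched directly in NumPy. This is a minimal toy implementation, assuming a hand-made neighbor structure and ReLU as the nonlinearity $\sigma$; all sizes, seeds, and edges are made up for illustration.

```python
import numpy as np

# Toy setup: 4 nodes, hidden size 3, two relation types (all values illustrative).
rng = np.random.default_rng(0)
num_nodes, d = 4, 3
h = rng.standard_normal((num_nodes, d))               # h^(l): current node embeddings
W = {r: rng.standard_normal((d, d)) for r in (0, 1)}  # W_r^(l): one matrix per relation
W0 = rng.standard_normal((d, d))                      # W_0^(l): self-loop weight

# neighbors[r][v] = incoming neighbors of v under relation r (assumed toy edges)
neighbors = {0: {0: [1, 2], 1: [0]}, 1: {0: [3], 2: [1, 3]}}

def rgcn_layer(h, W, W0, neighbors):
    """One R-GCN layer: h_v^(l+1) = sigma(sum_r sum_{u in N_v^r} W_r h_u / c_{v,r} + W_0 h_v)."""
    out = h @ W0.T                                    # self-loop term W_0 h_v for every v
    for r, adj in neighbors.items():
        for v, nbrs in adj.items():
            c = len(nbrs)                             # c_{v,r} = |N_v^r|
            out[v] += sum(W[r] @ h[u] for u in nbrs) / c
    return np.maximum(out, 0)                         # sigma chosen as ReLU here

h_next = rgcn_layer(h, W, W0, neighbors)
print(h_next.shape)  # (4, 3)
```

Note that each relation contributes its own transformed, degree-normalized neighbor sum; nodes with no incoming edges under a relation simply receive only the self-loop term.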

b). Scalability

Each relation has $L$ matrices $W^{(1)}_r, W^{(2)}_r, \dots, W^{(L)}_r$, where each $W^{(l)}_r$ has size $d^{(l+1)} \times d^{(l)}$.
Problems: 1) the number of parameters grows rapidly with the number of relations, and 2) overfitting.
Two methods to regularize the weights $W^{(l)}_r$:

  • Use block diagonal matrices
    Key insight: make the weights sparse
    With $B$ low-dimensional blocks, the number of parameters reduces from $d^{(l+1)} \times d^{(l)}$ to $B \times \dfrac{d^{(l+1)}}{B} \times \dfrac{d^{(l)}}{B}$
    Limitation: only nearby neurons/dimensions can interact through $W$

  • Basis/dictionary learning
    Key insight: share weights across different relations
    Represent the matrix of each relation as a linear combination of basis transformations: $W_r = \sum_{b=1}^B a_{rb} \cdot V_b$, where the $V_b$ are basis matrices shared across all relations and $a_{rb}$ is the learnable importance weight of basis $V_b$
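To make the savings concrete, here is a back-of-the-envelope parameter count for one layer under the two schemes, using made-up illustrative sizes ($d = 64$ hidden dimensions, $|R| = 100$ relations, $B = 8$ blocks/bases):

```python
# Parameter counts per layer for R-GCN weight regularization (illustrative sizes).
d, num_rel, B = 64, 100, 8

full = num_rel * d * d                     # one dense d x d matrix per relation
block_diag = num_rel * B * (d // B) ** 2   # B diagonal blocks of size (d/B) x (d/B) per relation
basis = B * d * d + num_rel * B            # B shared basis matrices + coefficients a_rb

print(full, block_diag, basis)  # 409600 51200 33568
```

Both schemes cut parameters by roughly an order of magnitude here, and basis learning additionally decouples its cost from the number of relations (only the $a_{rb}$ coefficients grow with $|R|$).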

c). Example: Entity/node classification and link prediction

To be updated.

2). Knowledge Graphs: KG Completion with Embeddings

Knowledge in graph form:

  • Nodes are entities labeled with their types
  • Edges between two nodes capture relationships between entities
  • KG is an example of a heterogeneous graph
a). Example of KGs
  • Bibliographic networks
  • Bio KGs
  • Google KG
  • Amazon Product Graph
  • Facebook Graph API
  • IBM Watson
  • Microsoft Satori
b). Application of KGs
  • Serving information
  • Question answering and conversation agents
c). KG Datasets

FreeBase, Wikidata, DBpedia, YAGO, NELL, etc.
Common characteristics:

  • Massive: millions of nodes and edges
  • Incomplete: many true edges are missing
d). Connectivity patterns in KG

Relations in a heterogeneous KG have different properties

  1. Symmetric relations: $r(h,t) \Rightarrow r(t,h) \quad \forall h,t$
    Example: family, roommate
  2. Antisymmetric relations: $r(h,t) \Rightarrow \lnot r(t,h) \quad \forall h,t$
    Example: hypernym
  3. Inverse relations: $r_2(h,t) \Rightarrow r_1(t,h)$
    Example: (advisor, advisee)
  4. Composition (transitive) relations: $r_1(x,y) \wedge r_2(y,z) \Rightarrow r_3(x,z) \quad \forall x,y,z$
    Example: my mother’s husband is my father
  5. 1-to-N relations: $r(h,t_1), r(h,t_2), r(h,t_3), \dots, r(h,t_n)$ are all true
    Example: $r$ is “StudentOf”

3). KG Completion

KG completion task: for a given (head, relation), we predict missing tails.

a). KG representation

Edges in a KG are represented as triples $(h, r, t)$: head $h$ has relation $r$ with tail $t$.
Key idea

  • Model entities and relations in the embedding/vector space $\mathbb{R}^d$ and associate entities and relations with shallow embeddings
  • Given a true triple $(h, r, t)$, the goal is that the embedding of $(h, r)$ should be close to the embedding of $t$
b). TransE

For a triple $(h, r, t)$ with $h, r, t \in \mathbb{R}^d$: $h + r \approx t$ if the given fact is true, else $h + r \neq t$.
Scoring function: $f_r(h, t) = -\lVert h + r - t \rVert$
Limitation: cannot model symmetric relations or 1-to-N relations
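A small NumPy sketch illustrates both the scoring function and why composition works while symmetry fails; the embeddings here are random toy vectors, not trained ones.

```python
import numpy as np

# TransE sketch: a fact (h, r, t) is plausible when h + r ≈ t.
def score(h, r, t):
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
x, r1, r2 = rng.standard_normal((3, 4))   # toy 4-dim embeddings

# Composition is modeled exactly: if y = x + r1 and z = y + r2,
# then r3 = r1 + r2 satisfies x + r3 = z, giving the maximal score.
z = x + r1 + r2
r3 = r1 + r2
print(abs(score(x, r3, z)))  # 0.0

# A symmetric relation would need x + r ≈ y and y + r ≈ x at once;
# summing the two constraints forces r ≈ 0 and hence x ≈ y, so TransE
# cannot embed a symmetric relation between distinct entities.
```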

c). TransR

Model entities as vectors in the entity space $\mathbb{R}^d$ and model each relation as a vector in the relation space, $r \in \mathbb{R}^k$, with $M_r \in \mathbb{R}^{k \times d}$ as the projection matrix:
$h_\bot = M_r h, \quad t_\bot = M_r t$
Use $M_r$ to project from the entity space $\mathbb{R}^d$ to the relation space $\mathbb{R}^k$.
Scoring function: $f_r(h, t) = -\lVert h_\bot + r - t_\bot \rVert$
Limitation: cannot model composition relations (each relation has a different space)
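The projection step can be sketched as follows, again with random toy embeddings and arbitrary dimensions $d = 5$, $k = 3$:

```python
import numpy as np

# TransR sketch: entities live in R^d; each relation has its own vector r in R^k
# and projection matrix M_r, and scoring happens in the relation space.
def transr_score(h, r, M_r, t):
    h_perp, t_perp = M_r @ h, M_r @ t     # project entities into relation space
    return -np.linalg.norm(h_perp + r - t_perp)

rng = np.random.default_rng(0)
d, k = 5, 3
h, t = rng.standard_normal((2, d))        # entity embeddings in R^d
r = rng.standard_normal(k)                # relation embedding in R^k
M_r = rng.standard_normal((k, d))         # projection matrix for this relation

print(transr_score(h, r, M_r, t) <= 0)    # True: scores are always non-positive
```

Because each relation projects entities into its own space, there is no shared space in which two relations can be chained, which is exactly the composition limitation noted above.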

d). DistMult

Embed entities and relations using vectors in $\mathbb{R}^k$.
Scoring function: $f_r(h, t) = \langle h, r, t \rangle = \sum_i h_i \cdot r_i \cdot t_i$, with $h, r, t \in \mathbb{R}^k$
It can be viewed as a cosine similarity between $h \cdot r$ and $t$.
Limitation: cannot model antisymmetric relations, composition relations, and inverse relations
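The failure on antisymmetric relations follows directly from the score being invariant to swapping $h$ and $t$, which a one-line check confirms (toy random embeddings again):

```python
import numpy as np

# DistMult sketch: f_r(h,t) = sum_i h_i * r_i * t_i. The elementwise product
# is unchanged when h and t are swapped, so every relation is forced symmetric.
def distmult_score(h, r, t):
    return np.sum(h * r * t)

rng = np.random.default_rng(0)
h, r, t = rng.standard_normal((3, 4))     # toy 4-dim embeddings
print(np.isclose(distmult_score(h, r, t), distmult_score(t, r, h)))  # True
```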

e). ComplEx

Building on DistMult, ComplEx embeds entities and relations in the complex vector space, using vectors in $\mathbb{C}^k$.
Scoring function: $f_r(h, t) = \operatorname{Re}\left(\sum_i h_i \cdot r_i \cdot \bar{t}_i\right)$
Limitation: cannot model composition relations
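The conjugate on $t$ is what restores expressiveness: a purely imaginary $r$ yields an exactly antisymmetric score, while a purely real $r$ recovers DistMult's symmetric behavior. A sketch with random toy embeddings:

```python
import numpy as np

# ComplEx sketch: f_r(h,t) = Re(sum_i h_i * r_i * conj(t_i)), h, r, t in C^k.
def complex_score(h, r, t):
    return np.real(np.sum(h * r * np.conj(t)))

rng = np.random.default_rng(0)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)
t = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Purely imaginary r: swapping h and t flips the sign -> antisymmetric relation.
r_imag = 1j * rng.standard_normal(4)
s1, s2 = complex_score(h, r_imag, t), complex_score(t, r_imag, h)
print(np.isclose(s1, -s2))  # True

# Purely real r: the score is swap-invariant -> symmetric relation (as in DistMult).
r_real = rng.standard_normal(4).astype(complex)
print(np.isclose(complex_score(h, r_real, t), complex_score(t, r_real, h)))  # True
```

So one model covers both symmetric and antisymmetric patterns, depending on where $r$ lies in the complex plane.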

f). KG embeddings in practice
  • Different KGs may have drastically different relation patterns
  • There is no single embedding method that works for all KGs
  • Try TransE for a quick run if the target KG does not have many symmetric relations
  • Then use more expressive models, e.g., ComplEx, RotatE
