Few-Shot Learning with Graph Neural Networks

2 RELATED WORK

Meta-learner approaches: a representative example is Mishra et al. (2017), who used Temporal Convolutions, deep recurrent networks based on dilated convolutions; this method also exploits contextual information from the subset T and achieves very good results.

Deep learning architectures on graph-structured data: graph neural networks (GNNs) are in fact natural generalizations of convolutional networks to non-Euclidean graphs. Recommended reading: we refer the reader to Bronstein et al. (2017) for an exhaustive literature review on the topic.

3 PROBLEM SET-UP

We consider input-output pairs (T_i, Y_i)_i drawn i.i.d. from a distribution P of partially-labeled image collections:

T = { {(x_1, l_1), …, (x_s, l_s)}, {x̃_1, …, x̃_r}, {x̄_1, …, x̄_t} },  l_i ∈ {1, …, K},
Y = (y_1, …, y_t) ∈ {1, …, K}^t ,   (Equation 1)

where s is the number of labeled samples, r the number of unlabeled samples, t the number of samples to classify, and K the number of classes.

We will focus on the case t = 1, where we classify just one sample per task T.

Few-Shot Learning: when r = 0, t = 1 and s = qK, there is a single image in the collection with unknown label. If, moreover, each label appears exactly q times, this setting is referred to as q-shot, K-way learning.
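As a concrete illustration, here is a minimal sketch of sampling such an episode; the helper names are hypothetical, and `dataset` is assumed to map each class id to a list of images:

```python
import random

def sample_episode(dataset, K=5, q=1):
    """Sample a q-shot, K-way task T: s = q*K labeled images and
    t = 1 query image (r = 0) whose label must be predicted."""
    classes = random.sample(list(dataset.keys()), K)   # K episode classes
    query_class = random.choice(range(K))              # class of the query image
    labeled, query = [], None
    for k, c in enumerate(classes):
        n = q + 1 if k == query_class else q
        imgs = random.sample(dataset[c], n)
        labeled += [(img, k) for img in imgs[:q]]      # q labeled shots per class
        if k == query_class:
            query = (imgs[q], k)                       # held-out pair (x̄, y)
    return labeled, query
```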

4 MODEL

We associate T with a fully-connected graph G_T = (V, E), where nodes v_a ∈ V correspond to the images present in T (both labeled and unlabeled). In this context, the setup does not specify a fixed similarity e_{a,a′} between images x_a and x_{a′}, which suggests learning this similarity measure in a discriminative fashion with a parametric model, similarly to Gilmer et al. (2017), such as a siamese neural architecture. This framework is closely related to the set representation from Vinyals et al. (2016), but extends the inference mechanism using the graph neural network formalism detailed next.

4.2 GRAPH NEURAL NETWORKS

Recommended reading: see Bronstein et al. (2017) for a recent survey on models and applications of deep learning on graphs.

Given an input signal F ∈ ℝ^{V×d} on the vertices, the simplest neighborhood operation is the adjacency operator A : F ↦ A(F), where (AF)_i := Σ_{j∼i} w_{i,j} F_j and w_{i,j} is the weight of edge (i, j).
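In matrix form this is just one weighted aggregation over the neighbors. A minimal PyTorch sketch, assuming a dense weight matrix W (the values are illustrative, not from the paper):

```python
import torch

V, d = 6, 16             # number of vertices, feature dimension
F = torch.randn(V, d)    # input signal on the vertices
W = torch.rand(V, V)     # edge weights w_{i,j} of a dense graph

AF = W @ F               # (A F)_i = sum_j w_{i,j} F_j, shape (V, d)
```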

A GNN layer Gc(·) computes the following update:

x_l^(k+1) = Gc(x^(k))_l = ρ( Σ_{B∈𝒜} B x^(k) θ_{B,l}^(k) ),  l = 1, …, d_{k+1},   (Equation 2)

where Θ = {θ_B^(k)} are trainable parameters, ρ(·) is the activation function (leaky ReLU), and B ranges over the family of neighborhood operators 𝒜 introduced above.
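To make Equation 2 concrete, here is a minimal PyTorch sketch of one such layer; the names are illustrative, not the authors' code, and the operator family 𝒜 is passed in as a list of V×V matrices:

```python
import torch
import torch.nn as nn

class GcLayer(nn.Module):
    """One GNN layer: x^(k+1) = rho( sum_{B in A} B x^(k) theta_B )."""
    def __init__(self, d_in, d_out, num_operators=2):
        super().__init__()
        # one trainable theta_B per operator B in the family A
        self.theta = nn.ModuleList(
            [nn.Linear(d_in, d_out, bias=False) for _ in range(num_operators)]
        )
        self.rho = nn.LeakyReLU(0.2)

    def forward(self, x, operators):
        # x: (V, d_in); operators: list of (V, V) matrices, e.g. [A_tilde, I]
        out = sum(lin(B @ x) for B, lin in zip(operators, self.theta))
        return self.rho(out)
```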

GNN network diagram:

[Figure: the GNN architecture from the paper]

Explanation: in the first layer, the five samples are connected into a graph by the edge model Ã; a graph convolution (graph conv) then produces the node embeddings. The following layers keep alternating between updating the graph with à and updating the node embeddings with graph conv. This yields a deep GNN whose final layer outputs the predicted label of the query sample.

To build the edge model, a 4-layer CNN first extracts a feature vector for each node. The absolute difference between a node pair x_i, x_j is then passed through 4 fully-connected layers with Batch Norm and Leaky ReLU, yielding the edge embedding (1.1 A Matrix). The node embeddings and edge embeddings are then fed through the graph convolution network (1.2 Gc block) to obtain the updated node embeddings. (For background on graph convolutions, see: https://www.cnblogs.com/yangperasd/p/7071657.html )
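A hedged PyTorch sketch of that edge model (the layer widths are illustrative assumptions, not the paper's exact sizes):

```python
import torch
import torch.nn as nn

class EdgeModel(nn.Module):
    """Edge scores phi(x_i, x_j) = MLP(|x_i - x_j|) for all node pairs."""
    def __init__(self, d, hidden=64):
        super().__init__()
        layers, d_in = [], d
        for _ in range(4):                     # 4 FC layers + BN + LeakyReLU
            layers += [nn.Linear(d_in, hidden),
                       nn.BatchNorm1d(hidden),
                       nn.LeakyReLU(0.2)]
            d_in = hidden
        layers.append(nn.Linear(hidden, 1))    # scalar score per edge
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        # x: (V, d) node embeddings -> pairwise |x_i - x_j|: (V, V, d)
        diff = (x.unsqueeze(1) - x.unsqueeze(0)).abs()
        V = x.size(0)
        scores = self.mlp(diff.view(V * V, -1))  # flatten: BatchNorm1d is 2-D
        return scores.view(V, V)                 # unnormalized adjacency Ã
```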

In particular, inspired by message-passing algorithms, Kearnes et al. (2016) and Gilmer et al. (2017) generalized the GNN to also learn edge features Ã^(k) from the current node hidden representation:

Ã_{i,j}^(k) = φ_θ̃(x_i^(k), x_j^(k)) ,   (Equation 3)

where φ is a symmetric function parametrized with, e.g., a neural network. In this work, we consider a Multilayer Perceptron stacked after the absolute difference between two node vectors:

φ_θ̃(x_i^(k), x_j^(k)) = MLP_θ̃( |x_i^(k) − x_j^(k)| ) ,   (Equation 4)

so φ is then a metric and satisfies symmetry by construction.

The trainable adjacency is then normalized to a stochastic kernel by using a softmax along each row. The resulting update rules for node features are obtained by adding the edge feature kernel Ã^(k) to the generator family 𝒜 = {Ã^(k), 1} (where 1 is the identity operator) and applying Equation 2. Adjacency learning is particularly important in applications where the input set is believed to have some geometric structure, but the metric is not known a priori, as is the case here.
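Combining the sketches above, one full adjacency-learning update could look like the following (it assumes the hypothetical EdgeModel and GcLayer defined earlier):

```python
import torch

def gnn_update(x, edge_model, gc_layer):
    """One adjacency-learning step followed by a Gc layer (Equation 2)."""
    scores = edge_model(x)                      # (V, V) unnormalized Ã
    A_tilde = torch.softmax(scores, dim=1)      # stochastic kernel: rows sum to 1
    I = torch.eye(x.size(0), device=x.device)   # identity operator from 𝒜
    return gc_layer(x, [A_tilde, I])            # new node embeddings x^(k+1)
```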

In general graphs, the network depth is chosen to be of the order of the graph diameter, so that all nodes obtain information from the entire graph. In our context, however, since the graph is densely connected, the depth is interpreted simply as giving the model more expressive power.

Construction of Initial Node Features: 

For images x_i ∈ T with known label l_i, the one-hot encoding h(l_i) of the label is concatenated with the embedding features of the image at the input of the GNN:

x_i^(0) = (φ(x_i), h(l_i)) ,   (Equation 5)

where φ is a convolutional neural network (my understanding: it extracts a high-level feature vector).

For images x̃_j, x̄_j with unknown labels, we modify the previous construction to account for full uncertainty about the label variable by replacing h(l) with the uniform distribution over the K-simplex: V_j = (φ(x̃_j), K⁻¹·1_K), and analogously for x̄.
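A small sketch of this initial feature construction (the helper is hypothetical; `embeddings` holds the CNN features φ(x), and `labels` uses None for images with unknown label):

```python
import torch
import torch.nn.functional as F

def initial_node_features(embeddings, labels, K):
    """Concatenate CNN embeddings with label one-hots; unlabeled nodes
    get the uniform distribution 1/K over the K classes instead."""
    feats = []
    for emb, l in zip(embeddings, labels):      # embeddings: (V, d) tensor
        if l is None:                           # unlabeled or query node
            h = torch.full((K,), 1.0 / K)       # uniform over the K-simplex
        else:
            h = F.one_hot(torch.tensor(l), K).float()
        feats.append(torch.cat([emb, h]))
    return torch.stack(feats)                   # (V, d + K)
```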
