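The human visual system has a remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem, with many practical advantages for real-world vision applications. In this paper, the goal is to devise a few-shot visual learning system
that, at test time, can efficiently learn novel categories from only a few training examples while, at the same time, not forgetting the initial categories on which it was trained.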
To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors.
The human visual system: learns novel concepts from only a few examples.
Devise a few-shot visual learning system that can efficiently:
learn novel categories from only a few training data while at the same time not forgetting the initial categories (base categories).
Extend an object recognition system with an attention-based few-shot classification weight generator.
Redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors.
Unify the recognition of both novel and base categories.
Learn feature representations that generalize better on unseen categories.
Evaluate the method on Mini-ImageNet.
Deep convolutional neural networks (ConvNets)
are applied to image classification tasks such as object recognition or scene classification.
A ConvNet learns to recognize a set of visual categories (object categories or scene types).
This requires manually collecting and labeling thousands of training examples per target category and applying an iterative gradient-based optimization routine to them.
This approach is computationally very expensive.
Moreover, the set of categories that the ConvNet model can recognize remains fixed after training.
To add novel categories, training data must be collected for them,
and the expensive training procedure above must be restarted on the enhanced training set
in order to avoid catastrophic interference.
Key point:
Having enough training data for the novel categories is crucial; otherwise the model overfits easily.
Developing real-time interactive vision applications for portable devices.
We enhance a typical object recognition system with an extra component, called the few-shot classification weight generator, which accepts as input a few training examples of a novel category and generates a classification weight vector for that category.
The ConvNet model must be able to handle the classification weight vectors of both base and novel categories at the same time.
This is not feasible with the typical dot-product based classifier
(the last linear layer of a classification neural network).
Validation: simply train a cosine-similarity based ConvNet recognition model.
Input of the object recognition learning system: assume there is
a dataset of $K_{base}$ base categories:
$$D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$$
Notation:
$N_b$: the number of training examples of the $b$-th category.
$x_{b,i}$: its $i$-th training example, used as input.
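The goal of this work is to accurately recognize the base categories and to learn, in a dynamic manner, to recognize novel categories from only a few examples, without forgetting the base ones.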
The recognition model contains two main components:
a feature extractor $F(\cdot|\theta)$ (with learnable parameters $\theta$),
which extracts a $d$-dimensional feature vector
$z = F(x|\theta) \in \mathbb{R}^d$ from an input image $x$;
a classifier $C(\cdot|W^*)$, where $W^* = \{w_k^* \in \mathbb{R}^d\}_{k=1}^{K^*}$
is a set of $K^*$ classification weight vectors, one per object category.
The classifier takes the feature representation $z$ as input and returns a $K^*$-dimensional vector with the probability scores $p = C(z|W^*)$ of the $K^*$ categories.
In a typical convolutional neural network, the feature extractor is the part of the network that starts from the first layer and ends at the last hidden layer,
and the classifier is the last classification layer.
In summary: this ConvNet model will be able to recognize the base object categories.
The system also includes a meta-learning mechanism.
Input: training data for a set of $K_{novel}$ novel categories,
$$D_{novel} = \bigcup_{n=1}^{K_{novel}} \{\hat{x}_{n,i}\}_{i=1}^{\hat{N}_n}$$
where $\hat{N}_n$ is the number of training examples of the $n$-th novel category and $\hat{x}_{n,i}$ is its $i$-th training example.
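As a rough illustration of what $D_{novel}$ looks like in practice, the sketch below builds one few-shot episode by picking $K_{novel}$ categories and a handful of examples for each. The function name `sample_novel_episode`, the dictionary layout of `pool`, and the toy string "images" are all assumptions made for this example; they do not come from the paper.

```python
import random


def sample_novel_episode(pool, k_novel, n_shot, seed=None):
    """Build a toy D_novel: k_novel categories with n_shot examples each.

    `pool` maps a category name to a list of examples (hypothetical layout).
    """
    rng = random.Random(seed)
    # Pick which categories play the role of "novel" in this episode.
    categories = rng.sample(sorted(pool), k_novel)
    # For each chosen category, keep only a few training examples.
    return {c: rng.sample(pool[c], n_shot) for c in categories}


# Toy pool: strings stand in for images.
pool = {f"class_{c}": [f"img_{c}_{i}" for i in range(10)] for c in range(8)}
d_novel = sample_novel_episode(pool, k_novel=5, n_shot=1, seed=0)
```

Here `k_novel` and `n_shot` play the roles of $K_{novel}$ and $\hat{N}_n$ (with the same $\hat{N}_n$ for every category, which is a simplification).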
The standard setting for classification neural networks:
after extracting the feature vector $z$,
compute the classification score $s_k$ of each category $k \in [1, K^*]$ to estimate the classification probability vector
$$p = C(z|W^*)$$
using the dot-product operator:
$$s_k = z^{\top} w_k^*$$
Then apply the softmax operator across all $K^*$ classification scores:
$$p_k = \mathrm{softmax}(s)_k = \frac{\exp(s_k)}{\sum_{j=1}^{K^*} \exp(s_j)}$$
where $p_k$ is the $k$-th classification probability of $p$.
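The dot-product classifier described above can be sketched in a few lines. This is a minimal, dependency-free illustration using plain Python lists; the helper names (`dot`, `softmax`, `dot_product_classifier`) and the toy vectors are assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Softmax over a list of raw scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_classifier(z, weights):
    """p = C(z | W*): one score s_k = z^T w_k per category, then softmax."""
    scores = [dot(z, w_k) for w_k in weights]
    return softmax(scores)


z = [1.0, 2.0]                             # toy feature vector
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy weight vectors
p = dot_product_classifier(z, W)           # raw scores are 1, 2 and 3
```

The third category receives the largest raw score, so it also gets the largest probability in `p`.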
The classification weight vectors $w_k^*$ can come from the base categories, i.e.
$w_k^* \in W_{base}$, or from the novel categories.
The base classification weights are learned slowly, with small SGD steps.
Problem: the weight values in those two cases (i.e., base and novel classification weights) can be completely different, and so the same applies to the raw classification scores computed with the dot-product operation, which can thus have totally different magnitudes depending on whether they come from the base or the novel categories. This can severely impede the training process and, in general, does not allow a unified recognition of both types of categories.
Fix: modify the classifier $C(\cdot|W^*)$ to compute the classification scores with the cosine similarity operator instead of the dot product.
In addition to this modification, we also choose to remove the ReLU non-linearity
after the last hidden layer of the feature extractor.
This allows $z$ to take both positive and negative values, similar to the classification weight vectors.
The feature vectors are $l_2$-normalized.
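To see why cosine similarity helps, the sketch below scores the same feature against two weight vectors that point in the same direction but differ in magnitude by a factor of 100. It assumes a score of the form $s_k = \tau \cdot \cos(z, w_k^*)$; the scale `tau` and its value of 10 are assumptions of this example, not something stated in these notes, and the helper names are hypothetical.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit l2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_scores(z, weights, tau=10.0):
    """s_k = tau * cos(z, w_k): dot product of l2-normalized vectors."""
    z_bar = l2_normalize(z)
    return [tau * dot(z_bar, l2_normalize(w_k)) for w_k in weights]


z = [1.0, -2.0]          # features may be negative once the last ReLU is removed
w_small = [0.1, -0.2]    # same direction as z, small magnitude
w_large = [10.0, -20.0]  # same direction as z, large magnitude

dot_small, dot_large = dot(z, w_small), dot(z, w_large)
cos_small, cos_large = cosine_scores(z, [w_small, w_large])
```

The raw dot products differ by two orders of magnitude, while the two cosine scores coincide, which is the property that lets base and novel weights share one classifier.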
The weight generator combines a feature averaging mechanism with an attention-based
mechanism that composes novel classification weight vectors by looking at a memory that contains the base classification weight vectors:
$$W_{base} = \{w_b\}_{b=1}^{K_{base}}$$
(Why use the attention-based weight component?)
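One way to picture the combination of feature averaging and attention over the base-weight memory is the simplified sketch below. It is not the paper's exact generator: the queries and keys here are just the normalized features and base weights themselves, and a single scalar `mix` stands in for whatever learnable combination the real model uses. The names `generate_novel_weight`, `gamma`, and `mix` are assumptions of this example.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_novel_weight(features, base_weights, gamma=10.0, mix=0.5):
    """Compose a weight vector for one novel category (simplified sketch)."""
    feats = [l2_normalize(z) for z in features]
    base = [l2_normalize(w_b) for w_b in base_weights]
    d, n = len(feats[0]), len(feats)
    # Feature averaging: mean of the normalized few-shot features.
    w_avg = [sum(z[j] for z in feats) / n for j in range(d)]
    # Attention: each feature queries the memory of base weight vectors.
    w_att = [0.0] * d
    for z in feats:
        att = softmax([gamma * dot(z, w_b) for w_b in base])
        for a, w_b in zip(att, base):
            for j in range(d):
                w_att[j] += a * w_b[j] / n
    # Scalar blend of the two parts (assumption of this sketch).
    return [mix * a + (1.0 - mix) * b for a, b in zip(w_avg, w_att)]


features = [[1.0, 0.0], [0.8, 0.6]]      # two toy examples of a novel category
base_weights = [[1.0, 0.0], [0.0, 1.0]]  # toy memory of base weights
w_novel = generate_novel_weight(features, base_weights)
```

Because the toy features lie closer to the first base weight, the attention part leans on that vector, so the generated weight has a larger first component. This hints at why the memory helps when only one or two examples are available: the novel weight can borrow from visually similar base categories.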
Training learns the ConvNet-based recognition model, i.e.
the feature extractor $F(\cdot|\theta)$ and
the classifier $C(\cdot|W^*)$, together with
the few-shot classification weight generator $G(\cdot,\cdot|\phi)$.
The sole input is the training set
$$D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$$
of $K_{base}$ base categories.
The training procedure is split into 2 stages.
Evaluation setting for recognition of novel categories.
We evaluate our few-shot object recognition system on the Mini-ImageNet dataset.
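To determine whether a component benefits the final result, compare the network without that component against the network with it; this is an ablation study.
The ablation study is carried out on the validation set of Mini-ImageNet.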
Comparison with two prior state-of-the-art approaches:
Prototypical Networks and Matching Nets.
The feature extractor:
Without training the weight generator, check the performance of the cosine-similarity based ConvNet model (entry Cosine Classifier).
Then use both the feature averaging and the attention-based mechanism.
Compare the cosine-similarity based ConvNet recognition model
with the typical dot-product based ConvNet recognition model
using t-SNE scatter plots,
which visualize the local structures of the feature representations learned in those two cases.
The plots visualize the $l_2$-normalized features, which are the features the feature extractor actually uses.
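Next: move on to the next paper, understand it completely and thoroughly, and then start working on the code.
After that, start building my own model and framework centered on few-shot learning.
Learn to see which techniques serve which objectives, and how to design and optimize my own objectives based on my own techniques.
Design my own novel techniques according to my own optimization objectives.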