Dynamic Few-Shot Visual Learning without Forgetting

摘要

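The human visual system has the remarkable ability to learn novel concepts effortlessly from only a few examples. Mimicking the same behavior in machine learning vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this work, our goal is to devise a few-shot visual learning system that, during test time, is able to efficiently learn novel categories from only a few training examples while at the same time not forgetting the initial categories on which it was trained.
To achieve that goal we propose (a) to extend an object recognition system with an attention based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations
and classification weight vectors.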

Key points from the abstract

The human visual system can learn novel concepts from only a few examples.

Goal of this paper:

Devise a few-shot visual learning system that is able to:
learn novel categories from only a few training examples while at the same time not forgetting the initial categories (the base categories).

How the goal is achieved

  • to extend an object recognition system with an attention based few-shot classification weight generator.

  • redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors.

  • unifying the recognition of both novel and base categories.

  • feature representations that generalize better on unseen categories.

  • evaluate our method on Mini-ImageNet

Introduction

Deep convolutional neural networks (ConvNets) are applied to image classification tasks such as object recognition or scene classification.

A ConvNet is trained to recognize a set of visual categories (object categories or scene types).

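Requirements

Training requires manually collecting and labeling thousands of training examples per target category and applying an iterative, gradient-based optimization routine to them.
This is computationally very expensive.
Moreover, the set of categories the ConvNet model can recognize remains fixed after training.

  • To recognize novel categories, one must collect training data for them
    and restart the above expensive training routine on the augmented training set.

  • We must avoid catastrophic interference.
    Key point:

  • Having enough training data for the novel categories is crucial; otherwise the model overfits easily.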

Applications

Developing real-time interactive vision applications for portable devices.

Many methods do not satisfy two very important requirements:

  • the learning of the novel categories needs to be fast;
  • it must not sacrifice any recognition accuracy on the initial categories
    that the ConvNet was trained on (i.e., it must not forget them).
  • the sole input
  • The object recognition learning system we want to develop:
    not only recognizes these base categories but also learns to dynamically recognize novel categories from only a few training examples (provided only at test time), while also not forgetting the base ones or requiring to be re-trained on them
    (dynamic few-shot learning without forgetting).
    To achieve this goal, we propose two novel techniques.

Few-shot classification-weight generator based on attention

  • A typical ConvNet-based recognition model
  • first extracts a high-level feature representation from the input image,
  • then computes per-category classification scores by applying a set of classification weight vectors to the feature.
  • To be able to recognize novel categories, the model must be able to generate classification weight vectors for them.

The first novel technique

We enhance a typical object recognition system with an extra component, called the few-shot classification weight generator, that accepts as input a few training examples of a novel category and

  • generates a classification weight vector for that novel category.
    It explicitly exploits the acquired past knowledge about the visual world by incorporating an attention mechanism over the classification weight vectors of the base categories.
    The attention mechanism gives a significant boost in performance.

Cosine-similarity based ConvNet recognition model.

The ConvNet model must be able to handle the classification weight vectors of both base and novel categories simultaneously.

This is not feasible with the typical dot-product based classifier
(the last linear layer of a classification neural network).

  • To overcome this serious issue, the second novel technique is
    to implement the classifier as a cosine similarity function between the feature representations and the classification weight vectors.

How this is validated: by simply training a cosine-similarity based ConvNet recognition model, we

  • learn feature extractors that, when used for image matching, surpass prior state-of-the-art approaches on the few-shot recognition task.

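Contributions

  • We propose a few-shot object recognition system
    that can dynamically learn novel categories from only a few training examples without forgetting the base categories it was trained on.
  • We implement two novel techniques:
    • an attention based few-shot classification weight generator;
    • implementing the classifier of a ConvNet model as a cosine similarity function between feature representations and classification vectors.
  • We evaluate the object recognition system on Mini-ImageNet.

Related work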

  • Meta-learning based approaches
    • involve a meta-learner model
    • Ravi and Larochelle: an LSTM-based meta-learner
    • Finn et al. simplified the LSTM-based meta-learner model
    • Mishra et al. proposed a generic temporal convolutional network
    • Our system also contains a meta-learning component: the few-shot classification weight generator
  • Metric-learning based approaches
    • attempt to learn feature representations that preserve the class neighborhood structure
    • Koch et al. formulated the one-shot object recognition task as image matching and trained Siamese neural networks to compute the similarity between a training example of a novel category and a test example
    • Vinyals et al. proposed Matching Networks
    • Mensink et al. proposed a similar approach; Prototypical Networks are an adaptation of that work for ConvNets
    • Our few-shot classification weight generator also includes a feature averaging mechanism,
    • and additionally exploits past knowledge about the visual world through an attention mechanism
    • Bharath and Girshick proposed an $l_2$ regularization loss on the feature representation so that it generalizes better to unseen categories
    • In our case, the cosine-similarity based classifier serves this purpose

Method

Input of the object recognition learning system: we assume that there exists a training dataset of $K_{base}$ base categories:
$$D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$$
Parameter explanation
$N_b$: the number of training examples of the b-th category;
$x_{b,i}$: its i-th training example. This dataset serves as the input.
The goal of our work is to accurately recognize the base categories and to learn to perform few-shot learning of novel categories in a dynamic manner, without forgetting the base ones.
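The system contains two main components:

  • a ConvNet-based recognition model;
  • a few-shot classification weight generator, which dynamically generates classification weight vectors for the novel categories at test time.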

ConvNet-based recognition model

  • a feature extractor $F(.|\theta)$ (with learnable parameters $\theta$),
    which extracts a d-dimensional feature vector
    $z = F(x|\theta) \in \mathbb{R}^d$ from an input image $x$;

  • a classifier $C(.|W^*)$, where $W^* = \{w_k^* \in \mathbb{R}^d\}_{k=1}^{K^*}$
    is a set of $K^*$ classification weight vectors, one per object category.

  • The classifier takes the feature representation $z$ as input and returns a $K^*$-dimensional vector of probability classification scores $p = C(z|W^*)$ for the $K^*$
    categories.

  • In a typical convolutional neural network, the feature extractor is the part of the network that starts from the first layer and ends at the last hidden layer;

  • the classifier is the last classification layer.

In summary, this ConvNet model will be able to recognize the base object categories.
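A minimal sketch of the two components above, assuming plain Python lists in place of tensors. `feature_extractor` is a hypothetical stand-in for $F(.|\theta)$ (a single linear map rather than a ConvNet), and `classifier` implements $C(z|W^*)$ with standard dot-product scores followed by a softmax.

```python
import math


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_extractor(x, theta):
    # Hypothetical stand-in for F(x | theta): a single linear map whose rows
    # are the rows of theta (d x input_dim). A real system uses a ConvNet.
    return [sum(t * xi for t, xi in zip(row, x)) for row in theta]


def classifier(z, weights):
    # C(z | W*): one raw score per category (dot product with its weight
    # vector), turned into a K*-dimensional probability vector by softmax.
    scores = [sum(zi * wi for zi, wi in zip(z, w)) for w in weights]
    return softmax(scores)


# Usage: 2-D inputs, identity "extractor", two base categories.
theta = [[1.0, 0.0], [0.0, 1.0]]
w_base = [[1.0, 0.0], [0.0, 1.0]]
p = classifier(feature_extractor([2.0, 0.0], theta), w_base)
print(p)  # the first category gets the higher probability
```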

Few-shot classification weight generator

This component is a meta-learning mechanism.
At test time, it takes as input a set of $K_{novel}$ novel categories:
$$D_{novel} = \bigcup_{n=1}^{K_{novel}} \{\hat{x}_{n,i}\}_{i=1}^{\hat{N}_n}$$

Explanation of the formula

  • $\hat{N}_n$: the number of training examples of the n-th novel category;
  • $\hat{x}_{n,i}$: its i-th training example.
    The goal is to dynamically assimilate the novel categories into the
    repertoire of the above ConvNet model.
  • For each novel category $n \in [1, K_{novel}]$,
  • the few-shot classification weight generator $G(.,.|\phi)$
  • takes as input the feature vectors $\hat{Z}_n = \{\hat{z}_{n,i}\}_{i=1}^{\hat{N}_n}$ of its training examples,
  • where $\hat{z}_{n,i} = F(\hat{x}_{n,i}|\theta)$,
  • together with the classification weight vectors of the base categories, $W_{base}$,
  • and generates a classification weight vector $\hat{w}_n = G(\hat{Z}_n, W_{base}|\phi)$ for that novel category.
  • $\phi$ are the learnable parameters of the few-shot weight generator;
    they are learned during the single training phase of our framework.
  • $W_{novel} = \{\hat{w}_n\}_{n=1}^{K_{novel}}$: the classification weight vectors of the novel categories produced by the few-shot weight generator.
  • $W^* = W_{base} \cup W_{novel}$
  • $C(.|W^*)$: with these weights, the ConvNet model recognizes both the base and the novel categories.
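The sketch below, assuming plain Python lists, shows how $W^* = W_{base} \cup W_{novel}$ can be formed by appending the generated novel weight vectors after the base ones, so a single classifier scores all $K^* = K_{base} + K_{novel}$ categories without retraining on the base categories. The function names are hypothetical, and the novel vector is hand-picked here rather than produced by $G$.

```python
def extend_weights(w_base, w_novel):
    # W* = W_base ∪ W_novel. Indices 0..K_base-1 stay the base categories;
    # the novel categories get indices K_base..K*-1.
    return list(w_base) + list(w_novel)


def predict(z, w_star):
    # Highest dot-product score over all K* categories. The argmax of the raw
    # scores equals the argmax of their softmax probabilities.
    scores = [sum(a * b for a, b in zip(z, w)) for w in w_star]
    return max(range(len(scores)), key=scores.__getitem__)


w_base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # K_base = 2
w_novel = [[0.0, 0.0, 1.0]]                    # K_novel = 1 (hand-picked)
w_star = extend_weights(w_base, w_novel)
print(predict([0.0, 0.0, 5.0], w_star))  # 2, i.e. the novel category
print(predict([3.0, 1.0, 0.0], w_star))  # 0, a base category
```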

Cosine-similarity based recognition model

The standard setting for classification neural networks:

  • After extracting the feature vector $z$, the classifier estimates the classification probability vector
    $p = C(z|W^*)$.

  • It first computes a raw classification score $s_k$ for each category $k \in [1, K^*]$ with the dot-product operation:
    $s_k = z^T w_k^*$

  • It then applies the softmax operator across all $K^*$ scores:
    $p_k = \mathrm{softmax}(s_k) = \exp(s_k) / \sum_{j=1}^{K^*} \exp(s_j)$,
    where $p_k$ is the k-th classification probability of $p$.

  • In our case, $w_k^*$ can come both from the base categories ($w_k^* \in W_{base}$) and from the novel categories ($w_k^* \in W_{novel}$).

  • The base classification weights are learned slowly with small SGD steps during training, whereas the novel ones are generated dynamically by the few-shot weight generator.

  • Problem: the weight values in those
    two cases (i.e., base and novel classification weights) can be
    completely different, and so can the raw classification scores computed with the dot-product operation,
    which can thus have totally different magnitudes depending
    on whether they come from the base or the novel categories.
    This can severely impede the training process and, in general,
    does not allow a unified recognition of both types of
    categories.

  • We modify the classifier $C(.|W^*)$ to compute the classification scores with the cosine similarity operation:
    $$s_k = \tau \cdot \cos(z, w_k^*) = \tau \cdot \bar{z}^T \bar{w}_k^*$$
    where $\bar{z} = z / \|z\|$ and $\bar{w}_k^* = w_k^* / \|w_k^*\|$ are the $l_2$-normalized vectors and $\tau$ is a learnable scalar.
    In addition to the above modification, we also remove the ReLU non-linearity
    after the last hidden layer of the feature extractor.
    This allows $z$ to take both positive and negative values, similar to the classification weight vectors.

In other words, the classifier operates on $l_2$-normalized feature vectors.
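To illustrate the magnitude problem described above, here is a small pure-Python comparison of dot-product and cosine-similarity scores. The vectors are made-up examples: a "base" weight vector with a large norm and a "novel" one with a small norm. `tau` is the scale applied to the cosine; as we understand the paper it is learned, while here it is fixed to a hypothetical value of 10.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def dot_scores(z, weights):
    return [sum(a * b for a, b in zip(z, w)) for w in weights]


def cosine_scores(z, weights, tau=10.0):
    # s_k = tau * cos(z, w_k): both vectors are l2-normalized first, so each
    # score lies in [-tau, tau] whatever the norms of z and w_k are.
    z_bar = l2_normalize(z)
    return [tau * sum(a * b for a, b in zip(z_bar, l2_normalize(w)))
            for w in weights]


z = [1.0, 1.0]
weights = [[10.0, 0.0],   # "base" weight with a large norm
           [0.0, 0.5]]    # "novel" weight with a small norm
print(dot_scores(z, weights))     # [10.0, 0.5]: the base category dominates
print(cosine_scores(z, weights))  # both about 7.07: equal angular similarity
```

With dot products the large-norm base vector wins purely because of its magnitude; with cosine scores both categories are judged only by angle, which is what allows base and novel weights to share one classifier.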

Advantages of cosine-similarity based classifier

  • To minimize the classification loss of a cosine-similarity based ConvNet model,
  • the $l_2$-normalized feature vector of an image must closely match the $l_2$-normalized classification weight vector of its ground-truth category.
  • This leads to low intra-class variance of the features.
  • As a result, the features of the cosine-similarity based ConvNet form more compact and distinctive category-specific clusters.

Few-shot classification weight generator

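  • Because the ConvNet model uses a cosine similarity based classifier, the features of each category form compact clusters,
  • so a natural choice for a novel category's classification weight vector is a representative feature vector of its cluster:
    $$\hat{w}_{avg} = \frac{1}{\hat{N}} \sum_{i=1}^{\hat{N}} \bar{\hat{z}}_i, \qquad \bar{\hat{z}}_i = \frac{\hat{z}_i}{\|\hat{z}_i\|}$$
    where $\hat{z}_i$ are the feature vectors of the $\hat{N}$ training examples of the novel category.
    When only the feature averaging mechanism is used, the final classification weight vector is
    $$\hat{w} = \phi_{avg} \odot \hat{w}_{avg}$$
    where $\odot$ is the Hadamard product
    and $\phi_{avg} \in \mathbb{R}^d$ is a learnable weight vector.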
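A minimal pure-Python sketch of feature-averaging weight inference: the novel features are $l_2$-normalized, averaged, and multiplied element-wise by $\phi_{avg}$. The example features and the value of `phi_avg` are made up; in the paper $\phi_{avg}$ is learned.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def feature_averaging_weight(novel_features, phi_avg):
    # novel_features: the N feature vectors of one novel category.
    normalized = [l2_normalize(z) for z in novel_features]
    n = len(normalized)
    # w_avg: column-wise mean of the l2-normalized feature vectors.
    w_avg = [sum(col) / n for col in zip(*normalized)]
    # Hadamard (element-wise) product with the learnable vector phi_avg.
    return [p * w for p, w in zip(phi_avg, w_avg)]


w_novel = feature_averaging_weight([[3.0, 0.0], [0.0, 4.0]], phi_avg=[1.0, 2.0])
print(w_novel)  # [0.5, 1.0]
```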

Attention-based weight inference

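We enhance the feature averaging mechanism with an attention based
mechanism
that composes novel classification weight vectors by looking at a memory that contains the base classification weight vectors
$W_{base} = \{w_b\}_{b=1}^{K_{base}}$.

  • An extra attention-based classification weight vector $\hat{w}_{att}$ is computed as
    $$\hat{w}_{att} = \frac{1}{\hat{N}} \sum_{i=1}^{\hat{N}} \sum_{b=1}^{K_{base}} Att(\phi_q \bar{\hat{z}}_i, k_b) \cdot \bar{w}_b$$
    where $\bar{\hat{z}}_i$ is the $l_2$-normalized feature vector of the i-th training example of the novel category, $\bar{w}_b$ is the $l_2$-normalized $w_b$, $\phi_q \in \mathbb{R}^{d \times d}$ is a learnable matrix that turns a feature vector into a query, $\{k_b \in \mathbb{R}^d\}_{b=1}^{K_{base}}$ are learnable keys (one per base category), and $Att(.,.)$ is a cosine similarity function followed by a softmax over the $K_{base}$ base categories.
  • The final classification weight vector is
    $$\hat{w} = \phi_{avg} \odot \hat{w}_{avg} + \phi_{att} \odot \hat{w}_{att}$$
    where $\hat{w}_{avg}$ is the feature-averaging weight vector, $\odot$ is the Hadamard product,
    and $\phi_{avg}, \phi_{att} \in \mathbb{R}^d$ are learnable weight vectors.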
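The following pure-Python sketch combines feature averaging with attention over the base weight vectors, following the paper's attention-based weight inference as we understand it: $\hat{w}_{att}$ averages, over the novel examples, an attention-weighted sum of the $l_2$-normalized base weights, and the final vector is $\phi_{avg}\odot\hat{w}_{avg} + \phi_{att}\odot\hat{w}_{att}$. Assumptions: `phi_q` is the query matrix, `keys` holds one key per base category, the cosine similarities are scaled by a hypothetical factor `gamma` before the softmax, and all parameter values in the usage example are made up rather than learned.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def matvec(m, v):
    # Matrix-vector product: each row of m dotted with v.
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def attention_weight(novel_features, phi_q, keys, w_base, gamma=10.0):
    d = len(w_base[0])
    w_base_bar = [l2_normalize(w) for w in w_base]
    w_att = [0.0] * d
    for z in novel_features:
        query = matvec(phi_q, l2_normalize(z))
        # Softmax over the K_base base categories of the (gamma-scaled)
        # cosine similarity between the query and each key.
        att = softmax([gamma * cosine(query, k) for k in keys])
        # Attention-weighted sum of the l2-normalized base weight vectors.
        for a, w in zip(att, w_base_bar):
            for j in range(d):
                w_att[j] += a * w[j]
    # Average over the N training examples of the novel category.
    return [x / len(novel_features) for x in w_att]


def generate_weight(novel_features, w_base, phi_avg, phi_att, phi_q, keys):
    normalized = [l2_normalize(z) for z in novel_features]
    w_avg = [sum(col) / len(normalized) for col in zip(*normalized)]
    w_att = attention_weight(novel_features, phi_q, keys, w_base)
    # w = phi_avg ⊙ w_avg + phi_att ⊙ w_att (element-wise products).
    return [pa * a + pt * t
            for pa, a, pt, t in zip(phi_avg, w_avg, phi_att, w_att)]


w_base = [[1.0, 0.0], [0.0, 1.0]]       # two base weight vectors (made up)
identity = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical phi_q
feats = [[2.0, 0.1]]                    # one example of a novel category
# Hypothetical choice: the keys are set equal to the base weight vectors.
w_att = attention_weight(feats, identity, w_base, w_base)
print(w_att)  # dominated by the first base weight vector
```

Because the attention weights sum to one, $\hat{w}_{att}$ is a convex combination of the normalized base weights, so a novel category that looks like a known base category inherits much of that category's weight vector.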

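Why use an attention-based weight composition?

  • Thanks to the cosine-similarity based classifier,
  • the base classification weight vectors learn to be representative feature vectors of their categories;
  • therefore, the base classification weight vectors also encode visual similarity, which the attention mechanism can exploit when composing the weight vector of a novel category.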

Training procedure

  • We learn the ConvNet-based recognition model, i.e.,

  • the feature extractor $F(.|\theta)$

  • and the classifier $C(.|W^*)$,

  • together with the few-shot classification weight generator $G(.,.|\phi)$,

  • using as sole input
    the training set $D_{train} = \bigcup_{b=1}^{K_{base}} \{x_{b,i}\}_{i=1}^{N_b}$
    of $K_{base}$ base categories.

  • The training procedure is split into 2 stages; in each stage

  • we minimize a different cross-entropy loss.
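As we read the paper, the 1st-stage objective averages the cross-entropy $loss(x, b) = -\log p_b$ first over the $N_b$ training examples of each base category and then over the $K_{base}$ categories, where $p = C(F(x|\theta)|W^*)$. A pure-Python sketch of that averaging, assuming the predicted probability vectors are already available (they are made up in the example):

```python
import math


def cross_entropy(probs, label):
    # loss(x, b) = -log p_b: negative log-probability of the true category.
    return -math.log(probs[label])


def stage1_loss(probs_per_category):
    # probs_per_category[b] lists the probability vectors predicted for the
    # N_b training examples of base category b.
    per_category = [
        sum(cross_entropy(p, b) for p in prob_list) / len(prob_list)
        for b, prob_list in enumerate(probs_per_category)
    ]
    # Average over the K_base base categories.
    return sum(per_category) / len(per_category)


# Two base categories; every prediction is a uniform guess.
uniform = [[[0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
print(stage1_loss(uniform))  # log(2) ≈ 0.693
```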

1st training stage

  • learn the ConvNet recognition model without the few-shot classification weight generator.
  • learn the parameters $\theta$ of the feature extractor $F(.|\theta)$
  • and the base classification weight vectors $W_{base} = \{w_b\}_{b=1}^{K_{base}}$;
  • in this stage, $W^*$ is equal to $W_{base}$.

2nd training stage

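As we understand the paper, this stage freezes the feature extractor and trains the weight generator (while continuing to train the base weights) by repeatedly picking some base categories to act as "fake" novel categories, feeding a few of their training examples to the generator, and classifying over the remaining base categories plus the fake novel ones. The pure-Python sketch below covers only that sampling step; the function name, the data layout, and the parameter values are hypothetical.

```python
import random


def sample_fake_novel_episode(train_set, k_novel, n_shot, rng):
    # train_set: dict mapping base-category id -> list of training examples.
    # 1) Pick k_novel base categories to play the role of "fake" novel ones.
    categories = sorted(train_set)
    fake_novel = rng.sample(categories, k_novel)
    # 2) Keep n_shot examples of each as the generator's few-shot input.
    support = {c: rng.sample(train_set[c], n_shot) for c in fake_novel}
    # 3) The other base categories keep their learned weight vectors.
    remaining_base = [c for c in categories if c not in fake_novel]
    return fake_novel, support, remaining_base


train_set = {c: list(range(10)) for c in range(5)}  # 5 toy categories
fake, support, rest = sample_fake_novel_episode(train_set, 2, 3, random.Random(0))
print(fake, rest)
```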

Experimental results

Mini-ImageNet experiments

Evaluation setting for recognition of novel categories.
We evaluate our few-shot object recognition system on the Mini-ImageNet dataset.

Dataset description

  • A dataset that includes 100 different categories, with 600 images per category,
  • each of size $84 \times 84$.
  • 64 categories for training.
  • 16 categories for validation.
  • 20 categories for testing.
  • Evaluation first samples $K_{novel}$ novel categories and one or five training examples per category (1-shot and 5-shot settings, respectively).
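A pure-Python sketch of how one evaluation episode could be drawn under the setting above: sample $K_{novel}$ of the 20 test categories, then one or five training images per category plus some query images for measuring accuracy. The category and image ids, `k_novel=5`, and `n_query=15` are hypothetical choices for illustration, not values taken from this post.

```python
import random


def sample_test_episode(test_categories, images_per_category,
                        k_novel, n_shot, n_query, rng):
    # Labels 0..k_novel-1 are assigned to the sampled novel categories.
    categories = rng.sample(test_categories, k_novel)
    support, query = [], []
    for label, c in enumerate(categories):
        # Draw distinct images so training and query images never overlap.
        idx = rng.sample(range(images_per_category), n_shot + n_query)
        support += [(c, i, label) for i in idx[:n_shot]]
        query += [(c, i, label) for i in idx[n_shot:]]
    return support, query


test_categories = list(range(80, 100))  # 20 hypothetical test-category ids
support, query = sample_test_episode(test_categories, 600,
                                     k_novel=5, n_shot=1, n_query=15,
                                     rng=random.Random(0))
print(len(support), len(query))  # 5 75
```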

Evaluation setting for the recognition of the base categories.

  • The proposed few-shot object recognition system has the ability not to forget the base categories.
  • Goal: evaluate the recognition performance of our model on those base categories.

Procedure

  • We sampled 300 extra images for each training category, which we use as a validation image set for evaluating the recognition performance on the base categories,
  • and another 300 extra images per category that serve the same purpose as a test image set.

Ablation study

To determine whether a component benefits the final result, we compare the network without that component against the network with it; this is called an ablation study.

The ablation study is performed on the validation set of Mini-ImageNet.
We compare with two prior state-of-the-art approaches:
Prototypical Networks and Matching Nets.

  • The feature extractor:

    • is a ConvNet model with 4 convolutional modules;
    • each module has $3 \times 3$ convolutions,
    • followed by batch normalization,
    • a ReLU nonlinearity,
    • and $2 \times 2$ max-pooling.
    • Given input images of size $84 \times 84$,
    • it yields feature maps with spatial size $5 \times 5$.
    • The first two convolutional layers have 64 feature channels,
    • and the latter two have 128 feature channels.
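The $5 \times 5$ output size follows from the four modules if each $3 \times 3$ convolution uses padding 1 and stride 1 (an assumption; the post does not state the padding) and each $2 \times 2$ max-pooling uses stride 2 with flooring. A small arithmetic check in Python:

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    # Standard output-size formula for a convolution.
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel=2, stride=2):
    # Max-pooling without padding; odd sizes are floored.
    return (size - kernel) // stride + 1


def module_sizes(input_size, num_modules=4):
    sizes = [input_size]
    for _ in range(num_modules):
        sizes.append(pool_output_size(conv_output_size(sizes[-1])))
    return sizes


channels = [64, 64, 128, 128]  # feature channels of the four modules
sizes = module_sizes(84)
print(sizes)  # [84, 42, 21, 10, 5]
print(channels[-1] * sizes[-1] ** 2)  # 3200 values in the final feature map
```

The flooring at 21 → 10 is what brings the final map to exactly $5 \times 5$; how that map is turned into the vector $z$ is not stated in this post.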

Cosine-similarity based ConvNet model

We first examine the cosine-similarity based ConvNet model (entry "Cosine Classifier") without training the weight generator:

  • we perform only the 1st training stage;
  • to test on the novel categories,
  • we estimate their classification weight vectors using feature averaging.
    These are the cosine-similarity based ConvNet models (the "Cosine Classifier" entries).
  • We compare these models with ConvNet models that use a dot-product classifier to assess the relative merits of the two.

Removing the last ReLU unit

  • We remove the last ReLU non-linearity from the feature extractor when using a cosine classifier.

Few-shot classification weight generator

We use both the feature averaging and the attention based mechanism.

Comparison with state-of-the-art

Explored feature extractor architectures

  • a 4-module ConvNet network
  • a ResNet-like network [5]

Qualitative evaluation with t-SNE scatter plots

  • We compare the cosine-similarity based ConvNet recognition model

  • with the typical dot-product based ConvNet recognition model.

  • t-SNE scatter plots
    visualize the local structures of the feature representations learned in those two cases.

  • We visualize the $l_2$-normalized features, which are the features actually used for classification.

Few-shot benchmark of Bharath & Girshick

  • Evaluation on the ImageNet based few-shot benchmark:
  • the ImageNet categories are split into 389 base categories and 611 novel categories;
  • 193 of the base categories and 300 of the novel categories are used for cross-validation, and the remaining 196 base categories and 311 novel categories are used for the final evaluation;
  • we sample ourselves $\hat{N}$ training images per novel category;
  • we evaluate using the images in the validation set of ImageNet;
  • we repeat the above experiment 100 times.

Comparison to prior and concurrent work

  • Prototypical-Nets
  • Matching Networks
  • the work of Bharath and Girshick

Feature extractor

  • a ResNet-10 [5] network architecture
    that gets as input images of size $224 \times 224$;
  • we apply dropout with probability 0.5 on the feature vectors generated by the feature extractor.
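A minimal pure-Python sketch of dropout with probability 0.5 on a feature vector. It uses the common "inverted" convention (surviving entries are scaled by $1/(1-p)$ during training and nothing is done at test time); whether the original implementation scales this way is an assumption.

```python
import random


def dropout(z, p=0.5, training=True, rng=None):
    # At test time (or with p == 0) the features pass through unchanged.
    if not training or p == 0.0:
        return list(z)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Zero each entry with probability p; rescale survivors by 1/keep so the
    # expected value of every entry is unchanged.
    return [x / keep if rng.random() < keep else 0.0 for x in z]


out = dropout([1.0] * 1000, p=0.5, rng=random.Random(0))
print(sum(1 for v in out if v) / len(out))  # roughly half the entries survive
```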

Conclusion

  • A cosine-similarity based ConvNet classifier
  • learns feature representations with better generalization capabilities.

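Summary

Move on to the next paper and understand it completely and thoroughly. Then start working on the code.
With few-shot learning as the focus, start developing my own model and framework.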

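Takeaways

Learn to identify which techniques serve which goals, and how to design and optimize toward my own goals with my own techniques.
Design my own novel techniques based on my own optimization goals.

  • Be able to judge which techniques are feasible and which are not.
  • Compare methods on my own, e.g., whether a dot-product or a cosine-similarity classifier works better.
  • Figure out for myself which operation produces which effect.
    • Run the code, observe its behavior alongside the paper, reproduce the entire codebase, and study it thoroughly, one step at a time.
  • Train the two networks and compare them to judge which model performs better.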
