[Paper Summary] Few-shot Object Detection via Feature Reweighting (with Chinese Translation)

Few-shot Object Detection via Feature Reweighting (基于特征重加权的小样本目标检测)

Paper: https://arxiv.org/abs/1812.01866

Code: https://github.com/bingykang/Fewshot_Detection


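Few-shot training steps:
① Train on the base classes, which have abundant annotated data.
② Merge the base classes with the few-shot novel classes and run a final round of training on base + novel. Each novel class has only n annotated boxes (n-shot), so to keep the classes balanced only n annotated boxes are used for each base class as well.
Network structure:
① Feature extractor: takes the query image as input and uses the DarkNet-19 backbone to extract meta features of size w×h×m.
② Reweighting module: uses class-specific information extracted from the support images to adjust the features of the query image.
Input: one image is randomly drawn from the training set for each of the N classes to form the support set. The box annotation of each image is converted to a binary mask, and the mask is concatenated with the RGB image to give a w×h×4 input.
This input is passed through convolutional layers, which turn the support image of each class into an m-dimensional reweighting vector.
The reweighting vector of each class is then used to adjust the meta features of the query image: the vector serves as the kernel weights of a 1×1 depth-wise convolution applied to the query features, which yields one adjusted copy of the query features per class.
③ Prediction layer: with N novel classes, the reweighting module produces N reweighting vectors, each responsible for detecting one class. These vectors mainly serve to tell the classes apart; the reweighted features are then fed to the prediction layer for classification and box regression.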
 


Abstract

Conventional training of a deep CNN based object detector demands a large number of bounding box annotations, which may be unavailable for rare categories. In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. Our proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature learner and a reweighting module within a one-stage detection architecture. The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples. The reweighting module transforms a few support examples from the novel classes to a global vector that indicates the importance or relevance of meta features for detecting the corresponding objects. These two modules, together with a detection prediction module, are trained end-to-end based on an episodic few-shot learning scheme and a carefully designed loss function. Through extensive experiments we demonstrate that our model outperforms well-established baselines by a large margin for few-shot object detection, on multiple datasets and settings. We also present analysis on various aspects of our proposed model, aiming to provide some inspiration for future few-shot detection works.

基于深度CNN的对象检测器的常规训练需要大量的边界框注释,这可能不适用于目标数较少的类别。在这项工作中,我们开发了小样本目标检测器,它可以学习仅从几个标记示例中检测新目标。我们提出的模型利用完全标记的基类,并在一个单阶段检测体系结构中使用元特征学习器和重新加权模块,快速适应新的类。特征提取部分使用具有大量样本的基类中的训练数据,提取可概括的元特征以检测新的目标类。重新加权模块将新类中的一些支持样本转换为一个全局向量,该向量表示检测相应对象的元特征的重要性或相关性。这两个模块,连同一个检测预测模块,都是基于场景式的小样本学习方案和精心设计的损失函数进行端到端训练的。通过大量的实验,我们证明了我们的模型在多个数据集和设置下,在小样本目标检测方面大大优于成熟的baseline。我们还对我们提出的模型的各个方面进行了分析,旨在为今后的小样本检测工作提供一些启示。

1  Introduction

The recent success of deep convolutional neural networks (CNNs) in object detection [32, 15, 30, 31] relies heavily on a huge amount of training data with accurate bounding box annotations. When the labeled data are scarce, CNNs can severely overfit and fail to generalize. In contrast, humans exhibit strong performance in such tasks: children can learn to detect a novel object quickly from very few given examples. Such ability of learning to detect from few examples is also desired for computer vision systems, since some object categories naturally have scarce examples or their annotations are hard to obtain, e.g., California firetrucks, endangered animals or certain medical data [33].

深度卷积神经网络(CNN)最近在目标检测方面的成功[32,15,30,31]在很大程度上依赖于具有精确边界框注释的大量训练数据。当标记的数据稀少时,CNN可能严重过度拟合,无法广泛应用。相比之下,人类在这类任务中表现出很强的表现力:孩子们可以从很少的给定例子中快速学会检测新物体。计算机视觉系统也需要学习从少数示例中进行检测的能力,因为某些对象类别自然具有稀少的示例或其注释难以获得,例如加利福尼亚消防车、濒危动物或某些医疗数据[33]。

In this work, we target the challenging few-shot object detection problem, as shown in Fig. 1. Specifically, given some base classes with sufficient examples and some novel classes with only a few samples, we aim to obtain a model that can detect both base and novel objects at test time. Obtaining such a few-shot detection model would be useful for many applications. Yet, effective methods are still absent. Recently, meta learning [39, 35, 12] offers promising solutions to a similar problem, i.e., few-shot classification. However, object detection is by nature much more difficult as it involves not only class predictions but also localization of the objects, thus off-the-shelf few-shot classification methods cannot be directly applied to the few-shot detection problem. Taking Matching Networks [39] and Prototypical Networks [35] as examples, it is unclear how to build object prototypes for matching and localization, because there may be distracting objects of irrelevant classes within the image or no targeted objects at all.

在这项工作中,我们针对具有挑战性的小样本目标检测问题,如图1所示。具体地说,给定一些具有足够示例的基类和一些仅具有少量示例的新类,我们的目标是获得一个能够在测试时同时检测基类和新对象的模型。获得这样的少量镜头检测模型将有助于许多应用。然而,仍然缺乏有效的方法。最近,元学习[39,35,12]为一个类似的问题提供了有希望的解决方案,即小样本分类。然而,目标检测本质上要困难得多,因为它不仅涉及类别预测,而且还涉及目标的定位,因此现有的小样本分类方法无法直接应用于小样本检测问题。以匹配网络[39]和原型网络[35]为例,不清楚如何构建用于匹配和定位的对象原型,因为图像中可能存在无关类别的分散注意力的对象,或者根本没有目标对象。


 Figure 1: We aim to obtain a few-shot detection model by training on the base classes with sufficient examples, such that the model can learn from a few annotated examples to detect novel objects on testing images.  我们的目标是通过使用足够的示例在基类上进行训练来获得小样本检测模型,以便该模型可以从一些标记示例中学习,以检测测试图像上的新对象。

We propose a novel detection model that offers few-shot learning ability through fully exploiting detection training data from some base classes and quickly adapting the detection prediction network to predict novel classes according to a few support examples. The proposed model first learns meta features from base classes that are generalizable to the detection of different object classes. Then it effectively utilizes a few support examples to identify the meta features that are important and discriminative for detecting novel classes, and adapts accordingly to transfer detection knowledge from the base classes to the novel ones.

我们提出了一种新的检测模型,通过充分利用来自一些基类的检测训练数据,并根据几个支持示例快速调整检测预测网络来预测新类,从而提供小样本学习能力。该模型首先从基类中学习元特征,这些基类可推广到不同目标类的检测。然后,它有效地利用一些支持示例来识别对检测新类重要且有区别的元特征,并相应地进行调整,以将检测知识从基类转移到新类。

Our proposed model thus introduces a novel detection framework containing two modules, i.e., a meta feature learner and a light-weight feature reweighting module. Given a query image and a few support images for novel classes, the feature learner extracts meta features from the query image. The reweighting module learns to capture global features of the support images and embeds them into reweighting coefficients to modulate the query image meta features. As such, the query meta features effectively receive the support information and are adapted to be suitable for novel object detection. Then the adapted meta features are fed into a detection prediction module to predict classes and bounding boxes for novel objects in the query (Fig. 2). In particular, if there are N novel classes to detect, the reweighting module would take in N classes of support examples and transform them into N reweighting vectors, each responsible for detecting novel objects from the corresponding class. With such class-specific reweighting vectors, some important and discriminative meta features for a novel class would be identified and contribute more to the detection decision, and the whole detection framework can learn to detect novel classes efficiently.

因此,我们提出的模型引入了一种新的检测框架,该框架包含两个模块,即元特征学习器和轻量级特征重加权模块。给定一个query图像和一些新类的支持图像,特征学习器从查询图像中提取元特征。重新加权模块学习捕获支持图像的全局特征,并将其嵌入到重加权系数中,以调整query图像的元特征。因此,query元特征有效地接收支持信息,并且适于新的目标检测。然后,将调整后的元特征输入检测预测模块,以预测query中新目标的类和边界框(图2)。特别地,如果有N个新类要检测,则重新加权模块将接收N个支持示例类并将其转换为N个重新加权向量,每个重新加权向量负责从相应类检测新对象。通过这种特定于类的重加权向量,可以识别新类的一些重要的、有区别的元特征,并对检测决策做出更大的贡献,整个检测框架可以学习如何有效地检测新类。


Figure 2: The architecture of our proposed few-shot detection model. It consists of a meta feature extractor and a reweighting module. The feature extractor follows the one-stage detector architecture and directly regresses the objectness score (o), bounding box location (x; y; h; w) and classification score (c). The reweighting module is trained to map support samples of N classes to N reweighting vectors, each responsible for modulating the meta features to detect the objects from the corresponding class. A softmax based classification score normalization is imposed on the final output.  我们提出的小样本检测模型的体系结构。它由一个元特征提取器和一个重新加权模块组成。特征提取器遵循一级检测器体系结构,直接回归对象性得分(o)、边界框位置(x;y;h;w)和分类得分(c)。重新加权模块被训练为将N个类的支持样本映射到N个重新加权向量,每个重新加权向量负责调制元特征以检测来自相应类的对象。在最终输出上施加基于softmax的分类分数归一化。

The meta feature learner and the reweighting module are trained together with the detection prediction module end-to-end. To ensure few-shot generalization ability, the whole few-shot detection model is trained using a two-phase learning scheme: first learn meta features and a good reweighting module from base classes; then fine-tune the detection model to adapt to novel classes. For handling difficulties in detection learning (e.g., existence of distracting objects), it introduces a carefully designed loss function.

元特征学习器和重加权模块与检测预测模块一起端到端地训练。为了保证小样本的泛化能力,采用两阶段学习方法对整个小样本检测模型进行训练:首先从基类中学习元特征和良好的重加权模块;然后微调检测模型以适应新类。为了处理检测学习中的困难(例如,存在分散注意力的物体),它引入了一个精心设计的损失函数。

Our proposed few-shot detector outperforms competitive baseline methods on multiple datasets and in various settings. Besides, it also demonstrates good transferability from one dataset to another different one. Our contributions can be summarized as follows:

我们提出的小样本检测器在多个数据集和各种设置下优于竞争性baseline方法。此外,它还展示了从一个数据集到另一个不同数据集的良好可转移性。我们的贡献可以总结如下:

• We are among the first to study the problem of few-shot object detection, which is of great practical value but a less explored task than image classification in the few-shot learning literature.
• We design a novel few-shot detection model that 1) learns generalizable meta features; and 2) automatically reweights the features for novel class detection by producing class-specific activating coefficients from a few support samples.
• We experimentally show that our model outperforms baseline methods by a large margin, especially when the number of labels is extremely low. Our model adapts to novel classes significantly faster.

•我们是最早研究小样本目标检测问题的公司之一,该问题具有很大的实用价值,但与小样本学习文献中的图像分类相比,这是一项探索较少的任务。
•我们设计了一个新的小样本检测模型,该模型1)学习可概括的元特征;2)通过从几个支持样本生成特定于类的激活系数,自动重新加权新类检测的特征。
•我们的实验表明,我们的模型在很大程度上优于baseline方法,尤其是在标签数量极少的情况下。我们的模型能够更快地适应新类。

2  Related Work

General object detection.    Deep CNN based object detectors can be divided into two categories: proposal-based and proposal-free. RCNN series [15, 14, 32] detectors fall into the first category. RCNN [15] uses pre-trained CNNs to classify the region proposals generated by selective search [38]. SPP-Net [17] and Fast-RCNN [14] improve RCNN with an RoI pooling layer to extract regional features from the convolutional feature maps directly. Faster-RCNN [32] introduces a region proposal network (RPN) to improve the efficiency of generating proposals. In contrast, YOLO [29] provides a proposal-free framework, which uses a single convolutional network to directly perform class and bounding box predictions. SSD [22] improves YOLO by using default boxes (anchors) to adjust to various object shapes. YOLOv2 [30] improves YOLO with a series of techniques, e.g., multi-scale training and a new network architecture (DarkNet-19). Compared with proposal-based methods, proposal-free methods do not require a per-region classifier, and are thus conceptually simpler and significantly faster. Our few-shot detector is built on the YOLOv2 architecture.

一般目标检测    基于深度CNN的目标检测器可分为两类:proposal-based的和proposal-free。RCNN系列[15、14、32]探测器属于第一类。RCNN[15]使用预先训练的CNN对选择性搜索生成的区域生成框进行分类[38]。SPP Net[17]和Fast RCNN[14]使用RoI池层改进RCNN,以直接从卷积特征图中提取区域特征。Faster RCNN[32]引入了区域建议网络(RPN),以提高生成建议框的效率。相比之下,YOLO[29]提供了一个proposal-free框架,该框架使用单个卷积网络直接执行类和边界框预测。SSD[22]通过使用默认框(锚)来调整各种对象形状,从而改进了YOLO。YOLOv2[30]通过一系列技术改进了YOLO,例如多尺度训练、新网络架构(DarkNet-19)。与proposal-based的方法相比,proposal-free的方法不需要每个区域的分类器,因此在概念上更简单,速度也更快。我们的小样本目标检测器构建在YOLOv2架构上。

Few-shot learning.    Few-shot learning refers to learning from just a few training examples per class. Li et al. [20] use Bayesian inference to generalize knowledge from a pretrained model to perform one-shot learning. Lake et al. [19] propose a Hierarchical Bayesian one-shot learning system that exploits compositionality and causality. Luo et al. [23] consider the problem of adapting to novel classes in a new domain. Douze et al. [9] assume abundant unlabeled images and adopt label propagation in a semi-supervised setting.

小样本学习   小样本学习指的是每个类只从几个训练样本中学习。Li等人[20]使用贝叶斯推理从预训练模型中概括知识,以执行一次性学习。Lake等人[19]提出了一种利用组合性和因果关系的分层贝叶斯一次性学习系统。罗等人〔23〕考虑在新领域中适应新类的问题。Douze等人[9]假设存在大量未标记图像,并在半监督环境中采用标签传播。

An increasingly popular solution for few-shot learning is meta-learning, which can further be divided into three categories: a) Metric learning based [18, 37, 39, 35]. In particular, Matching Networks [39] learn the task of finding the most similar class for the target image among a small set of labeled images. Prototypical Networks [35] extend Matching Networks by producing a linear classifier instead of weighted nearest neighbor for each class. Relation Networks [37] learn a distance metric to compare the target image to a few labeled images. b) Optimization for fast adaptation. Ravi and Larochelle [28] propose an LSTM meta-learner that is trained to quickly converge a learner classifier in new few-shot tasks. Model-Agnostic Meta-Learning (MAML) [12] optimizes a task-agnostic network so that a few gradient updates on its parameters would lead to good performance on new few-shot tasks. c) Parameter prediction. Learnet [2] dynamically learns the parameters of factorized weight layers based on a single example of each class to realize one-shot learning.

小样本学习越来越流行的解决方案是元学习,它可以进一步分为三类:a)基于度量的学习[18,37,39,35]。特别是,匹配网络[39]学习在一小组标记图像中为目标图像找到最相似类别的任务。原型网络[35]通过为每个类别生成线性分类器而不是加权最近邻来扩展匹配网络。关系网络[37]学习距离度量,将目标图像与几个标记图像进行比较。b) 优化快速适应。Ravi和Larochelle[28]提出了一种LSTM metalearner,该metalearner经过训练,能够在新的小样本任务中快速聚合学习者分类器。模型不可知元学习(MAML)[12]优化了任务不可知网络,使其参数上的一些梯度更新能够在新的少量任务上获得良好的性能。c) 参数预测。Learnet[2]基于每个类的单个示例动态学习分解权重层的参数,以实现一次性学习。

The above methods are developed to recognize novel images only; other works try to learn a model that can classify both base and novel images. Recent works by Hariharan et al. [16, 40] introduce image hallucination techniques to augment the novel training data such that novel classes and base classes are balanced to some extent. Weight imprinting [26] sets weights for a new category using a scaled embedding of labeled examples. DynamicNet [13] learns a weight generator that predicts classification weights for a specific category given the corresponding labeled images. These previous works only tackle the image classification task, while our work focuses on object detection.

上述方法仅用于识别新图像,还有其他一些工作试图学习一种既能对基础图像进行分类又能对新图像进行分类的模型。Hariharan等人[16,40]最近的工作引入了图像幻觉技术,以增强新的训练数据,从而在一定程度上平衡了新类和基类。权重印记[26]使用标记示例的缩放嵌入设置新类别的权重。DynamicNet[13]学习一个权重生成器,为给定相应标签图像的特定类别分类权重。以前的工作只处理图像分类任务,而我们的工作主要集中在目标检测上。

Object detection with limited labels.    There are a number of prior works on detection focusing on settings with limited labels. The weakly-supervised setting [3, 7, 36] considers the problem of training object detectors with only image-level labels, but without bounding box annotations, which are more expensive to obtain. Few example object detection [25, 41, 8] assumes only a few labeled bounding boxes per class, but relies on abundant unlabeled images to generate trustworthy pseudo annotations for training. Zero-shot object detection [1, 27, 42] aims to detect previously unseen object categories, and thus usually requires external information such as relations between classes. Different from these settings, our few-shot detector uses very few bounding box annotations (1-10) for each novel class, without the need for unlabeled images or external knowledge. Chen et al. [4] study a similar setting but only in a transfer learning context, where the target domain images only contain novel classes without base classes.

有限标签的目标检测    之前有很多关于检测的工作,主要集中在有限标签的设置上。弱监督设置[3,7,36]考虑了仅使用图像级标签,但不使用边界框注释来训练对象检测器的问题,而边界框注释的获取成本更高。很少有目标检测示例[25,41,8]假设每个类只有几个带标签的边界框,但依赖大量未标记的图像生成可信的伪标签进行训练。Zeroshot对象检测[1,27,42]旨在检测以前未看到的目标类别,因此通常需要外部信息,如类之间的关系。与这些设置不同,我们的小样本检测器对每个新类使用很少的边界框标记1-10),而不需要未标记的图像或外部知识。Chen等人[4]研究了类似的设置,但仅在迁移学习环境中进行,其中目标域图像只包含新类,没有基类。

3  Approach

In this work, we define a novel and realistic setting for few-shot object detection, in which there are two kinds of data available for training, i.e., the base classes and the novel classes. For the base classes, abundant annotated data are available, while only a few labeled samples are given for the novel classes [16]. We aim to obtain a few-shot detection model that, by leveraging knowledge from the base classes, can learn to detect novel objects when both base and novel classes are present at test time.

在这项工作中,我们定义了一种新颖而真实的小样本目标检测设置,其中有两种数据可用于训练,即基类和新类。对于基类,有丰富的注释数据可用,而只有少数标记样本提供给新类[16]。我们的目标是获得小样本检测模型,当测试中同时存在基类和新类时,该模型可以利用基类中的知识来学习检测新对象。

This setting is worth exploring since it aligns well with a practical situation: one may expect to deploy a pre-trained detector for new classes with only a few labeled samples. More specifically, large-scale object detection datasets (e.g., PASCAL VOC, MS-COCO) are available to pre-train a detection model. However, the number of object categories therein is quite limited, especially compared to the vast number of object categories in the real world. Thus, solving this few-shot object detection problem is highly desirable.

此设置值得探索,因为它与实际情况非常吻合,人们可能希望为只有少量标记样本的新类部署一个预先训练的检测器。更具体地说,可以使用大规模目标检测数据集(例如,PSCAL VOC、MSCOCO)来预训练检测模型。然而,其中的对象类别的数量非常有限,特别是与现实世界中庞大的对象类别相比。因此,解决这小样本的目标检测问题是迫切需要的。

3.1. Feature Reweighting for Detection 用于检测的特征重加权

Our proposed few-shot detection model introduces a meta feature learner D and a reweighting module M into a one-stage detection framework. In this work, we adopt the proposal-free detection framework YOLOv2 [30]. It directly regresses features for each anchor to detection-relevant outputs, including the classification score and object bounding box coordinates, through a detection prediction module P. As shown in Fig. 2, we adopt the backbone of YOLOv2 (i.e., DarkNet-19) to implement the meta feature extractor D, and follow the same anchor setting as YOLOv2. As for the reweighting module M, we carefully design it to be a light-weight CNN for both enhancing efficiency and easing its learning. Its architecture details are deferred to the supplementary material due to the space limit.

我们提出的小样本检测模型将元特征学习器D和重加权模块M引入到一个单阶段检测框架中。在这项工作中,我们采用了proposal-free检测框架YOLOv2[30]。它通过检测预测模块P将每个锚的特征直接回归到检测相关输出,包括分类分数和目标边界框坐标。如图2所示,我们采用YOLOv2的主干(即DarkNet-19)来实现元特征提取器D,并遵循与YOLOv2相同的锚设置。至于重新称重模块M,我们精心设计成一个轻量级的CNN,以提高效率和简化学习。由于空间限制,其架构细节推迟到补充部分。

The meta feature learner D learns how to extract meta features for the input query images to detect their novel objects. The reweighting module M, taking the support examples as input, learns to embed support information into reweighting vectors and adjust the contribution of each meta feature of the query image accordingly for the following detection prediction module P. With the reweighting module, some meta features informative for detecting novel objects would be excited and thus assist detection prediction.

元特征学习器D学习如何为输入query图像提取元特征,以检测其新目标。重加权模块M以支持示例为输入,学习将支持信息嵌入到加权向量中,并相应地调整query图像的每个元特征的贡献度,以用于后续检测预测模块P,一些用于检测新物体的元特征将被激发,从而有助于目标预测。

Formally, let $I$ denote an input query image. Its corresponding meta features are generated by $\mathcal{D}$: $F = \mathcal{D}(I)$. The produced meta feature has $m$ feature maps. We denote the support images and their associated bounding box annotations, which indicate the target class to detect, as $I_i$ and $M_i$ respectively, for class $i$, $i = 1, \dots, N$. The reweighting module $\mathcal{M}$ takes one support image $(I_i, M_i)$ as input and embeds it into a class-specific representation $w_i \in \mathbb{R}^m$ with $w_i = \mathcal{M}(I_i, M_i)$. Such an embedding captures a global representation of the target object w.r.t. the $m$ meta features. It will be responsible for reweighting the meta features and highlighting the more important and relevant ones to detect the target object from class $i$. More specifically, after obtaining the class-specific reweighting coefficients $w_i$, our model applies them to obtain the class-specific feature $F_i$ for novel class $i$ by:

​形式上,I 表示一个输入query图像。其对应的元特征由D:F=D(I)生成。生成的元特征有m个特征映射。我们标记支持图像及其关联的边界框标签,指定要检测的目标类,分别为Ii和Mi;i=1......N。重新加权模块M将一个支持映像(Ii;Mi)作为输入,并将其嵌入特定于类的表示wi∈Rm,wi=M(Ii;Mi)。这种嵌入捕获了目标对象的全局表示,即m个元特征。它将负责重新加权元特征,并突出显示更重要和相关的特征,以检测i类中的目标对象。更具体地说,在获得类别特定的重新加权系数wi后,我们的模型通过以下方式应用它来获得新类别i的类别特定特征Fi:

$$F_i = F \otimes w_i, \quad i = 1, \dots, N$$

where ⊗ denotes channel-wise multiplication. We implement it through 1×1 depth-wise convolution.

其中⊗ 表示按通道乘法。我们通过1×1深度卷积来实现它。
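
To make the channel-wise multiplication concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `reweight` and the nested-list, channel-first layout are assumptions chosen for illustration. It scales each of the m feature maps by the matching entry of each class vector, which is the result a 1×1 depth-wise convolution with those weights would produce.

```python
def reweight(meta_features, class_vectors):
    """Return one reweighted copy of the meta features per class.

    meta_features: list of m channels, each an h x w nested list (F).
    class_vectors: list of N vectors, each of length m (w_1 ... w_N).
    Both layouts are illustrative assumptions.
    """
    reweighted = []
    for w_i in class_vectors:
        if len(w_i) != len(meta_features):
            raise ValueError("each reweighting vector needs one entry per channel")
        # F_i = F (x) w_i: scale every value in channel c by the scalar w_i[c].
        f_i = [[[value * scale for value in row] for row in channel]
               for channel, scale in zip(meta_features, w_i)]
        reweighted.append(f_i)
    return reweighted


# Toy example: m = 2 channels of size 1 x 2, N = 2 classes.
F = [[[1.0, 2.0]], [[3.0, 4.0]]]
w = [[1.0, 0.0], [0.5, 2.0]]
F_per_class = reweight(F, w)
```

In the toy example the first class keeps channel 0 and switches channel 1 off, while the second class halves channel 0 and doubles channel 1.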

After acquiring class-specific features $F_i$, we feed them into the prediction module $P$ to regress the objectness score $o$, bounding box location offsets $(x, y, h, w)$, and classification score $c_i$ for each of a set of predefined anchors:

在获取特定于类的特征Fi后,我们将其输入预测模块P,以回归一组预定义锚的对象性得分o、边界框位置偏移(x;y;h;w)和分类得分ci:

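$$\{o_i, x_i, y_i, h_i, w_i, c_i\} = P(F_i), \quad i = 1, \dots, N$$

where $c_i$ is a one-versus-all classification score indicating the probability that the corresponding object belongs to class $i$ (here $w_i$ is the box width, not the reweighting vector).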

其中,ci是一对所有分类分数,表明对应对象属于类别i的概率。

3.2. Learning Scheme 学习计划

It is not straightforward to learn a good meta feature learner D and reweighting module M from the base classes such that they can produce generalizable meta features and reweighting coefficients. To ensure the model generalization performance from few examples, we develop a new two-phase learning scheme that is different from the conventional ones for detection model training.

学习一个好的元特征学习器D并从基类中重新加权模块M,以便生成可推广的元特征和加权系数,这并不容易。为了保证模型的泛化性能,我们提出了一种新的两阶段学习方法,不同于传统的检测模型训练方法。

We reorganize the training images with annotations from the base classes into multiple few-shot detection learning tasks $T_j$. Each task $T_j = S_j \cup Q_j = \{(I_1^j, M_1^j), \dots, (I_N^j, M_N^j)\} \cup \{(I_q^j, M_q^j)\}$ contains a support set $S_j$ (consisting of $N$ support images, each of which is from a different base class) and a query set $Q_j$ (offering query images with annotations for performance evaluation).
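
The task construction described above can be sketched as follows. This is only an illustration under stated assumptions: `sample_task`, the `annotations` dictionary layout, and the choice to draw a single query box from one of the sampled classes are all hypothetical simplifications, and the query may coincide with a support image.

```python
import random


def sample_task(annotations, n_way, rng):
    """Sample one task T_j = S_j U Q_j from base-class annotations.

    annotations: dict mapping class name -> list of (image_id, box) pairs
    (an assumed layout). Returns (support, query): support holds one
    (class, image_id, box) triple per sampled class; query is one
    (image_id, box) pair drawn from one of those classes.
    """
    # Pick N distinct classes, then one annotated image for each of them.
    classes = rng.sample(sorted(annotations), n_way)
    support = []
    for cls in classes:
        image_id, box = rng.choice(annotations[cls])
        support.append((cls, image_id, box))
    # Simplification: the query is a single annotated box, not a full image.
    query_cls = rng.choice(classes)
    query = rng.choice(annotations[query_cls])
    return support, query
```

Passing a seeded `random.Random` instance as `rng` makes the sampled tasks reproducible.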

​我们将带有注释的训练图像从基类重新组织为多个小样本检测学习任务Tj。每个任务Tj=Sj∪Qj=​包含一个支持集Sj(由N个支持映像组成,每个支持映像来自不同的基类)和一个query集Qj(为性能评估提供带注释的query映像)。

​Let θD, θM and θP denote the parameters of meta feature learner D, the reweighting module M and prediction module P respectively. We optimize them jointly through minimizing the following loss:

​设θD、θM和θP分别表示元特征学习器D、重加权模块M和预测模块P的参数。我们通过最大限度地减少以下损失来共同优化它们:

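$$\min_{\theta_D, \theta_M, \theta_P} L := \sum_j L(T_j) = \sum_j L_{det}\big(\mathcal{P}_{\theta_P}(\mathcal{D}_{\theta_D}(I_q^j) \otimes \mathcal{M}_{\theta_M}(S_j)),\, M_q^j\big)$$

Here $(I_q^j, M_q^j)$ is a query image of task $T_j$ with its annotation, and $L_{det}$ is the detection loss function, whose details we explain later. The above optimization ensures that the model learns good meta features for the query and good reweighting coefficients for the support.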

这里Ldet是检测损失函数,稍后我们将解释其细节。上述优化确保了模型为query学习良好的元特性,并为支持重新加权系数。

The overall learning procedure consists of two phases. The first phase is the base training phase. In this phase, although abundant labels are available for each base class, we still jointly train the feature learner and the detection prediction module together with the reweighting module. This is to make them coordinate in a desired way: the model needs to learn to detect objects of interest by referring to a good reweighting vector. The second phase is few-shot fine-tuning. In this phase, we train the model on both base and novel classes. As only k labeled bounding boxes are available for the novel classes, to balance between samples from the base and novel classes, we also include k boxes for each base class. The training procedure is the same as the first phase, except that it takes significantly fewer iterations for the model to converge.

整个学习过程包括两个阶段。第一阶段是基类训练阶段。在这个阶段,尽管每个基类都有丰富的标签可用,但我们仍然联合训练特征学习者、检测预测和重新加权模块。这是为了使它们以期望的方式协调:模型需要学习通过参考良好的重新加权向量来检测感兴趣的对象。第二阶段是小目标微调。在这一阶段,我们在基本类和新类上训练模型。由于只有k个标记的边界框可用于新类,为了平衡基类和新类的样本,我们还为每个基类包含k个框。训练过程与第一阶段相同,只是模型收敛所需的迭代次数明显减少。
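
The balancing rule of the fine-tuning phase (k boxes per class, base and novel alike) can be sketched as below. The helper name `build_finetune_set` and the dictionary layout are assumptions for illustration, not taken from the released code.

```python
import random


def build_finetune_set(boxes_by_class, k, rng):
    """Keep at most k annotated boxes per class, for base and novel classes.

    boxes_by_class: dict mapping class name -> list of box annotations
    (an assumed layout). Classes with k or fewer boxes are kept whole.
    """
    subset = {}
    for cls, boxes in boxes_by_class.items():
        # Subsample abundant (base) classes down to k; keep scarce ones as-is.
        subset[cls] = rng.sample(boxes, k) if len(boxes) > k else list(boxes)
    return subset
```

A base class with thousands of boxes and a novel class with only k boxes then contribute the same number of samples during fine-tuning.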

In both training phases, the reweighting coefficients depend on the input pairs of (support image, bounding box) that are randomly sampled from the available data per iteration. After few-shot fine-tuning, we would like to obtain a detection model that can directly perform detection without requiring any support input. This is achieved by setting the reweighting vector for a target class to the average one predicted by the model after taking the k-shot samples as input. After this, the reweighting module can be completely removed during inference. Therefore, our model adds negligible extra model parameters to the original detector.

在两个训练阶段中,重新加权系数取决于从每次迭代的可用数据中随机采样的输入对(支持图像、边界框)。经过小样本微调后,我们希望获得一个检测模型,该模型可以直接执行检测,而无需任何支持输入。这是通过将目标类的重新加权向量设置为模型在将k-shot样本作为输入后预测的平均向量来实现的。在此之后,可以在推理过程中完全移除重加权模块。因此,我们的模型在原始检测器上增加了可忽略不计的额外模型参数。
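
The averaging step that lets the reweighting module be dropped at inference can be sketched as follows. The function name and the list-of-lists input are illustrative assumptions; the k input vectors stand for the module's outputs on the k support samples of one class.

```python
def average_reweighting_vector(vectors):
    """Element-wise mean of the k reweighting vectors predicted for one class."""
    if not vectors:
        raise ValueError("need at least one support vector")
    m = len(vectors[0])
    # Average each of the m coefficients over the k shots.
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(m)]


# Two shots (k = 2) with m = 2 coefficients each.
fixed_vector = average_reweighting_vector([[1.0, 2.0], [3.0, 6.0]])
```

Once one such fixed vector is stored per class, detection needs only the feature extractor, the stored vectors, and the prediction module.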

Detection loss function. To train the few-shot detection model, we need to carefully choose the loss functions, in particular for the class prediction branch, as the number of samples is very small. Given that the predictions are made class-wise, it seems natural to use a binary cross-entropy loss, regressing 1 if the object is the target class and 0 otherwise. However, we found that using this loss function gave a model prone to outputting redundant detection results (e.g., detecting a train as a bus and a car). This is because, for a specific region of interest, only one out of N classes is truly positive. However, the binary loss strives to produce balanced positive and negative predictions. Non-maximum suppression could not help remove such false positives as it only operates on predictions within each class.

检测损失函数    为了训练小样本检测模型,我们需要仔细选择损失函数,特别是对于类预测分支,因为样本数很少。假设预测是按类进行的,使用二进制交叉熵损失似乎很自然,如果对象是目标类,则回归1,否则回归0。然而,我们发现,使用这种损失函数,模型容易输出冗余检测结果(例如,将火车检测为公共汽车和汽车)。这是因为对于一个特定的感兴趣区域,N个类中只有一个是真正正的。然而,binary loss想要产生平衡的正面和负面预测。非最大值抑制不能帮助消除此类误报,因为它只对每个类中的预测进行操作。

To resolve this issue, our proposed model adopts a softmax layer for calibrating the classification scores among different classes, adaptively lowering detection scores for the wrong classes. Therefore, the actual classification score for the i-th class is given by $\hat{c}_i = e^{c_i} / \sum_{j=1}^{N} e^{c_j}$. Then, to better align the training procedure with few-shot detection, the cross-entropy loss over the calibrated scores $\hat{c}_i$ is adopted:

​为了解决这个问题,我们提出的模型采用softmax层来校准不同类别之间的分类分数,并自适应地降低错误类别的检测分数。因此,第i类的实际分类分数由给出。然后,为了更好地对齐训练程序和小样本检测,采用了校准分数c^i上的交叉熵损失:

$$L_c = -\sum_{i=1}^{N} \mathbb{1}(\cdot, i) \log(\hat{c}_i)$$

where $\mathbb{1}(\cdot, i)$ is an indicator function for whether the current anchor box really belongs to class $i$ or not, and $\hat{c}_i$ is the softmax-calibrated score. After introducing softmax, the summation of classification scores for a specific anchor is equal to 1, and less probable class predictions will be suppressed. This softmax loss will be shown to be superior to binary loss in the following experiments. For bounding box and objectness regression, we adopt loss functions $L_{bbx}$ and $L_{obj}$ similar to those of YOLOv2 [30], but we balance the positives and negatives by not computing some of the loss from negative samples for the objectness scores. Thus, the overall detection loss function is $L_{det} = L_c + L_{bbx} + L_{obj}$.

其中1(·,i)是当前锚框是否真正属于i类的指示函数。引入softmax后,特定锚定的分类分数总和等于1,不太可能的类别预测将被抑制。在下面的实验中,该softmax损耗将优于二进制损耗。对于包围盒回归和客观回归,我们采用与YOLOv2[30]相似的损失函数Lbbx和Lobj,但我们通过不计算客观分数的负样本损失来平衡正负。因此,整体检测损失函数是Ldet=Lc+Lbbx+Lobj。
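
As a worked illustration of the calibrated classification score and the loss over it, the sketch below computes a softmax across the N class-wise scores of one anchor and the resulting cross-entropy. The function names are hypothetical, and subtracting the maximum score is a standard numerical-stability step added here, not something stated in the paper.

```python
import math


def calibrated_scores(raw_scores):
    """Softmax over the N one-versus-all scores c_1 ... c_N of one anchor."""
    shift = max(raw_scores)  # stability shift; does not change the result
    exps = [math.exp(c - shift) for c in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(raw_scores, true_class):
    """Cross-entropy over calibrated scores for the anchor's true class index."""
    return -math.log(calibrated_scores(raw_scores)[true_class])
```

Because the calibrated scores of one anchor sum to 1, raising the score of one class necessarily lowers the others, which is what suppresses the redundant cross-class detections described above.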

Reweighting module input. The input of the reweighting module should be the object of interest. However, in the object detection task, one image may contain multiple objects from different classes. To let the reweighting module know what the target class is, in addition to the three RGB channels, we include an additional “mask” channel (Mi) that has only binary values: at positions within the bounding box of an object of interest, the value is 1, otherwise it is 0 (see bottom-left of Fig. 2). If multiple target objects are present in the image, only one object is used. This additional mask channel gives the reweighting module the knowledge of what part of the image’s information it should use, and what part should be considered as “background”. Combining mask and image as input not only provides class information of the object of interest but also the location information (indicated by the mask) useful for detection. In the experiments, we also investigate other input forms.

重加权模块输入   重新加权模块的输入应为感兴趣的对象。然而,在目标检测任务中,一幅图像可能包含来自不同类别的多个对象。为了让重加权模块知道目标类是什么,除了三个RGB通道外,我们还包括一个仅具有二进制值的附加“掩码”通道(Mi):在感兴趣对象的边界框内的位置上,值为1,否则为0(参见图2左下)。如果图像上存在多个目标对象,则仅使用一个对象。这个额外的遮罩通道为重加权模块提供了它应该使用的图像信息的哪一部分的知识,以及应该将哪一部分视为“背景”。结合掩码和图像作为输入不仅提供感兴趣对象的类别信息,而且还提供用于检测的位置信息(由掩码指示)。在实验中,我们还研究了其他输入形式。
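
A minimal sketch of how the four-channel support input could be assembled is shown below. The helper names, the `(x_min, y_min, x_max, y_max)` box format with an exclusive upper bound, and the channel-first nested-list layout are all assumptions made for illustration.

```python
def box_mask(height, width, box):
    """Binary mask: 1 inside the box (x_min, y_min, x_max, y_max), else 0.

    The upper bounds are treated as exclusive (an assumed convention).
    """
    x_min, y_min, x_max, y_max = box
    return [[1 if x_min <= x < x_max and y_min <= y < y_max else 0
             for x in range(width)]
            for y in range(height)]


def support_input(rgb, box):
    """Stack the R, G, B channels and the mask into a 4-channel input.

    rgb: list of 3 channels, each an h x w nested list (channel-first).
    """
    height, width = len(rgb[0]), len(rgb[0][0])
    return list(rgb) + [box_mask(height, width, box)]
```

When an image contains several objects of the target class, only one box would be passed in, matching the rule that a single object is used per support image.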

4  Experiments

In this section, we evaluate our model and compare it with various baselines, to show our model can learn to detect novel objects significantly faster and more accurately. We use YOLOv2 [30] as the base detector. Due to the space limit, we defer all the model architecture and implementation details to the supplementary material. The code to reproduce the results will be released at https://github.com/bingykang/Fewshot_Detection.

在本节中,我们评估了我们的模型,并将其与各种基线进行比较,以表明我们的模型可以学习更快、更准确地检测新对象。我们使用YOLOv2[30]作为基本检测器。由于篇幅限制,我们将所有模型架构和实现细节推迟到补充材料中。复制结果的代码将在https://github.com/bingykang/Fewshot_Detection上发布。

4.1. Experimental Setup 实验设置

Datasets.  We evaluate our model for few-shot detection on the widely-used object detection benchmarks, i.e., VOC 2007 [11], VOC 2012 [10], and MS-COCO [21]. We follow the common practice [30, 32, 34, 6] and use the VOC 07 test set for testing while using the VOC 07 and 12 train/val sets for training. Out of its 20 object categories, we randomly select 5 classes as the novel ones, while keeping the remaining 15 as the base. We evaluate with 3 different base/novel splits. During base training, only annotations of the base classes are given. For few-shot fine-tuning, we use a very small set of training images to ensure that each class of objects only has k annotated bounding boxes, where k equals 1, 2, 3, 5 and 10. Similarly, on the MS-COCO dataset, we use 5000 images from the validation set for evaluation, and the remaining images in the train/val sets for training. Out of its 80 object classes, we select the 20 classes that overlap with VOC as novel classes, and the remaining 60 classes as the base classes. We also consider learning the model on the 60 base classes from COCO and applying it to detect the 20 novel objects in PASCAL VOC. This setting features a cross-dataset learning problem that we denote as COCO to PASCAL.

数据集    我们根据广泛使用的目标检测基准,即VOC 2007[11]、VOC 2012[10]和MS-COCO[21],评估了我们的小样本检测模型。我们遵循常见做法[30,32,34,6],使用VOC 07测试集进行测试,同时使用VOC 07和12个训练集/val进行训练。在其20个对象类别中,我们随机选择5个类别作为新类别,而保留其余15个类别作为基础。我们使用3种不同的基/新的样本进行评估。在基础训练期间,只给出基类的标注。对于小样本微调,我们使用一组非常小的训练图像来确保每类对象只有k个带注释的边界框,其中k等于1、2、3、5和10。类似地,在MS-COCO数据集上,我们使用来自验证集的5000张图像进行评估,其余图像在train/val集中进行训练。在其80个对象类中,我们选择20个与VOC重叠的类作为新类,其余60个类作为基类。我们还考虑从COCO 60个基类上学习模型,并应用它来检测PASCAL VOC中的20个新对象。此设置的特点是跨数据集学习问题,我们将其表示为COCO to PASCAL。
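
The 15/5 base/novel split on PASCAL VOC can be illustrated with the sketch below. The function name and the seeded random draw are assumptions; the splits it produces are arbitrary examples and will not match the three specific splits used in the paper.

```python
import random

# The 20 PASCAL VOC object categories.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def split_classes(classes, n_novel, seed):
    """Randomly pick n_novel classes as novel; the rest stay as base."""
    rng = random.Random(seed)
    novel = sorted(rng.sample(classes, n_novel))
    base = [c for c in classes if c not in novel]
    return base, novel


base, novel = split_classes(VOC_CLASSES, n_novel=5, seed=0)
```

Calling the function with three different seeds gives three base/novel splits in the spirit of the evaluation protocol described above.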

Note that the testing images may contain distracting base classes (which are not targeted classes to detect) and some images do not contain objects of the targeted novel class. This makes few-shot detection even more challenging.

注意:测试图像可能包含分散注意力的基类(不是要检测的目标类),并且一些图像不包含目标类的对象。这使得小样本检测更具挑战性。

Baselines.   We compare our model with five competitive baselines. Three of them are built upon the vanilla YOLOv2 detector with straightforward few-shot learning strategies. The first one is to train the detector on images from the base and novel classes together. In this way, it can learn good features from the base classes that are applicable for detecting novel classes. We term this baseline YOLO-joint. We train this baseline model with the same total iterations as ours. The other two YOLO-based baselines also use two training phases, as ours does. In particular, they train the original YOLOv2 model with the same base training phase as ours; for the few-shot fine-tuning phase, one fine-tunes the model with the same iterations as ours, giving the YOLO-ft baseline; and one trains the model to fully converge, giving YOLO-ft-full. Comparing with these baselines can help understand the few-shot learning advantage of our models brought by the proposed feature reweighting method. The last two baselines are from a recent few-shot detection method, i.e., Low-Shot Transfer Detector (LSTD) [4]. LSTD relies on background depression (BD) and transfer knowledge (TK) to obtain a few-shot detection model on the novel classes. For fair comparison, we re-implement BD and TK based on YOLOv2, train it for the same iterations as ours, obtaining LSTD(YOLO); and train it to convergence to obtain the last baseline, LSTD(YOLO)-full.

基线   我们将我们的模型与五个竞争基线进行比较。其中三个基于vanilla YOLOv2探测器,具有简单的小样本学习策略。第一种方法是将基类和新类的图像一起训练检测器。通过这种方式,它可以从适用于检测新类的基类中学习良好的特征。我们将此基线称为YOLO-joint。我们使用与我们相同的总迭代来训练这个基线模型。另外两个基于YOLO的基线也使用了我们的两个训练阶段。特别是,他们用与我们相同的基础训练阶段训练原始的YOLOv2模型;对于小样本微调阶段,一个微调模型的迭代次数与我们的相同,给出YOLO-ft基线;其中一个训练模型完全收敛,使YOLO ft完全收敛。与这些基线进行比较有助于理解我们提出的特征重加权方法所带来的模型的小样本学习优势。最后两条基线来自最近的几种小样本检测方法,即Low-Shot Transfer Detector(LSTD)[4]。LSTD依靠背景抑制(BD)和转移知识(TK)在新类上获得小样本检测模型。为了公平比较,我们在YOLOV2的基础上重新实现了BD和TK,对其进行与我们相同的迭代训练,获得LSTD(YOLO);并将其训练为收敛,以获得最后一条基线LSTD(YLO)-full。

4.2. Comparison with Baselines

PASCAL VOC.   We present our main results on novel classes in Table 1. First, we note that our model significantly outperforms the baselines, especially when the labels are extremely scarce (1-3 shots). The improvements are also consistent across different base/novel class splits and numbers of shots. In contrast, LSTD(YOLO) can boost performance in some cases but might harm detection in others. Taking 5-shot detection as an example, LSTD(YOLO)-full brings a 4.3 mAP improvement over YOLO-ft-full on novel set 1, but it is worse than YOLO-ft-full by 5.1 mAP on novel set 2. Second, we note that YOLO-ft and YOLO-ft-full also perform significantly better than YOLO-joint. This demonstrates the necessity of the two training phases employed in our model: it is better to first learn a good knowledge representation on the base classes and then fine-tune with few-shot data; otherwise, joint training will bias the detector towards the base classes so that it learns almost nothing about the novel classes. More detailed per-class results are available in the supplementary material.


COCO.   The results for the COCO dataset are shown in Table 2. We evaluate for k = 10 and k = 30 shots per class. In both cases, our model outperforms all the baselines. In particular, when the YOLO baseline is trained for the same number of iterations as our model, it achieves an AP of less than 1%. We also observe that there is much room to improve the results obtained in the few-shot scenario. This is possibly due to the complexity and large amount of data in COCO, which make few-shot detection on it quite challenging.


COCO to PASCAL.   We evaluate our model using 10 shots per class from PASCAL. The mAP values of YOLO-ft, YOLO-ft-full, LSTD(YOLO) and LSTD(YOLO)-full are 11.24%, 28.29%, 10.99% and 28.95% respectively, while our method achieves 32.29%. The performance on the PASCAL novel classes is worse than when we use base classes from the PASCAL dataset (where mAP is around 40%). This might be explained by the different numbers of novel classes, i.e., 20 vs. 5.

4.3. Performance Analysis

Learning speed.   Here we analyze the learning speed of our models. The results show that although our few-shot detection model does not explicitly consider adaptation speed in the optimization process, it still exhibits surprisingly fast adaptation. Note that in the experiments of Table 1, YOLO-ft-full and LSTD(YOLO)-full require 25,000 iterations to fully converge, while our model requires only 1200 iterations to converge to a higher accuracy. When the baselines YOLO-ft and LSTD(YOLO) are trained for the same number of iterations as ours, their performance is far worse. In this section, we compare the full convergence behavior of YOLO-joint, YOLO-ft-full and our method in Fig. 3. The AP values are normalized by the maximum value reached during the training of our method and the baseline together. This experiment is conducted on PASCAL VOC base/novel split 1, with 10-shot bounding box labels on novel classes.
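The normalization used for the convergence curves can be illustrated with a short sketch. It assumes that each curve is a list of AP values recorded over training and that both curves are divided by their joint maximum; the function name `normalize_ap_curves` and the numbers are made up for illustration and are not results from the paper.

```python
def normalize_ap_curves(ours, baseline):
    """Scale two AP-vs-iteration curves by their joint maximum."""
    peak = max(max(ours), max(baseline))
    if peak <= 0:
        raise ValueError("AP curves must contain a positive value")
    return [v / peak for v in ours], [v / peak for v in baseline]

# Toy curves (not taken from the paper).
ours_norm, base_norm = normalize_ap_curves([10.0, 40.0, 50.0], [5.0, 20.0, 25.0])
```

Dividing by a shared maximum keeps the two curves comparable on one axis, with the best value reached by either method mapped to 1.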

From Fig. 3, our method (solid lines) converges significantly faster than the baseline YOLO detector (dashed lines), for each novel class as well as on average. For the class Sofa (orange line), although the baseline YOLO detector eventually slightly outperforms our method, it takes a large number of training iterations to catch up. This behavior makes our model a good few-shot detector in practice, where scarcely labeled novel classes may arrive at any time and a short adaptation time is needed to put the system into real use quickly. It also opens up our model's potential in a life-long learning setting [5], where the model accumulates the knowledge learned from the past and uses or adapts it for future prediction. We also observe a similar convergence advantage of our model over YOLO-ft-full and LSTD(YOLO)-full.


Learned reweighting coefficients.   The reweighting coefficients are important for meta-feature usage and detection performance. To see this, we first plot the 1024-d reweighting vectors for each class in Fig. 4a. In the figure, each row corresponds to a class and each column corresponds to a feature. The features are ranked from left to right by their variance among the 20 classes. We observe that roughly half of the features (columns) have notable variance among different classes (multiple colors in a column), while the other half are insensitive to the class (roughly the same color in a column). This suggests that only a portion of the features are used differently when detecting different classes, while the remaining ones are shared across classes.
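The variance-based ordering of features can be sketched as below. This is only an illustration under assumptions: the reweighting vectors are given as plain lists (one per class), the helper name `rank_features_by_variance` is hypothetical, and sorting from highest to lowest variance is a guess, since the text only says the features are ranked by variance.

```python
from statistics import pvariance

def rank_features_by_variance(vectors):
    """Return feature indices sorted by across-class variance, highest first.

    vectors: list of per-class reweighting vectors, all of equal length.
    """
    num_features = len(vectors[0])
    variances = [pvariance([v[j] for v in vectors]) for j in range(num_features)]
    return sorted(range(num_features), key=lambda j: variances[j], reverse=True)

# Toy example: 3 classes, 4 features. Feature 2 varies most, feature 1 is constant.
toy_vectors = [
    [0.1, 0.5, 0.0, 0.3],
    [0.2, 0.5, 1.0, 0.3],
    [0.3, 0.5, 2.0, 0.5],
]
order = rank_features_by_variance(toy_vectors)
```

Features near the end of the ordering take almost the same coefficient for every class, which corresponds to the class-insensitive half described above.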

We further visualize the reweighting vectors learned from 10 shots per class on base/novel split 1 using t-SNE [24] in Fig. 4b. In this figure, we plot the reweighting vector generated by each support input, along with the average for each class. We observe that not only do vectors of the same class tend to form clusters, but vectors of visually similar classes also tend to be close. For instance, the classes Cow, Horse, Sheep, Cat and Dog are all around the bottom-right corner, and they are all animals. Classes of transportation tools are at the top of the figure. Person and Bird are visually more different from the animals mentioned above, but are still closer to them than to the transportation tools.


Learned meta features.   Here we analyze the meta features learned from the base classes in the first training stage. Ideally, a few-shot detection model should also perform well when data are abundant. We compare the mAP on base classes after the first-stage base training between our model and the vanilla YOLO detector (used in the latter two baselines). The results are shown in Table 3. Although our detector is designed for the few-shot scenario, it also has strong representation power and offers good meta features, reaching performance comparable to the original YOLOv2 detector trained on many samples. This lays a basis for solving the few-shot object detection problem.

4.4. Ablation Studies

We analyze the effects of various components in our system by comparing the performance on both base classes and novel classes. The experiments are on PASCAL VOC base/novel split 1, using 10-shot data on novel classes.

Which layer output features to reweight.   In our experiments, we apply the reweighting module to moderate the output of the second-to-last layer (layer 21). This is the highest level of intermediate features we could use. However, other options could be considered as well. We experiment with applying the reweighting vectors to the feature maps output from layers 20 and 13, and also with reweighting only half of the features in layer 21. The results are shown in Table 4. We can see that it is more suitable to apply feature reweighting at deeper layers, as using earlier layers gives worse performance. Moreover, moderating only half of the features does not hurt the performance much, which demonstrates that a significant portion of features can be shared among classes, as we analyzed in Sec. 4.3.
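To make the reweighting operation concrete, here is a minimal sketch of channel-wise reweighting, which is what a 1×1 depth-wise convolution with one coefficient per channel amounts to. The function name `reweight_features`, the nested-list feature layout, and the `num_reweighted` parameter are assumptions for illustration; in particular, leaving the first channels reweighted and passing the rest through unchanged is only one possible reading of the "half of the features" variant.

```python
def reweight_features(feature_map, weights, num_reweighted=None):
    """Scale each channel of feature_map by its class-specific coefficient.

    feature_map: list of m channels, each an h x w nested list.
    weights: list of m coefficients for one class.
    num_reweighted: if set, only the first num_reweighted channels are scaled;
        the remaining channels pass through unchanged (assumed variant).
    """
    m = len(feature_map)
    if len(weights) != m:
        raise ValueError("need one coefficient per channel")
    limit = m if num_reweighted is None else num_reweighted
    out = []
    for c, channel in enumerate(feature_map):
        scale = weights[c] if c < limit else 1.0
        out.append([[scale * v for v in row] for row in channel])
    return out

# Toy example: m = 2 channels on a 1 x 2 grid.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
full = reweight_features(features, [2.0, 0.5])
half = reweight_features(features, [2.0, 0.5], num_reweighted=1)
```

With N classes, the same feature map would be reweighted N times, once per class vector, before being passed to the prediction layer.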

Loss functions.   As mentioned in Sec. 3.2, there are several options for defining the classification loss. Among them, the binary loss is the most straightforward: if the inputs to the reweighting module and the detector are from the same class, the model predicts 1, and otherwise 0. This binary loss can be defined in two ways. With the single-binary loss, the reweighting module takes only one class of input in each iteration, and the detector regresses 0 or 1. With the multi-binary loss, the reweighting module takes N examples from N classes per iteration and computes N binary losses in total. Prior works on Siamese Network [18] and Learnet [2] use the single-binary loss. Instead, our model uses the softmax loss to calibrate the classification scores of the N classes. To investigate the effects of different loss functions, we compare the performance of models trained with the single-binary loss, the multi-binary loss and our softmax loss in Table 5. We observe that the softmax loss significantly outperforms the binary losses. This is likely due to its effect in suppressing redundant detection results.
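The difference between the multi-binary and softmax formulations can be sketched on a single set of N class scores. This is a simplified illustration, not the loss from the paper's code: it ignores the objectness and box regression terms, the function names are hypothetical, and the scores are toy values.

```python
import math

def softmax_loss(scores, target):
    """Cross-entropy over N class scores that compete through a softmax."""
    peak = max(scores)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_norm - scores[target]

def multi_binary_loss(scores, target):
    """Sum of N independent sigmoid cross-entropy terms, one per class."""
    total = 0.0
    for i, s in enumerate(scores):
        p = 1.0 / (1.0 + math.exp(-s))
        total -= math.log(p) if i == target else math.log(1.0 - p)
    return total

# Toy scores for N = 3 classes; class 0 is the true class.
scores = [2.0, 2.0, -1.0]
sm = softmax_loss(scores, target=0)
mb = multi_binary_loss(scores, target=0)
```

In the softmax version the N scores are normalized jointly, so raising one class's probability lowers the others; the binary version scores each class independently.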

Input form of reweighting module.   In our experiments, we use an image of the target class with a binary mask channel indicating the position of the object as input to the meta-model. We also examine the case where we feed only the image. From Table 6 we see that this gives lower performance, especially on novel classes. A seemingly reasonable alternative is to feed the cropped target object together with the image. From Table 6, this solution is also slightly worse. The necessity of the mask may lie in the fact that it provides precise information about the object location and its context.
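The four-channel support input (RGB plus a binary mask) can be sketched as follows. This is an illustrative construction only: the function name `add_mask_channel`, the nested-list image layout, and the box convention (pixel coordinates with exclusive maximum bounds) are assumptions, not details confirmed by the paper.

```python
def add_mask_channel(rgb, box):
    """Append a binary box mask to an RGB image.

    rgb: h x w nested list of (r, g, b) tuples.
    box: (x_min, y_min, x_max, y_max) in pixels, max bounds exclusive (assumed).
    Returns an h x w nested list of (r, g, b, mask) tuples.
    """
    x_min, y_min, x_max, y_max = box
    out = []
    for y, row in enumerate(rgb):
        out.append([
            (r, g, b, 1 if x_min <= x < x_max and y_min <= y < y_max else 0)
            for x, (r, g, b) in enumerate(row)
        ])
    return out

image = [[(0, 0, 0)] * 3 for _ in range(2)]  # 2 x 3 black image
masked = add_mask_channel(image, box=(1, 0, 3, 1))
```

Unlike cropping, the mask keeps the whole image visible, so the surrounding context of the object remains available to the reweighting module.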

We also analyze the input sampling scheme for testing and the effect of sharing weights between the feature extractor and the reweighting module. See the supplementary material.

5. Conclusion

This work is among the first to explore the practical and challenging few-shot detection problem. It introduces a new model that learns to quickly adjust the contributions of the basic features in order to detect novel classes from a few examples. Experiments on realistic benchmark datasets clearly demonstrate its effectiveness. This work also compares model learning speed and analyzes the predicted reweighting vectors and the contribution of each design component, providing an in-depth understanding of the proposed model. Few-shot detection is a challenging problem, and we will further explore how to improve its performance in more complex scenes.
