ECCV 2018 Transfer Learning Papers


Zero-Shot Object Detection

We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approaches for learning robust detectors. One of these models uses a fixed background class and the other is based on iterative latent assignments. We also outline the challenge associated with using a limited number of training classes and propose a solution based on dense sampling of the semantic label space using auxiliary data with a large number of categories. We propose novel splits of two standard detection datasets – MSCOCO and VisualGenome, and present extensive empirical results in both the traditional and generalized zero-shot settings to highlight the benefits of the proposed methods. We provide useful insights into the algorithm and conclude by posing some open questions to encourage further research.

http://openaccess.thecvf.com/content_ECCV_2018/html/Ankan_Bansal_Zero-Shot_Object_Detection_ECCV_2018_paper.html
Zero-shot object detection: learning robust, background-aware detectors.
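
A minimal sketch of the embedding-based scoring this paper builds on, with a learned fixed background class (assumed architecture and dimensions; `ZeroShotBoxClassifier` is an illustrative name, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotBoxClassifier(nn.Module):
    """Scores region features against class word embeddings plus a
    learned background embedding (illustrative sizes throughout)."""
    def __init__(self, feat_dim=2048, sem_dim=300):
        super().__init__()
        self.proj = nn.Linear(feat_dim, sem_dim)       # visual -> semantic
        self.bg = nn.Parameter(torch.randn(sem_dim))   # fixed background class

    def forward(self, region_feats, class_embs):
        # region_feats: (N, feat_dim) pooled box features
        # class_embs:   (C, sem_dim) word embeddings of seen or unseen classes
        v = F.normalize(self.proj(region_feats), dim=-1)
        e = F.normalize(torch.cat([self.bg[None, :], class_embs]), dim=-1)
        return v @ e.t()   # (N, C+1) cosine scores; column 0 is background

model = ZeroShotBoxClassifier()
scores = model(torch.randn(5, 2048), torch.randn(48, 300))
print(scores.argmax(dim=1))  # 0 means the box is scored as background
```

Because classes enter only through their embeddings, the same head can score unseen classes at test time by swapping in their word vectors.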

Selective Zero-Shot Classification with Augmented Attributes

In this paper, we introduce a selective zero-shot classification problem: how can the classifier avoid making dubious predictions? Existing attribute-based zero-shot classification methods are shown to work poorly in the selective classification scenario. We argue that the under-complete human-defined attribute vocabulary accounts for the poor performance. We propose a selective zero-shot classifier based on both the human-defined and automatically discovered residual attributes. The proposed classifier is constructed by first learning the defined and residual attributes jointly. Predictions are then conducted within the subspace of the defined attributes. Finally, the prediction confidence is measured by both the defined and the residual attributes. Experiments conducted on several benchmarks demonstrate that our classifier produces superior performance to other methods under the risk-coverage trade-off metric.

http://openaccess.thecvf.com/content_ECCV_2018/html/Jie_Song_Selective_Zero-Shot_Classification_ECCV_2018_paper.html

Selective classification in the zero-shot setting.
Human-defined attributes plus automatically discovered residual attributes.
Evaluated with the risk-coverage trade-off metric.
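
The risk-coverage trade-off used for evaluation is easy to state in code: rank predictions by confidence and measure the error rate of the covered subset at each coverage level. A small sketch (illustrative names and data, not the paper's code):

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    # confidence: (N,) selection scores; correct: (N,) 0/1 correctness
    order = np.argsort(-confidence)                 # most confident first
    correct = correct[order].astype(float)
    n = len(correct)
    coverage = np.arange(1, n + 1) / n              # fraction answered
    risk = np.cumsum(1.0 - correct) / np.arange(1, n + 1)  # error on covered set
    return coverage, risk

conf = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
corr = np.array([1, 1, 0, 1, 0])
print(list(zip(*risk_coverage_curve(conf, corr))))
```

A better selective classifier keeps risk low for longer as coverage grows; the paper's residual attributes are meant to sharpen the confidence score that drives this ranking.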

Zero-Shot Deep Domain Adaptation

Domain adaptation is an important tool to transfer knowledge about a task (e.g. classification) learned in a source domain to a second, or target domain. Current approaches assume that task-relevant target-domain data is available during training. We demonstrate how to perform domain adaptation when no such task-relevant target-domain data is available. To tackle this issue, we propose zero-shot deep domain adaptation (ZDDA), which uses privileged information from task-irrelevant dual-domain pairs. ZDDA learns a source-domain representation which is not only tailored for the task of interest but also close to the target-domain representation. Therefore, the source-domain task of interest solution (e.g. a classifier for classification tasks) which is jointly trained with the source-domain representation can be applicable to both the source and target representations. Using the MNIST, Fashion-MNIST, NIST, EMNIST, and SUN RGB-D datasets, we show that ZDDA can perform domain adaptation in classification tasks without access to task-relevant target-domain training data. We also extend ZDDA to perform sensor fusion in the SUN RGB-D scene classification task by simulating task-relevant target-domain representations with task-relevant source-domain data. To the best of our knowledge, ZDDA is the first domain adaptation and sensor fusion method which requires no task-relevant target-domain data. The underlying principle is not particular to computer vision data, but should be extensible to other domains.

http://openaccess.thecvf.com/content_ECCV_2018/html/Kuan-Chuan_Peng_Zero-Shot_Deep_Domain_ECCV_2018_paper.html
Zero-shot deep domain adaptation.
Setting: how to perform domain adaptation when no task-relevant target-domain data is available.
Uses privileged information from task-irrelevant dual-domain pairs.
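
One possible reading of the two ZDDA objectives, as a hedged sketch (toy linear encoders and random data; the actual method uses CNNs and the datasets listed above): a task loss on task-relevant source data, plus an alignment loss that pulls the source representation toward the target representation on task-irrelevant dual-domain pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = 128
src_enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, feat))
tgt_enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, feat))
clf = nn.Linear(feat, 10)
for p in tgt_enc.parameters():            # keep the target branch fixed here
    p.requires_grad_(False)

x_task = torch.randn(8, 1, 28, 28)        # task-relevant, source domain
y_task = torch.randint(0, 10, (8,))
x_pair_src = torch.randn(8, 1, 28, 28)    # task-irrelevant pair, source domain
x_pair_tgt = torch.randn(8, 1, 28, 28)    # same scenes, target domain

task_loss = F.cross_entropy(clf(src_enc(x_task)), y_task)
align_loss = F.mse_loss(src_enc(x_pair_src), tgt_enc(x_pair_tgt))
(task_loss + align_loss).backward()       # classifier transfers to target features
```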

Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets. Recently, most ZSL approaches have focused on learning visual-semantic embeddings to transfer knowledge from the auxiliary datasets to the novel classes. However, few works study whether the semantic information is discriminative or not for the recognition task. To tackle this problem, we propose a coupled dictionary learning approach to align the visual-semantic structures using the class prototypes, where the discriminative information lying in the visual space is utilized to improve the less discriminative semantic space. Then, zero-shot recognition can be performed in different spaces by the simple nearest neighbor approach using the learned class prototypes. Extensive experiments on four benchmark datasets show the effectiveness of the proposed approach.

http://openaccess.thecvf.com/content_ECCV_2018/html/Huajie_Jiang_Learning_Class_Prototypes_ECCV_2018_paper.html
Motivation: little prior work examines whether the semantic information is discriminative for the recognition task.
A coupled dictionary learning approach that aligns visual-semantic structures using class prototypes.
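
Once the prototypes are learned, recognition is just nearest-neighbor search, as the abstract notes. A minimal sketch of that final step (random stand-ins for real features and prototypes):

```python
import numpy as np

def nn_classify(features, prototypes):
    # features:   (N, d) test visual features
    # prototypes: (C, d) learned class prototypes in the same aligned space
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (f @ p.T).argmax(axis=1)        # index of the nearest prototype

preds = nn_classify(np.random.randn(4, 64), np.random.randn(10, 64))
print(preds)
```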

A Zero-Shot Framework for Sketch based Image Retrieval

Sketch-based image retrieval (SBIR) is the task of retrieving images from a natural image database that correspond to a given hand-drawn sketch. Ideally, an SBIR model should learn to associate components in the sketch (say, feet, tail, etc.) with the corresponding components in the image. However, current evaluation methods focus only on coarse-grained evaluation, where the goal is to retrieve images which belong to the same class as the sketch but do not necessarily have the same components as the sketch. As a result, existing methods simply learn to associate sketches with classes seen during training and hence fail to generalize to unseen classes. In this paper, we propose a new benchmark for zero-shot SBIR where the model is evaluated on novel classes that are not seen during training. We show through extensive experiments that existing models for SBIR which are trained in a discriminative setting learn only class-specific mappings and fail to generalize to the proposed zero-shot setting. To circumvent this, we propose a generative approach for the SBIR task: deep conditional generative models which take the sketch as an input and fill in the missing information stochastically. Experiments on this new benchmark, created from the “Sketchy” dataset (a large-scale database of sketch-photo pairs), demonstrate that the performance of these generative models is significantly better than several state-of-the-art approaches in the proposed zero-shot framework of the coarse-grained SBIR task.
http://openaccess.thecvf.com/content_ECCV_2018/html/Sasikiran_Yelamarthi_A_Zero-Shot_Framework_ECCV_2018_paper.html
Sketch-based image retrieval in a zero-shot setting.
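
A hedged sketch of the generative retrieval idea (assumed architecture and feature dimensions, not the paper's exact model): a decoder maps a sketch feature plus Gaussian noise to image-feature space, and the database is ranked by distance to the generated features.

```python
import torch
import torch.nn as nn

sk_dim, im_dim, z_dim = 512, 512, 64
decoder = nn.Sequential(nn.Linear(sk_dim + z_dim, 1024), nn.ReLU(),
                        nn.Linear(1024, im_dim))

def retrieve(sketch_feat, db_feats, n_samples=8):
    # sketch_feat: (1, sk_dim); db_feats: (M, im_dim)
    z = torch.randn(n_samples, z_dim)                 # stochastic fill-in
    cond = sketch_feat.expand(n_samples, -1)
    gen = decoder(torch.cat([cond, z], dim=1)).mean(dim=0)
    dists = torch.cdist(gen[None, :], db_feats)[0]
    return dists.argsort()                            # best matches first

ranking = retrieve(torch.randn(1, sk_dim), torch.randn(100, im_dim))
```

Sampling several noise vectors and averaging reflects the stochastic completion of missing information mentioned in the abstract.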

Multi-modal Cycle-consistent Generalized Zero-Shot Learning

In generalized zero-shot learning (GZSL), the set of classes is split into seen and unseen classes, where training relies on the semantic features of the seen and unseen classes and the visual representations of only the seen classes, while testing uses the visual representations of the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, exploiting the assumption that the distribution of classes in the semantic and visual spaces is relatively similar. Such methods tend to transform unseen testing visual representations into one of the seen classes’ semantic features instead of the semantic features of the correct unseen class, resulting in low-accuracy GZSL classification. Recently, generative adversarial networks (GANs) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy, but one important constraint is missing: there is no guarantee that the synthetic visual representations can generate back their semantic features in a multi-modal cycle-consistent manner. This missing constraint can result in synthetic visual representations that do not represent their semantic features well, which means that enforcing this constraint can improve GAN-based approaches. In this paper, we propose such a constraint, based on a new regularization for GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic constraint, we can synthesize more representative visual features for the seen and, more importantly, for the unseen classes. Our proposed approach achieves the best GZSL classification results in the field on several publicly available datasets.
http://openaccess.thecvf.com/content_ECCV_2018/html/RAFAEL_FELIX_Multi-modal_Cycle-consistent_Generalized_ECCV_2018_paper.html
Multi-modal cycle-consistent generalized zero-shot learning.
Motivation: transformation-based methods tend to map unseen-class test visual representations to a seen class's semantic features rather than to the correct unseen class's.
Adds a constraint to the GAN approach: synthesized visual representations must be able to regenerate their own semantic features, in a multi-modal cycle-consistent manner.
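
The added regularizer can be written in a few lines. A sketch with assumed shapes (e.g. 85-d attribute vectors, 2048-d visual features), not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

sem_dim, vis_dim, z_dim = 85, 2048, 32
G = nn.Sequential(nn.Linear(sem_dim + z_dim, 1024), nn.ReLU(),
                  nn.Linear(1024, vis_dim))          # semantic + noise -> visual
R = nn.Linear(vis_dim, sem_dim)                      # visual -> semantic

a = torch.randn(16, sem_dim)                         # class semantic features
z = torch.randn(16, z_dim)
x_fake = G(torch.cat([a, z], dim=1))                 # synthesized visual features
cycle_loss = F.mse_loss(R(x_fake), a)                # must regenerate semantics
cycle_loss.backward()                                # added to the usual GAN losses
```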

Dynamic Conditional Networks for Few-Shot Learning

This paper proposes a novel Dynamic Conditional Convolutional Network (DCCN) to handle conditional few-shot learning, i.e., only a few training samples are available for each condition. DCCN consists of dual subnets: DyConvNet contains a dynamic convolutional layer with a bank of basis filters; CondiNet predicts a set of adaptive weights from conditional inputs to linearly combine the basis filters. In this manner, a specific convolutional kernel can be dynamically obtained for each conditional input. The filter bank is shared across all conditions, so only a low-dimensional weight vector needs to be learned. This significantly facilitates parameter learning across different conditions when training data are limited. We evaluate DCCN on four tasks which can be formulated as conditional model learning, including specific object counting, multi-modal image classification, phrase grounding, and identity-based face generation. Extensive experiments demonstrate the superiority of the proposed model in the conditional few-shot learning setting.
http://openaccess.thecvf.com/content_ECCV_2018/html/Fang_Zhao_Dynamic_Conditional_Networks_ECCV_2018_paper.html
Conditional few-shot learning.
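
The core mechanism is compact enough to sketch (illustrative sizes; one conditional input per forward pass for simplicity): a conditioning network predicts mixing weights that combine a shared bank of basis filters into a single convolution kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, n_basis=8, in_ch=3, out_ch=16, k=3, cond_dim=32):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(n_basis, out_ch, in_ch, k, k))
        self.condi = nn.Linear(cond_dim, n_basis)   # stands in for CondiNet

    def forward(self, x, cond):
        # cond: (cond_dim,) embedding of the conditional input
        w = F.softmax(self.condi(cond), dim=-1)     # mixing weights, (n_basis,)
        kernel = (w[:, None, None, None, None] * self.basis).sum(dim=0)
        return F.conv2d(x, kernel, padding=1)       # condition-specific conv

layer = DynamicConv()
out = layer(torch.randn(1, 3, 28, 28), torch.randn(32))
print(out.shape)  # torch.Size([1, 16, 28, 28])
```

Only the low-dimensional mixing weights depend on the condition, which is what keeps parameter learning tractable with few samples per condition.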
