Taskonomy: Disentangling Task Transfer Learning (Translation) [Part 2]

Taskonomy: Disentangling Task Transfer Learning (Translation) — Part 1

Taskonomy: Disentangling Task Transfer Learning


Abstract


Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying the existence of a structure among visual tasks. Knowing this structure has notable value; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.


We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure, including a solver that users can employ to devise efficient supervision policies for their use cases.


1. Introduction


Object recognition, depth estimation, edge detection, pose estimation, etc. are examples of common vision tasks deemed useful and tackled by the research community. Some of them have rather clear relationships: we understand that surface normals and depth are related (one is a derivative of the other), or that vanishing points in a room are useful for orientation. Other relationships are less clear: how keypoint detection and the shading in a room can, together, support pose estimation.



Figure 1: A sample task structure discovered by the computational task taxonomy (taskonomy). It found that, for instance, by combining the learned features of a surface normal estimator and occlusion edge detector, good networks for reshading and point matching can be rapidly trained with little labeled data.


The field of computer vision has indeed gone far without explicitly using these relationships. We have made remarkable progress by developing advanced learning machinery (e.g. ConvNets) capable of finding complex mappings from X to Y when many pairs of (x, y) s.t. x ∈ X, y ∈ Y are given as training data. This is usually referred to as fully supervised learning and often leads to problems being solved in isolation. Siloing tasks makes training a new task or a comprehensive perception system a Sisyphean challenge, whereby each task needs to be learned individually from scratch. Doing so ignores their quantifiably useful relationships, leading to a massive labeled-data requirement.


Alternatively, a model aware of the relationships among tasks demands less supervision, uses less computation, and behaves in more predictable ways. Incorporating such a structure is the first stepping stone towards developing provably efficient comprehensive/universal perception models [34, 4], i.e. ones that can solve a large set of tasks before becoming intractable in supervision or computation demands. However, this task space structure and its effects are still largely unknown. The relationships are non-trivial, and finding them is complicated by the fact that we have imperfect learning models and optimizers. In this paper, we attempt to shed light on this underlying structure and present a framework for mapping the space of visual tasks. Here what we mean by “structure” is a collection of computationally found relations specifying which tasks supply useful information to another, and by how much (see Fig. 1). We employ a fully computational approach for this purpose, with neural networks as the adopted computational function class. In a feedforward network, each layer successively forms more abstract representations of the input containing the information needed for mapping the input to the output. These representations, however, can transmit statistics useful for solving other outputs (tasks), presumably if the tasks are related in some form [83, 19, 58, 46]. This is the basis of our approach: we compute an affinity matrix among tasks based on whether the solution for one task can be sufficiently easily read out of the representation trained for another task. Such transfers are exhaustively sampled, and a Binary Integer Programming formulation extracts a globally efficient transfer policy from them. We show this model leads to solving tasks with far less data than learning them independently, and the resulting structure holds on common datasets (ImageNet [78] and Places [104]).


Being fully computational and representation-based, the proposed approach avoids imposing prior (possibly incorrect) assumptions on the task space. This is crucial because the priors about task relations are often derived from either human intuition or analytical knowledge, while neural networks need not operate on the same principles [63, 33, 40, 45, 102, 88]. For instance, although we might expect depth to transfer to surface normals better (derivatives are easy), the opposite is found to be the better direction in a computational framework (i.e. it suits neural networks better).


An interactive taxonomy solver which uses our model to suggest data-efficient curricula, a live demo, dataset, and code are available at http://taskonomy.vision/.


2. Related Work


Assertions of the existence of a structure among tasks date back to the early years of modern computer science, e.g. Turing's argument for using learning elements [95, 98] rather than the final outcome, or Jean Piaget's works on developmental stages using previously learned stages as sources [74, 39, 38], and have extended to recent works [76, 73, 50, 18, 97, 61, 11, 66]. Here we make an attempt to actually find this structure. We acknowledge that this is related to a breadth of topics, e.g. compositional modeling [35, 10, 13, 23, 55, 92, 90], homomorphic cryptography [42], lifelong learning [93, 15, 85, 84], functional maps [71], certain aspects of Bayesian inference and Dirichlet processes [54, 91, 90, 89, 37, 39], few-shot learning [81, 25, 24, 70, 86], transfer learning [75, 84, 29, 64, 67, 59], and un/semi/self-supervised learning [22, 8, 17, 103, 19, 83], which are studied across various fields [73, 94, 12]. We review the topics most pertinent to vision within the constraints of space: Self-supervised learning methods leverage the inherent relationships between tasks to learn a desired expensive one (e.g. object detection) via a cheap surrogate (e.g. colorization) [68, 72, 17, 103, 100, 69]. Specifically, they use a manually-entered local part of the structure in the task space (as the surrogate task is manually defined). In contrast, our approach models this large space of tasks in a computational manner and can discover obscure relationships.


Unsupervised learning is concerned with the redundancies in the input domain and leveraging them for forming compact representations, which are usually agnostic to the downstream task [8, 49, 20, 9, 32, 77]. Our approach is, by definition, not unsupervised as it is not agnostic to the tasks. Instead, it models the space that tasks belong to, and in a way utilizes the functional redundancies among tasks.


Meta-learning generally seeks performing the learning at a level higher than where conventional learning occurs, e.g. as employed in reinforcement learning [21, 31, 28], optimization [2, 82, 48], or certain architectural mechanisms [27, 30, 87, 65]. The motivation behind meta-learning has similarities to ours, and our outcome can be seen as a computational meta-structure of the space of tasks.


Multi-task learning targets developing systems that can provide multiple outputs for an input in one run [50, 18]. Multi-task learning has experienced recent progress and the reported advantages are another support for existence of a useful structure among tasks [93, 100, 50, 76, 73, 50, 18, 97, 61, 11, 66]. Unlike multi-task learning, we explicitly model the relations among tasks and extract a meta-structure. The large number of tasks we consider also makes developing one multi-task network for all infeasible.


Domain adaptation seeks to render a function that is developed on a certain domain applicable to another [44, 99, 5, 80, 52, 26, 36]. It often addresses a shift in the input domain, e.g. webcam images to D-SLR [47], while the task is kept the same. In contrast, our framework is concerned with the output (task) space, hence it can be viewed as task/output adaptation. We also perform the adaptation in a larger space among many elements, rather than two or a few.


In the context of our approach to modeling transfer learning across tasks: Learning Theoretic approaches may overlap with any of the above topics and usually focus on providing generalization guarantees. They vary in their approach: e.g. by modeling transferability with the transfer family required to map a hypothesis for one task onto a hypothesis for another [7], through information-based approaches [60], or through modeling inductive bias [6]. For these guarantees, learning theoretic approaches usually rely on intractable computations, or avoid such computations by restricting the model or task. Our method draws inspiration from theoretical approaches but eschews (for now) theoretical guarantees in order to use modern neural machinery.



Figure 2: Computational modeling of task relations and creating the taxonomy. From left to right: I. Train task-specific networks. II. Train (first order and higher) transfer functions among tasks in a latent space. III. Get normalized transfer affinities using AHP (Analytic Hierarchy Process). IV. Find global transfer taxonomy using BIP (Binary Integer Program).


3. Method


We define the problem as follows: we want to maximize the collective performance on a set of tasks T = {t₁, …, tₙ}, subject to the constraint that we have a limited supervision budget γ (due to financial, computational, or time constraints). We define our supervision budget γ to be the maximum allowable number of tasks that we are willing to train from scratch (i.e. source tasks). The task dictionary is defined as V = T ∪ S, where T is the set of tasks which we want solved (target), and S is the set of tasks that can be trained (source). Therefore, T \ S are the tasks that we want solved but cannot train (“target-only”), T ∩ S are the tasks that we want solved but could play as source too, and S \ T are the “source-only” tasks which we may not directly care about solving (e.g. jigsaw puzzle) but which can optionally be used if they increase the performance on T. The task taxonomy (taskonomy) is a computationally found directed hypergraph that captures the notion of task transferability over any given task dictionary. An edge between a group of source tasks and a target task represents a feasible transfer case, and its weight is the prediction of its performance. We use these edges to estimate the globally optimal transfer policy to solve T. Taxonomy produces a family of such graphs, parameterized by the available supervision budget, chosen tasks, transfer orders, and transfer functions’ expressiveness.

Taxonomy is built using a four-step process depicted in Fig. 2. In stage I, a task-specific network for each task in S is trained. In stage II, all feasible transfers between sources and targets are trained. We include higher-order transfers, which use multiple input tasks to transfer to one target. In stage III, the task affinities acquired from transfer function performances are normalized, and in stage IV, we synthesize a hypergraph which can predict the performance of any transfer policy and optimize for the optimal one.



Figure 3: Task Dictionary. Outputs of 24 (of 26) task-specific networks for a query (top left). See results of applying them frame-wise on a video here.


A vision task is an abstraction read from a raw image. We denote a task t more formally as a function f_t which maps an image I to f_t(I). Our dataset, D, contains for each task t a set of training pairs (I, f_t(I)), e.g. (I, depth(I)).


Figure 4: Transfer Function. A small readout function is trained to map representations of the source task's frozen encoder to the target task's labels. If order > 1, the transfer function receives representations from multiple sources.


Task Dictionary: Our mapping of task space is done via the 26 tasks included in the dictionary, so we ensure they cover common themes in computer vision (2D, 3D, semantics, etc.) to elucidate fine-grained structures of the task space. See Fig. 3 for some of the tasks; a detailed definition of each task is provided in the supplementary material. We include tasks with various levels of abstraction, ranging from tasks solvable by a simple kernel convolved over the image (e.g. edge detection) to tasks requiring a basic understanding of scene geometry (e.g. vanishing points) and more abstract ones involving semantics (e.g. scene classification).


It is critical to note the task dictionary is meant to be a sampled set, not an exhaustive list, from a denser space of all conceivable visual tasks/abstractions. Sampling gives us a tractable way to sparsely model a dense space, and the hypothesis is that (subject to a proper sampling) the derived model should generalize to out-of-dictionary tasks. The more regular / better sampled the space, the better the generalization. We evaluate this in Sec. 4.2 with supportive results. For an evaluation of the robustness of results w.r.t. the choice of dictionary, see the supplementary material.


Dataset: We need a dataset that has annotations for every task on every image. Training all of our tasks on exactly the same pixels eliminates the possibility that the observed transferabilities are affected by different input data peculiarities rather than only task intrinsics. There has not been such a dataset of scale made of real images, so we created a dataset of 4 million images of indoor scenes from about 600 buildings; every image has an annotation for every task. The images are registered on and aligned with building-wide meshes similar to [3, 101, 14], enabling us to programmatically compute the ground truth for many tasks without human labeling. For the tasks that still require labels (e.g. scene classes), we generate them using Knowledge Distillation [43] from known methods [104, 57, 56, 78]. See the supplementary material for full details of the process and a user study on the final quality of labels generated using Knowledge Distillation (showing < 7% error).


3.1. Step I: Task-Specific Modeling


We train a fully supervised task-specific network for each task in S. Task-specific networks have an encoder-decoder architecture homogeneous across all tasks, where the encoder is large enough to extract powerful representations, and the decoder is large enough to achieve a good performance but is much smaller than the encoder.

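To make the architectural pattern concrete, here is a minimal PyTorch sketch of the homogeneous encoder-decoder design; the layer sizes are illustrative assumptions (the paper's actual encoders are much larger), and only the large-encoder/small-decoder shape is the point:

```python
import torch.nn as nn

class TaskSpecificNet(nn.Module):
    """Homogeneous encoder-decoder pattern: a comparatively large encoder
    shared in shape across tasks, and a small task-specific decoder.
    Layer sizes are illustrative; the paper's encoders are far larger."""
    def __init__(self, out_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(   # image -> latent representation
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # latent -> task output (much smaller)
            nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))
```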


Figure 5: Transfer results to normals (upper) and 2.5D Segmentation (lower) from 5 different source tasks. The spread in transferability among different sources is apparent, with reshading among top-performing ones in this case. Task-specific networks were trained on 60x more data. “Scratch” was trained from scratch without transfer learning.


3.2. Step II: Transfer Modeling


Given a source task s and a target task t, where s ∈ S and t ∈ T, a transfer network learns a small readout function for t given a statistic computed for s (see Fig. 4). The statistic is the representation for image I from the encoder of s: E_s(I). The readout function D_{s→t} is parameterized by θ_{s→t}, minimizing the loss L_t:

$$ D_{s \to t} := \arg\min_{\theta}\; \mathbb{E}_{I \in D}\big[\, L_t\big( D_{\theta}(E_s(I)),\; f_t(I) \big) \big], $$

where f_t(I) is the ground truth of t for image I. E_s(I) may or may not be sufficient for solving t, depending on the relation between t and s (examples in Fig. 5). Thus, the performance of D_{s→t} is a useful metric as task affinity. We train transfer functions for all feasible source-target combinations.

Accessibility: For a transfer to be successful, the latent representation of the source should both be inclusive of sufficient information for solving the target and have the information accessible, i.e. easily extractable (otherwise, the raw image or its compression-based representations would be optimal). Thus, it is crucial for us to adopt a low-capacity (small) architecture as the transfer function, trained with a small amount of data, in order to measure transferability conditioned on being highly accessible. We use a shallow fully convolutional network and train it with little data (8x to 120x less than the task-specific networks).

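As a rough illustration of this step, a minimal PyTorch sketch of such a low-capacity readout on a frozen source encoder could look as follows; the channel counts and the `transfer_step` helper are illustrative assumptions, not the paper's exact transfer architecture:

```python
import torch
import torch.nn as nn

class TransferReadout(nn.Module):
    """Shallow, fully convolutional readout D_{s->t}: maps the frozen source
    representation E_s(I) to the target task's output space. Channel counts
    here are illustrative, not the paper's exact transfer architecture."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.readout = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, source_repr):
        return self.readout(source_repr)

def transfer_step(readout, source_encoder, loss_t, image, label, optimizer):
    """One training step: only the readout's parameters are updated."""
    with torch.no_grad():                  # source encoder stays frozen
        repr_s = source_encoder(image)     # E_s(I)
    optimizer.zero_grad()
    loss = loss_t(readout(repr_s), label)  # L_t(D_theta(E_s(I)), f_t(I))
    loss.backward()
    optimizer.step()
    return loss.item()
```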

Higher-Order Transfers: Multiple source tasks can contain complementary information for solving a target task (see examples in Fig. 6). We include higher-order transfers, which are the same as first order but receive multiple representations in the input. Thus, our transfers are functions defined on ℘(S), where ℘ is the powerset operator.

As there is a combinatorial explosion in the number of feasible higher-order transfers (|S| choose k, for order k), we employ a sampling procedure with the goal of filtering out higher-order transfers that are less likely to yield good results, without training them. We use a beam search: for transfers of order k ≤ 5 to a target, we select its 5 best sources (according to 1st-order performances) and include all of their order-k combinations. For k > 5, we use a beam of size 1 and compute the transfer from the top k sources.

Transitive Transfers: We examined whether transitive task transfers (s → t₁ → t₂) could improve the performance over their direct counterpart (s → t₂), but found that the two had equal performance in almost all cases in both high-data and low-data scenarios. The experiment is provided in the supplementary material. Therefore, we need not consider cases where branching would be more than one level deep when searching for the optimal transfer path.

Figure 6: Higher-Order Transfers. Representations can contain complementary information, e.g., by transferring simultaneously from 3D Edges and Curvature, individual stairs were brought out. See our publicly available interactive transfer visualization page for more examples.

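The beam-search filtering reduces to a few lines; in the sketch below, `first_order_perf`, the task names, and the scores are hypothetical stand-ins for the measured first-order affinities:

```python
from itertools import combinations

def higher_order_candidates(first_order_perf, k, beam=5):
    """Candidate source sets for an order-k transfer to one target, following
    the beam-search filtering described above. `first_order_perf` maps each
    source task to its first-order transfer performance for this target
    (higher is better); the mapping is an assumed input."""
    ranked = sorted(first_order_perf, key=first_order_perf.get, reverse=True)
    if k <= beam:
        # all order-k combinations of the target's `beam` best sources
        return list(combinations(ranked[:beam], k))
    # beam of size 1: a single candidate made of the top-k sources
    return [tuple(ranked[:k])]

# Hypothetical first-order performances for one target:
perf = {"normals": 0.71, "reshading": 0.69, "depth": 0.62,
        "edges3d": 0.55, "curvature": 0.51, "autoenc": 0.20}
print(higher_order_candidates(perf, k=2))  # C(5,2) = 10 candidate pairs
print(higher_order_candidates(perf, k=6))  # one tuple: the top-6 sources
```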

3.3. Step III: Ordinal Normalization using Analytic Hierarchy Process (AHP)


We want to have an affinity matrix of transferabilities across tasks. Aggregating the raw losses/evaluations L_t from transfer functions into a matrix is obviously problematic, as they have vastly different scales and live in different spaces (see Fig. 7, left). Hence, a proper normalization is needed. A naive solution would be to linearly rescale each row of the matrix to the range [0, 1]. This approach fails when the actual output quality increases at different speeds w.r.t. the loss. As the loss-quality curve is generally unknown, such approaches to normalization are ineffective. Instead, we use an ordinal approach in which the output quality and loss are only assumed to change monotonically. For each t, we construct W_t, a pairwise tournament matrix between all feasible sources for transferring to t. The element at (i, j) is the percentage of images in a held-out test set, D_test, on which s_i transferred to t better than s_j did (i.e. the output of D_{s_i→t} was closer to the ground truth than that of D_{s_j→t}).


We clip this intermediate pairwise matrix W_t to be in [0.001, 0.999] as a form of Laplace smoothing. Then we divide W_t by its transpose, so that the matrix shows how many times better s_i is compared to s_j. The final tournament ratio matrix W'_t is positive reciprocal, with each element w'_{i,j} of W'_t:

$$ w'_{i,j} = \frac{w_{i,j}}{w_{j,i}}. $$

We quantify the final transferability of s_i to t as the corresponding (i-th) component of the principal eigenvector of W'_t (normalized to sum to 1). The elements of the principal eigenvector are a measure of centrality, and are proportional to the amount of time that an infinite-length random walk on W'_t will spend at any given source [62]. We stack the principal eigenvectors of W'_t for all t ∈ T, to get an affinity matrix P (‘p’ for performance); see Fig. 7, right.


Figure 7: First-order task affinity matrix before (left) and after (right) Analytic Hierarchy Process (AHP) normalization. Lower means better transferred. For visualization, we use the standard affinity-distance method dist = e^{−β·P} (where β = 20 and e is the element-wise matrix exponential). See supplementary material for the full matrix with higher-order transfers.


This approach is derived from Analytic Hierarchy Process [79], a method widely used in operations research to create a total order based on multiple pairwise comparisons.

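For a single target, the normalization above amounts to a few lines of NumPy; in this sketch the win percentages are hypothetical, and the returned vectors, stacked over all targets, form the affinity matrix P:

```python
import numpy as np

def ahp_affinities(wins, eps=1e-3):
    """Ordinal (AHP) normalization for one target task, as described above.
    wins[i, j] = fraction of held-out images on which source s_i transferred
    better than s_j. Returns per-source affinities that sum to 1."""
    W = np.clip(wins, eps, 1.0 - eps)   # clip to [0.001, 0.999] (smoothing)
    R = W / W.T                         # ratio matrix: "how many times better"
    vals, vecs = np.linalg.eig(R)       # principal eigenvector of the
    p = vecs[:, np.argmax(vals.real)]   # positive reciprocal matrix R
    p = np.abs(p.real)
    return p / p.sum()

# Toy tournament among three sources (hypothetical win percentages):
wins = np.array([[0.5, 0.8, 0.9],
                 [0.2, 0.5, 0.7],
                 [0.1, 0.3, 0.5]])
print(ahp_affinities(wins))   # stacking these over all targets gives P
# For the Fig. 7 visualization: dist = np.exp(-beta * P), with beta = 20.
```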

3.4. Step IV: Computing the Global Taxonomy


Given the normalized task affinity matrix, we need to devise a global transfer policy which maximizes collective performance across all tasks, while minimizing the used supervision. This problem can be formulated as subgraph selection where tasks are nodes and transfers are edges. The optimal subgraph picks the ideal source nodes and the best edges from these sources to targets while satisfying that the number of source nodes does not exceed the supervision budget. We solve this subgraph selection problem using Boolean Integer Programming (BIP), described below, which can be solved optimally and efficiently [41, 16].


The policy is encoded as a boolean vector x that lists the |E| candidate transfers first, followed by the |S| source tasks; the BIP maximizes the summed predicted performance of the selected transfers subject to Ax ⪯ b. Now we add three types of constraints via the matrix A to enforce that each feasible solution of the BIP instance corresponds to a valid subgraph for our transfer learning problem. Constraint I: if a transfer is included in the subgraph, all of its source nodes/tasks must be included too. Constraint II: each target task has exactly one transfer in. Constraint III: the supervision budget is not exceeded.


Constraint I: For each row a_i in A (one row per candidate transfer i), we require a_i · x ≤ b_i, where

$$
a_{i,j} =
\begin{cases}
|\mathrm{sources}(i)| & \text{if } j = i, \\
-1 & \text{if } j > |E| \text{ and source } s_{j-|E|} \in \mathrm{sources}(i), \\
0 & \text{otherwise},
\end{cases}
\qquad b_i = 0. \tag{4}
$$

This way, transfer i can be selected only if all of its source tasks are selected as well.


Constraint II: Via the row a_{|E|+i}, we enforce that each target t_i has exactly one transfer in:

$$
\sum_{j \,:\, \mathrm{target}(j) = t_i} x_j = 1. \tag{6}
$$


Constraint III: the solution is enforced to not exceed the budget. Each source task k is assigned a label cost c_k, so

$$
\sum_{k=1}^{|S|} c_k \, x_{|E|+k} \leq \gamma. \tag{7}
$$

(With unit label costs, this simply counts the number of source tasks trained from scratch.)


The elements of A not defined above are set to 0. The problem is now a valid BIP and can be optimally solved in a fraction of a second [41]. The BIP solution x* corresponds to the optimal subgraph, which is our taxonomy.

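To illustrate the selection step end to end, here is a small sketch that feeds toy numbers into SciPy's off-the-shelf MILP solver (scipy.optimize.milp); the tasks, predicted performances, and budget are hypothetical, and unit label costs are assumed, so this is a toy instance rather than the paper's solver:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Toy instance (hypothetical tasks, performances, and budget).
# Variables: x = [one binary per transfer] + [one binary per source task].
sources = ["normals", "reshading", "autoenc"]
targets = ["depth", "segm25d"]
# Candidate transfers: (source set, target, predicted performance p_i).
transfers = [({"normals"}, "depth", 0.8),
             ({"reshading"}, "depth", 0.7),
             ({"normals", "reshading"}, "segm25d", 0.9),
             ({"autoenc"}, "segm25d", 0.3)]
nE, nS = len(transfers), len(sources)
src_idx = {s: nE + k for k, s in enumerate(sources)}

c = np.zeros(nE + nS)
c[:nE] = [-p for _, _, p in transfers]          # maximize sum of p_i * x_i

constraints = []
for i, (srcs, _, _) in enumerate(transfers):    # Constraint I (eq. 4)
    row = np.zeros(nE + nS)
    row[i] = len(srcs)
    for s in srcs:
        row[src_idx[s]] = -1.0
    constraints.append(LinearConstraint(row, -np.inf, 0.0))
for t in targets:                               # Constraint II (eq. 6)
    row = np.zeros(nE + nS)
    for i, (_, tgt, _) in enumerate(transfers):
        row[i] = float(tgt == t)
    constraints.append(LinearConstraint(row, 1.0, 1.0))
budget_row = np.zeros(nE + nS)                  # Constraint III (eq. 7),
budget_row[nE:] = 1.0                           # with unit label costs
constraints.append(LinearConstraint(budget_row, 0.0, 2.0))  # gamma = 2

res = milp(c, constraints=constraints,
           integrality=np.ones(nE + nS), bounds=Bounds(0, 1))
print(np.round(res.x))   # selected transfers and source tasks (x*)
```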

Article source: http://tongtianta.site/paper/1750
Editor: Lornatang
Proofreading: Lornatang
