In this article, I review papers about Continual Learning published at CVPR 2020. If you think I made a mistake or missed an important paper, please tell me!
1. Few-Shots
Many papers this year in Continual Learning were about few-shot learning. Besides the CVPR papers I’ll present, there is also a workshop paper (Cognitively-Inspired Model for Incremental Learning Using a Few Examples, Ayub et al., CVPR Workshop 2020) and an arXiv preprint (Defining Benchmarks for Continual Few-Shot Learning, Antoniou et al., arXiv:2004.11967).
1.1. Few-Shot Class-Incremental Learning
PDF: 2004.10956
Authors: Xiaoyu Tao, Xiaopeng Hong, Xinyuan Chang, Songlin Dong, Xing Wei, Yihong Gong
Tao et al. propose in this paper a mix between Few-Shot and Continual Learning. They benchmark their model (TOPIC) on CIFAR100, miniImageNet, and CUB200. The first task is large (60 classes for CIFAR100), then the following tasks have few classes (5 ‘n-way’) and few training samples per new class (5 ‘k-shots’).
The authors use Neural Gas (Martinetz et al., 1991) and Competitive Hebbian Learning (Martinetz, 1993). This type of neural network is similar to Self-Organizing Maps.
The Neural Gas is an undirected graph, where each node $j$ is defined as $(m_j, \Lambda_j, z_j, c_j)$:
- $m_j$ is a centroid vector, similar to what we can expect after a K-Means (kNN for CL)
- $\Lambda_j$ is the variance matrix of each dimension of the vector
- $z_j$ and $c_j$ are respectively the assigned images and labels
The graph is created after the first task, once the feature extractor has been trained. They randomly sample 400 images from the training set and use their extracted features as initial centroids.
Nodes are updated by a kind of moving average (Eq. 3 of the paper), where all centroids move towards the current sample. The move in the latent space is weighted by a learning rate and, more importantly, by the L2-distance rank: close features affect the centroid more than far features.
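To make the update rule concrete, here is a minimal numpy sketch of such a rank-weighted update; the exponential rank weighting and the learning rate are illustrative assumptions, not the paper’s exact Eq. 3:

```python
import numpy as np

def update_centroids(centroids, x, lr=0.1, decay=1.0):
    """Move all centroids towards the current sample x, weighted by L2-distance rank.

    centroids: (n_nodes, d) array holding every node's centroid m_j.
    x: (d,) feature vector of the current training sample.
    The closest centroid (rank 0) moves the most; far ones barely move.
    """
    dists = np.linalg.norm(centroids - x, axis=1)
    ranks = np.argsort(np.argsort(dists))       # rank 0 = closest node
    weights = np.exp(-ranks / decay)            # assumed rank-based decay
    return centroids + lr * weights[:, None] * (x - centroids)
```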
To create edges between nodes, they use Competitive Hebbian Learning. Hebbian Learning in Neuroscience stipulates that:
Neurons that fire together, wire together.
In this case, if two nodes are respectively the closest and second-closest to an input feature, an edge is created between them. Each edge has an age that is incremented every time the pair doesn’t “fire together”; past an age threshold, the edge is removed.
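A small sketch of this edge bookkeeping could look as follows; the age threshold, the dict-based graph, and the choice to age every non-winning edge (some variants only age the winner’s edges) are my simplifications:

```python
import numpy as np

def hebbian_step(edges, centroids, x, max_age=50):
    """One Competitive Hebbian Learning step: wire the two nodes closest to x.

    edges: dict mapping a node pair (i, j) with i < j to the edge's age.
    """
    dists = np.linalg.norm(centroids - x, axis=1)
    a, b = np.argsort(dists)[:2]                # closest and second-closest nodes
    winner = (min(a, b), max(a, b))
    edges[winner] = 0                           # "fire together, wire together"
    for pair in list(edges):
        if pair != winner:
            edges[pair] += 1                    # age edges that didn't fire
            if edges[pair] > max_age:
                del edges[pair]                 # prune stale edges
    return edges
```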
The model is trained with the usual softmax + cross-entropy, but the competitive Hebbian learning is applied after each step. Note that the latter doesn’t produce gradients for backpropagation.
Finally, the inference procedure is not really explained, but I guess that each node gets a “label” by a majority vote over the training sample labels it was most associated with. Given a test input sample, its label is determined by the label of its closest node.
They stabilize the training by constraining the centroids to stay close to their previous positions with a loss they call “anchor loss”. It’s actually a Mahalanobis loss, where the distance is weighted per dimension by the inverse of the variance (i.e. the precision).
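From that description, the anchor loss plausibly has a form like the following, where $\tilde{m}_j$ denotes the centroid’s previous position (notation mine):

$$\mathcal{L}_{\text{anchor}} = \sum_j (m_j - \tilde{m}_j)^\top \Lambda_j^{-1} (m_j - \tilde{m}_j)$$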
They also add a regularization called “min-max loss” to separate new centroids (added with new tasks) from previous centroids. It is similar to the hinge loss used by Hou et al., 2019.
1.2. Incremental Few-Shot Object Detection
PDF: 2003.04668
Authors: Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy Hospedales, Tao Xiang
Perez-Rua et al. propose to combine three settings: Continual Learning, Object Detection (aka finding object boxes in an image), and Few-Shot learning (with few training samples).
Their setting consists of a first large task with many classes and much data; the following tasks then add new classes with only 10 labeled examples per class (10 k-shots). This is impressive because object detection is harder than classification!
During the first task, they train a CenterNet. Similar to CornerNet, class-agnostic features are extracted by a ResNet, then class-specific heatmaps are generated. The most active zones are chosen as boxes’ centers. Two additional heads regress the boxes’ width and height.
Once their CenterNet has been trained on the base classes, they train a Meta-Learning-based generator. This module must learn to produce the “class codes”, aka the class-specific weights used by the detector. To do so, they train the generator on the base classes split into episodes. All weights are frozen except the generator’s.
For the following tasks, there is no training: given a new class, the meta-generator produces new weights on the fly, and inference is immediately done.
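As a rough sketch of how such a generator could work, here is a hypothetical module mapping the k support examples of a novel class to its class code; the mean-pooling and the single linear layer are assumptions, and the paper’s exact architecture differs:

```python
import torch
import torch.nn as nn

class ClassCodeGenerator(nn.Module):
    """Maps support features of a novel class to class-specific detector weights."""

    def __init__(self, feat_dim: int, code_dim: int):
        super().__init__()
        self.generator = nn.Linear(feat_dim, code_dim)  # assumed: a simple linear map

    def forward(self, support_feats: torch.Tensor) -> torch.Tensor:
        # support_feats: (k_shots, feat_dim), features of the few labeled examples.
        prototype = support_feats.mean(dim=0)           # pool the k shots
        return self.generator(prototype)                # class code for the detector head
```

At test time, the generated code is simply plugged into the detector’s class-specific head; no gradient step is performed.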
It’s worth noting that the results on novel classes are quite low.
2. Conditional Channel Gated Networks for Task-Aware Continual Learning
PDF: 2004.00070
Authors: Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi
Abati et al. propose an interesting take on sub-networks in Continual Learning. Previous methods proposed to learn sub-networks inside a single network (think Lottery Ticket Hypothesis (Frankle & Carbin, 2018)). Those sub-networks can be learned by an evolutionary algorithm (Fernando et al., 2017), L1 sparsity (Golkar et al., 2019), or learned gating (Hung et al., 2019). However, they all share a major constraint: they need the task id at inference to choose the right sub-network, a setting called Multi-Head Evaluation. Having the task id at inference makes the problem much easier, and I think it is not realistic.
The authors propose to train sub-networks with learned gating. The right sub-network is chosen at inference with a Task Classifier; therefore they don’t use the task id! To the best of my knowledge, they are only the second to do this (von Oswald et al., 2020).
Each residual block of their ResNet has $T$ (number of tasks) gate networks that choose which filters to enable or disable.
The selection of filters to pass or block is discrete and thus non-differentiable, so they use Gumbel Softmax sampling, similarly to Guo et al., 2019: the forward pass is discrete, but the backward pass is continuous. After each task, they record on a validation set which gates have fired. The associated filters are frozen for the following tasks, but remain usable!
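A minimal PyTorch sketch of one such gate, built on the straight-through Gumbel-Softmax; the pooling and the single linear layer are my assumptions about the gate network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelGate(nn.Module):
    """Per-task gate deciding which of a block's filters to enable or disable."""

    def __init__(self, in_channels: int, n_filters: int):
        super().__init__()
        # Two logits per filter: index 0 = "off", index 1 = "on".
        self.to_logits = nn.Linear(in_channels, n_filters * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, H, W) -> global average pooled descriptor.
        desc = x.mean(dim=(2, 3))
        logits = self.to_logits(desc).view(x.size(0), -1, 2)
        # hard=True: discrete 0/1 mask in the forward pass, continuous gradients backward.
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 1]
        return mask  # (batch, n_filters), multiplies the block's output channels
```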
The interesting part of this paper is how they train the task classifier. During training, all gates are fired in parallel.
This means that, given a single input image, they have $T$ parallel streams of activations. All those streams’ outputs are concatenated and fed to a task classifier (a two-layer MLP) that predicts the task id. This task id is then used to choose the right stream to feed to the task-specific classifier.
I really like this method; however, it’s unfortunate that they don’t compare their model to the latest SotA or on large-scale datasets. Their largest dataset is ImageNet-50. Furthermore, I would like to see which gates fire the most: early layers or later layers?
3. Incremental Learning in Online Scenario
PDF: 2003.13191
Authors: Jiangpeng He, Runyu Mao, Zeman Shao, Fengqing Zhu
He et al. claim to learn in an “Online Scenario”: new classes are added as usual, along with new samples of old classes. This sounds similar to the New Instances and Classes (NIC) setting (Lomonaco et al., 2017). They claim novelty while it’s not really true (off the top of my head, Aljundi et al. and Lomonaco et al. have worked on this).
The authors propose two contributions. First, they initially use a Nearest Class Mean (NCM) (kNN for CL) classifier. Similar to iCaRL’s NME, class means are computed. To handle concept drift, they update the means with new samples using a moving average.
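The update is presumably a standard running average of the form below; whether the coefficient is a fixed rate or the per-class count $n_c$ (as written here) is a detail I’m reconstructing:

$$\mu_c \leftarrow \frac{n_c\,\mu_c + f_\theta(x)}{n_c + 1}, \qquad y = c$$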
Furthermore, during the early tasks, they use their NCM classifier at inference because they remark that it behaves well under data scarcity. Once the model is better trained, having seen enough samples, they switch to the class probabilities from a softmax. It’s interesting to mix those inference classifiers (classification- and metric-based), contrary to Hou et al., 2019 and Douillard et al., 2020, which evaluated the two methods separately. However, He et al.’s switch from one method to the other is a hyperparameter; it would have been nice to base it on some uncertainty measure.
Their second contribution is a modified cross-distillation: in addition to the usual distillation loss on probabilities with temperature (Hinton et al., 2015), they modify the classification loss. It is still a cross-entropy, but the probabilities associated with old classes are a linear combination of the old and new model outputs.
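Concretely, my reading is that for an old class $k$ the probability fed to the cross-entropy becomes a mixture such as, with some coefficient $\alpha$:

$$\hat{p}(k) = \alpha\, p_{\text{old}}(k) + (1 - \alpha)\, p_{\text{new}}(k), \qquad k \in \text{old classes}$$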
They also finetune their model on a balanced set, like Castro et al., 2018 did.
Finally, they evaluate their model on CIFAR100, ImageNet100, and Food-101. Unfortunately, they evaluate their “Online” NIC setting only on Food-101 and solely compare against a simple baseline, not against any SotA models (adapted to the setting). CIFAR100 and ImageNet100 are evaluated in the classic NC setting: they have slightly better performance than SotA on the former and are equivalent to BiC on the latter. I’m quite annoyed that they claim in the abstract to “out-perform” SotA on ImageNet1000 while they actually only evaluate on the smaller-scale ImageNet100. BiC really shines on ImageNet1000; I’d have liked to see how their model fares on this harsher dataset.
4. iTAML: An Incremental Task-Agnostic Meta-learning
PDF: 2003.11652
Authors: Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Rajasegaran et al. propose a novel Meta-Learning model. Several approaches already use this branch of methods for Continual Learning; they are however, to the best of my knowledge, the first to do so without the task id at inference.
The goal of Meta-Learning is to “learn to learn”. In this case, they want a network that learns to produce a set of parameters that can then be tuned quickly to a particular task. Following existing Meta-Learning models, they train in two loops: the inner loop learns an actual task, while the outer loop learns to produce a good initialization for the inner loop (i.e. “learn to learn”). Inspired by Reptile (Nichol et al., 2018), their outer loop is trained on the difference between the base parameters and the parameters learned after the inner loop. The difference with Reptile is that the inner loop learns each task separately, producing averaged inner-loop parameters.
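In Reptile-like notation, the outer update is then something along these lines, where $\phi_t$ are the parameters produced by the inner loop on task $t$ and $\epsilon$ is the outer learning rate (my paraphrase, not the paper’s exact equation):

$$\theta \leftarrow \theta + \epsilon \left( \frac{1}{T} \sum_{t=1}^{T} \phi_t - \theta \right)$$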
This forces the meta-model to find a set of parameters good for all tasks.
At inference, they proceed in two steps: finding the task id, then tuning the model for this task. To find the task id, they record for the test subset which task’s predictions are the most activated on average, and choose the maximum value.
Then, given the predicted task id, they sample all exemplars from memory (aka Rehearsal Learning) belonging to this task and learn task-specific parameters in a single inner loop. The resulting model is then used to classify the test samples.
Note that while it’s a very interesting approach, their model stands or falls on the quality of the task classification. They claim it’s almost 100% accurate, but only for one reason: in their setting, the model is evaluated on each seen task separately. In a real setting, samples from different tasks are mixed together, and there their Algorithm 2 won’t work. Therefore I don’t think this model is truly “task-agnostic”, but it’s definitely a good step forward.
They evaluate various models, from the meta-learning and continual learning domains, on MNIST, SVHN, CIFAR100, ImageNet100, ImageNet1000, and Celeb-10k. It’s a bit strange however that BiC (a very good alternative for large-scale datasets) is evaluated on Celeb-10k but not on ImageNet100 (where it would have beaten iTAML).
5. Modeling the Background for Incremental Learning
PDF: 2002.00718
Authors: Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Cermelli et al. attack the problem of Semantic Segmentation for Continual Learning. In semantic segmentation, the goal is to assign a class to each pixel; two cars next to each other will have the same pixel labels. This is particularly difficult in Continual Learning for the same reasons as Object Detection: at task $t$, the class “car” may be part of the background, but then at task $t+1$ we have to predict it. However, our model may have seen images containing cars in the first task and thus has learned not to detect them. Likewise, images from task $t+1$ may be annotated so that the “person” class learned previously is now part of the background.
To solve the problem mentioned before, they revisit both the cross-entropy and distillation losses. The former is split in two parts: if the pixel’s ground truth belongs to the current task’s set of classes, it is kept unchanged; otherwise, it is the probability of having either an old class or the background.
Likewise, the distillation loss is changed if the pixel belongs to the background according to the current task, using the probability of having either a new class or the background.
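My reading of the two modified losses, with $b$ the background class, $\mathcal{C}^t$ the current task’s classes, and $p^{\text{new}}_i$ the new model’s output at pixel $i$ (notation mine):

$$p_i(\text{bg}) = p^{\text{new}}_i(b) + \sum_{c \,\in\, \text{old}} p^{\text{new}}_i(c) \qquad \text{(cross-entropy side)}$$

$$\hat{p}_i(b) = p^{\text{new}}_i(b) + \sum_{c \,\in\, \mathcal{C}^t} p^{\text{new}}_i(c) \qquad \text{(distillation side, matched to the old model's background)}$$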
This handles the case where the previous model considers a current pixel as background while the current model considers it as a new class.
Finally, the classifier weights for new classes are initialized with the background weight.
6. ADINet: Attribute Driven Incremental Network for Retinal Image Classification
PDF: CVPR webpage
Authors: Qier Meng, Satoh Shin’ichi
Meng and Shin’ichi use Continual Learning for retinal image classification. They found that retinal diseases come in a large variety of types and that current methods didn’t allow them to train incrementally on each new type as new patients came in.
Their originality lies in their usage of attributes. Retinal images are annotated with a disease label (“AMD”, “DR”…) but also with several attributes (“hemorrhage”, “macular edema”…). Therefore, in addition to the classic distillation loss with temperature scaling applied to the disease prediction, they also distill the attribute predictions of the previous model with a BCE.
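If I understand correctly, the attribute distillation term is simply a binary cross-entropy between the two models’ sigmoid attribute outputs, something like:

$$\mathcal{L}_{\text{attr}} = \mathrm{BCE}\big(\sigma(a^{\text{new}}),\, \sigma(a^{\text{old}})\big)$$

where $a^{\text{new}}$ and $a^{\text{old}}$ are the attribute logits of the new and previous model (notation mine).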
In addition to the two distillation losses, they also refine their attribute predictions with a “weight estimation”. It measures how much an attribute contributes to distinguishing classes; it’s similar to doing a self-attention over all attributes to find which ones are the most important. This weight estimation is then used to weight the attribute predictions. They don’t detail much the rationale behind it, but empirical results show small but consistent gains.
They evaluate on both medical and academic datasets. For the latter, they used ImageNet-150k-sub: it contains 100 classes of ImageNet1000, but only 150 training images were selected per class instead of ~1,200. I’ve never seen a model evaluated on this dataset, but it looks more challenging than ImageNet100. They display a significant improvement over 2017’s iCaRL.
It’s interesting to predict attributes in the context of Continual Learning. I hypothesize that it forces the model to learn fine-grained features common to all tasks and may reduce catastrophic forgetting.
7. Semantic Drift Compensation for Class-Incremental Learning
PDF: 2004.00440
Authors: Lu Yu, Bartłomiej Twardowski, Xialei Liu, Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer
Rebuffi et al., 2017 use class means with a k-NN to classify samples at inference. Those class means are updated after each task by re-extracting the features of old classes’ rehearsal samples with the new ConvNet. This supposes that we have access to previous data, at least in a limited amount.
Yu et al. propose to update the class means without using any previous data. First, they compute the embedding drift, on the current data, between the start and the end of the current task.
Then, for each old class, they compute the mean drift vector.
The drift is weighted by $w_i$, which gives a lower weight to outliers. Thus, samples with low confidence won’t affect the drift computation as much as “archetypal” samples.
Finally, this drift is computed after each task, starting from the second one, and is added continuously to the class mean vectors.
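Putting the three steps together, my understanding of SDC is roughly (notation mine):

$$\delta_i = z_i^{\,t} - z_i^{\,t-1}, \qquad \Delta\mu_c = \frac{\sum_i w_i\, \delta_i}{\sum_i w_i}, \qquad \mu_c \leftarrow \mu_c + \Delta\mu_c$$

where $z_i^{\,t-1}$ and $z_i^{\,t}$ are the embeddings of current-task sample $i$ before and after training on task $t$, and $w_i$ decays (e.g. with a Gaussian kernel) as $z_i^{\,t-1}$ gets further from the old class mean $\mu_c$.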
They add their method, nicknamed SDC, to the EWC model. They show excellent performance on CIFAR100 and ImageNet100, only beaten by Hou et al., 2019. It’s important to note that Hou et al., 2019 use exemplars while Yu et al. don’t. On the other hand, according to their code, they are in a Multi-Head Evaluation setting where they know the task id at inference. Thus they classify a sample among the task’s classes only, instead of all seen classes as Rebuffi et al., 2017 or Hou et al., 2019 do. Their setting is therefore not really comparable to Hou et al., 2019.
8. Maintaining Discrimination and Fairness in Class Incremental Learning
PDF: 1911.07053
Authors: Bowen Zhao, Xi Xiao, Guojun Gan, Bin Zhang, Shutao Xia
Zhao et al. propose a method directly in the line of Belouadah and Popescu, 2019 (IL2M) and Wu et al., 2019 (BiC). Both works remarked a bias towards new classes that is detrimental to old classes. IL2M uses some statistics to correct this bias, while BiC recalibrates the probabilities using a linear model learned on validation data.
Zhao et al. use a simpler solution that they call Weight Aligning (WA). They saw, as Hou et al., 2019 did, that the norm of the weights associated with old classes is lower than the norm of those associated with new classes.
Hou et al., 2019 and Douillard et al., 2020 use a cosine classifier so that all norms are equal to 1. Zhao et al. instead re-normalize the weights based on the norm ratio.
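If I read the method correctly, the alignment boils down to scaling the new classes’ weight vectors by the ratio of average norms:

$$\gamma = \frac{\operatorname{mean}_{c \,\in\, \text{old}} \lVert w_c \rVert}{\operatorname{mean}_{c \,\in\, \text{new}} \lVert w_c \rVert}, \qquad w_c \leftarrow \gamma\, w_c \quad \forall\, c \in \text{new}$$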
They remark that:
[…] we only make the average norms become equal, in other words, within new classes (or old classes), the relative magnitude of the norms of the weight vectors does not change. Such a design is mainly used to ensure the data within new classes (or old classes) can be separated well.
This weight alignment is done between each task. Furthermore, after each optimization step they clip the weight values to be positive, to make the weight norms more consistent with their corresponding logits (after ReLU).
In their ablation, they show that weight alignment provides more gain than knowledge distillation, which is quite impressive. Using both leads to a significant gain over BiC and IL2M on large-scale datasets like ImageNet1000.
Overall, I liked their method as it is very simple and yet efficient.
9. Mnemonics Training: Multi-Class Incremental Learning without Forgetting
PDF: 2002.10211
Authors: Yaoyao Liu, An-An Liu, Yuting Su, Bernt Schiele, Qianru Sun
Liu et al. propose in this work two important contributions, which make it my favorite paper of this review. The first, well advertised, is an improvement of Rehearsal Learning. The second, a little hidden in the paper, is a Meta-Learning-inspired method to adapt gracefully to new distributions.
In rehearsal learning, we feed old samples to the model to reduce forgetting. Obviously, we won’t use all old samples, but a very limited amount; here the authors use 20 images per class, like Hou et al., 2019 and Douillard et al., 2020. Rebuffi et al., 2017 proposed with iCaRL a herding selection which iteratively finds the barycenter of the class distribution. However, Castro et al., 2018 remarked that taking the samples closest to the class mean, or even random samples (!), worked just as well.
Liu et al. significantly improve those solutions by transforming the selected exemplars. First, they randomly select samples; then, given a trained and fixed model, they optimize the exemplars at the pixel level.
The optimized exemplars must lead to a decreased loss on the new classes’ data (present in large amounts).
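A heavily simplified PyTorch sketch of this bi-level idea, reduced to a single differentiable inner step (the functional `model_fn` and all names are hypothetical; the paper’s actual procedure is more involved):

```python
import torch
import torch.nn.functional as F

def mnemonics_step(model_fn, theta, exemplars, ex_labels, new_x, new_y,
                   inner_lr=0.01, outer_lr=0.01):
    """One bi-level update of the exemplar pixels.

    model_fn(theta, x) -> logits, with theta a weight tensor (requires_grad=True).
    1. Inner: one gradient step on theta using the exemplars.
    2. Outer: evaluate the adapted weights on new-class data and backpropagate
       that loss down to the exemplar pixels, through the inner step.
    """
    exemplars = exemplars.clone().requires_grad_(True)
    inner_loss = F.cross_entropy(model_fn(theta, exemplars), ex_labels)
    grads = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
    theta_adapted = theta - inner_lr * grads          # differentiable inner update
    outer_loss = F.cross_entropy(model_fn(theta_adapted, new_x), new_y)
    pixel_grad = torch.autograd.grad(outer_loss, exemplars)[0]
    return (exemplars - outer_lr * pixel_grad).detach()
```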
The modification is visually very minor: we only see a little bit of noise overlaid on the original images. The authors found that this optimization of the images leads to a set of exemplars well distributed along the class boundaries.
This optimization is done at the end of the task, once exemplars from the new classes have been selected. The authors also choose to finetune the exemplars of old classes that were selected in previous tasks. However, in this case, we no longer have a large amount of old classes’ data to act as ground truth for the optimization of the old classes’ exemplars. Therefore, they split the exemplar set in half: one split is optimized using the second as ground truth, and vice-versa.
The second contribution, and the major one, is unfortunately not very advertised in this paper. The authors re-use an idea from one of their previous papers in Meta-Learning. Instead of tuning all the ConvNet parameters for each task, they only slightly adapt them: for each kernel, a small kernel of spatial dimensions equal to one is learned (likewise for the biases). This small kernel is expanded to the base kernel’s dimensions and element-wise multiplied with it.
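A PyTorch sketch of this modulation (assuming `groups=1` convolutions; the wrapper and names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaTransferConv(nn.Module):
    """Frozen convolution whose kernels are modulated by small learned 1x1 factors."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)                   # base kernels stay frozen
        out_c, in_c = conv.weight.shape[:2]
        # One learned factor per kernel slice, spatial size 1x1 (likewise a bias shift).
        self.scale = nn.Parameter(torch.ones(out_c, in_c, 1, 1))
        self.shift = nn.Parameter(torch.zeros(out_c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand the 1x1 factors to the kernel's spatial size, multiply element-wise.
        weight = self.conv.weight * self.scale.expand_as(self.conv.weight)
        bias = self.shift if self.conv.bias is None else self.conv.bias + self.shift
        return F.conv2d(x, weight, bias, stride=self.conv.stride,
                        padding=self.conv.padding, dilation=self.conv.dilation)
```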
Intuitively, this method, called “Meta-Transfer”, performs a small “shift & scale”. Most of the network is kept frozen and thus doesn’t forget too much, while the small adaptation enables the network to learn new classes.
They evaluate on CIFAR100, ImageNet100, and ImageNet1000 in various settings and beat all previous SotA (especially Hou et al., 2019 and Wu et al., 2019). In my recent paper (Douillard et al., 2020), our model PODNet outperforms theirs in several settings, evaluated even on longer incremental trainings.
10. Towards Backward-Compatible Representation Learning
PDF: 2003.11942
Authors: Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto
This paper is not directly related to Continual Learning but rather to Visual Search. Shen et al. raise the issue of backfilling:
In a visual search system, the embeddings of a large gallery have been computed once. Then, given a query image, we extract its features and compare them to the gallery’s feature collection. For example, I take a picture of a shirt and want to know what the most similar clothes available in a store are.
A problem arises when a new model is trained. This model may be different from the previous one because the data it was trained on changed, or because the architecture and losses were changed. The gallery’s feature collection is then not up-to-date anymore, and we need to extract the features of the whole collection again with the new model. This can be very costly when the gallery is made of billions of images.
The authors propose to make the feature extractor “backward-compatible”, meaning that query features extracted by the new model live in the same latent space as the old model’s.
To produce a new model that is backward-compatible with a previous, possibly different model, the authors add a loss on top of the classification loss.
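My reconstruction of that combined objective, reusing the notations explained just below:

$$\mathcal{L} = \mathcal{L}_{\text{cls}}\big(w_{c\,\text{new}} \circ w_\theta;\; T_{\text{new}}\big) + \lambda\, \mathcal{L}_{\text{cls}}\big(w_{c\,\text{old}} \circ w_\theta;\; T_{\text{BCT}}\big)$$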
The first part of the loss updates the new model on the new dataset $T_{\text{new}}$. For the second part, the best alternative proposed is training the old classifier $w_{c\,\text{old}}$ with the new feature extractor $w_\theta$ on the new dataset $T_{\text{new}} = T_{\text{BCT}}$. Because the new dataset can contain new classes, the old classifier is extended with new weights. They are initialized with the mean features extracted by $w_{\theta\,\text{old}}$, like Weight Imprinting did in Metric-Learning (Qi et al., 2018).
Overall, their model is still far from the upper bound (recomputing the gallery with the new model), but it improves significantly over simple baselines and beats LwF (Li & Hoiem, 2016) by 3 points. I think this model is quite “simple” compared to SotA Continual Learning, but it is interesting to see actual applications of the domain.
I work as an industrial PhD student at Heuritech and Sorbonne University. At the former, a Parisian startup, we develop Deep Learning models for vision to understand the wide complexity of fashion images on the internet. We face many challenges such as domain adaptation, open set recognition, weak supervision, and of course continual learning. We recently published a paper on this topic: PODNet, which achieves great results on a large number of small tasks. Follow us on Medium and check out our website heuritech.com!
Originally published at https://medium.com/heuritech/continual-learning-at-cvpr-2020-a6408e9c51f4