Meta Learning: Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness (Part 1)

https://github.com/joey-wang123/Imbalancemeta.git

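The link above is the official code released by the authors to reproduce the results of the original paper. This top-conference paper was published at ICCV 2021 and can be found through CNKI and other literature search services. The paper is titled Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness, that is, meta learning on a sequence of imbalanced domains while taking the difficulty of each domain into account. It is a very high-quality paper, and I spent a week studying it in depth. Let us first briefly review the concept of meta learning: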

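Meta learning is currently one of the most exciting research directions in artificial intelligence. With a large number of papers published and steady progress being made, meta learning has achieved major breakthroughs in AI. Before exploring meta learning, let us first look at how current AI models work.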

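Is deep learning really true artificial intelligence? The answer is no. How do we humans learn? We generalize what we learn into multiple concepts and learn from them. Current learning algorithms, however, are typically trained to handle only a single task. This is where meta learning comes in. Meta learning aims to produce a general-purpose model that learns to perform a variety of tasks without being trained from scratch for each one. We can train a meta learning model on a variety of related tasks, each with very few data points, so that for a new task the model can draw on knowledge gained from the related tasks instead of starting from scratch. Many researchers and scientists believe that meta learning can bring us closer to artificial general intelligence (AGI). Meta learning is also known as "learning to learn". This article does not cover the classic meta learning algorithms, such as siamese networks, prototypical networks, relation networks, and memory-augmented networks, or their implementation in TensorFlow and Keras; nor does it cover more advanced algorithms such as model-agnostic meta learning (MAML), Reptile, and context adaptation via meta learning (CAML), how meta stochastic gradient descent (Meta-SGD) enables fast learning, or how meta learning can be applied to unsupervised learning. Instead, it focuses on the ideas and implementation of this paper.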

Recognizing new objects by learning from a few labeled examples in an evolving environment is crucial to obtain excellent generalization ability for real-world machine learning systems. A typical setting across current meta learning algorithms assumes a stationary task distribution during meta training. In this paper, we explore a more practical and challenging setting where task distribution changes over time with domain shift. Particularly, we consider realistic scenarios where task distribution is highly imbalanced with domain labels unavailable in nature. We propose a kernel-based method for domain change detection and a difficulty-aware memory management mechanism that jointly considers the imbalanced domain size and domain importance to learn across domains continuously. Furthermore, we introduce an efficient adaptive task sampling method during meta training, which significantly reduces task gradient variance with theoretical guarantees. Finally, we propose a challenging benchmark with imbalanced domain sequences and varied domain difficulty. We have performed extensive evaluations on the proposed benchmark, demonstrating the effectiveness of our method.







A series of mini-batch training tasks $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_N$ arrive sequentially, with possible domain shift occurring in the stream, i.e., the task stream can be segmented by continual latent domains $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_L$. $\mathcal{T}_t$ denotes the mini-batch of tasks that arrives at time $t$. The domain identity associated with each task remains unavailable during both meta training and testing. Domain boundaries, i.e., the points indicating that the current domain has finished and the next domain is about to start, are unknown. This is a more practical and general setup. Each task $\mathcal{T}$ is divided into training and testing data $\{\mathcal{T}^{train}, \mathcal{T}^{test}\}$. Suppose $\mathcal{T}_t^{train}$ consists of $K$ examples, $\{(x_k, y_k)\}_{k=1}^{K}$, where in object recognition $x_k$ is the image data and $y_k$ is the corresponding object label. We assume the agent stays within each domain for some consecutive time. Also, we consider a simplified setting where the agent will not return to previous domains, and leave the contrary case to future work. Our proposed learning system maintains a memory buffer $\mathcal{M}$ to store a small number of training tasks from previous domains for replay, to avoid forgetting previous knowledge. Old tasks are not revisited during training unless they are stored in the memory $\mathcal{M}$. The total number of tasks processed is much larger than the memory capacity. At the end of meta training, we randomly sample a large number of unseen few-shot tasks from each latent domain $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_L$ for meta testing. The model performance is the average accuracy over all the sampled tasks.
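To make the protocol concrete, here is a minimal Python sketch of the task stream under stated assumptions: `Task`, `task_stream`, `average_accuracy` and the toy sampler are hypothetical names, not code from the authors' repository. It shows tasks arriving domain after domain with imbalanced sizes and without exposing domain identities to the learner, and the meta-test score computed as the mean accuracy over all sampled tasks.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple

Example = Tuple[object, int]  # (x_k, y_k), e.g. an image and its object label


@dataclass
class Task:
    """One few-shot task T = {T_train, T_test}."""
    train: List[Example]  # the K training examples
    test: List[Example]


def task_stream(domain_sizes: Sequence[int],
                make_task: Callable[[int], Task]) -> Iterator[Task]:
    """Yield tasks domain after domain, never returning to an earlier domain.

    domain_sizes[l] is the (possibly highly imbalanced) number of tasks from
    latent domain l, and make_task(l) is a hypothetical per-domain sampler.
    Only Task objects are yielded, so the learner never sees the domain index
    or where one domain ends and the next begins.
    """
    for domain, size in enumerate(domain_sizes):
        for _ in range(size):
            yield make_task(domain)


def average_accuracy(per_task_accuracy: Sequence[float]) -> float:
    """Meta-test score: mean accuracy over all sampled test tasks."""
    return sum(per_task_accuracy) / len(per_task_accuracy)


# Toy usage: 3 latent domains of very different sizes. For checking only,
# the toy inputs x carry their domain id; a real learner would not see it.
def toy_task(domain: int, k: int = 5, n_classes: int = 5) -> Task:
    def sample() -> Example:
        return (domain, random.random()), random.randrange(n_classes)
    return Task([sample() for _ in range(k)], [sample() for _ in range(k)])


stream = list(task_stream([50, 5, 20], toy_task))
print(len(stream))  # 75
```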

[Image 1]



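(2) Tasks from previous domains are usually unavailable when training on a new domain;

(3) The number of tasks in each domain may be highly imbalanced;

(4) The difficulty of different domains may vary significantly across the domain sequence.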


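(1) Online learning over a sequence of domains;

(2) The task stream contains a significant imbalance in domain sizes;

(3) Domain labels and boundaries remain unavailable during both training and testing;

(4) Domain difficulty is non-uniform across the domain sequence.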









Reservoir sampling (RS) [58, 15] is a random sampling method for choosing $k$ samples from a data stream in a single pass, without knowing the total number of items in advance. A straightforward adoption of RS here is to maintain a fixed-size memory and sample tasks uniformly from the task stream. Each task in the stream is assigned an equal probability $n/N$ of being moved into the memory buffer, where $n$ is the memory capacity and $N$ is the total number of tasks seen so far. However, RS is not suitable for the practical scenarios described above, because of two major shortcomings: (1) the task distribution in memory can be skewed when the input task stream is highly imbalanced, as in our setting, which leads to under-representation of the minority domains; (2) the importance of each task varies, as some domains are more difficult to learn than others, and RS does not take this factor into account.
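A minimal sketch of plain reservoir sampling (the classic Algorithm R) makes shortcoming (1) easy to see. Here each task is represented only by its latent domain id, a simplification for illustration; `reservoir_sample` is a hypothetical helper, not code from the paper or its repository.

```python
import random
from collections import Counter
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Reservoir sampling (Algorithm R): keep n items from a stream of unknown length.

    Once the buffer is full, the N-th item enters with probability n / N and
    replaces a uniformly chosen slot, so after N items every item seen so far
    is in the buffer with the same probability n / N.
    """
    memory: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(memory) < n:
            memory.append(item)      # buffer not yet full: always keep
        else:
            j = rng.randrange(seen)  # uniform integer in [0, seen)
            if j < n:                # true with probability n / seen
                memory[j] = item     # overwrite a uniformly chosen slot
    return memory


# Toy imbalanced stream: each "task" is represented only by its latent domain id.
stream = [0] * 900 + [1] * 90 + [2] * 10
memory = reservoir_sample(stream, n=50, rng=random.Random(0))
print(Counter(memory))  # roughly proportional to 900 : 90 : 10
```

With 10 of the 1,000 tasks coming from domain 2, its expected share of a 50-slot buffer is only 0.5 tasks, so the minority domain can easily vanish from memory entirely. Plain RS also has no notion of task difficulty, which is shortcoming (2).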
[Image 2]


(1) Few-shot tasks are highly diverse even within the same domain;

(2) The degree of domain variation differs across boundaries in the sequence.


Projected space
Tasks $\mathcal{T}_t$ are mapped into a common space $o_t = \frac{1}{K}\sum_{k=1}^{K} f_{\theta_t}(x_k)$, where $K$ is the number of training data and $f_{\theta_t}$ is the CNN embedding network. The task embedding could be further refined by incorporating the image labels, e.g., by concatenating the word embeddings of the image categories with the image embeddings. We leave this direction as interesting future work. To reduce the variance across different few-shot tasks and capture general domain information, we compute the exponential moving average of the task embedding, $O_t = \alpha o_t + (1 - \alpha) O_{t-1}$, where the constant $\alpha$ is a weighting multiplier that encodes the relative importance of the current task embedding versus the past moving average. A sliding window stores the moving averages of the past $m$ steps ($m$ is a small number), $O_{t-1}, O_{t-2}, \ldots, O_{t-m}$, which are used to form the low-dimensional projection vector $z_t$, whose $i$-th element is the distance between $o_t$ and $O_{t-i}$, $d(o_t, O_{t-i})$. The projected $m$-dimensional vector $z_t$ captures longer-context similarity information spanning multiple consecutive tasks.
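The projection step can be sketched in NumPy as below. The random features stand in for the outputs of the CNN $f_{\theta_t}$; the class and function names are hypothetical, and using the Euclidean distance for $d$ and initializing the moving average with the first task embedding are assumptions, since the text does not pin them down.

```python
from collections import deque
from typing import Deque, Optional

import numpy as np


def task_embedding(features: np.ndarray) -> np.ndarray:
    """o_t: average of the K per-example embeddings f_theta(x_k); (K, D) -> (D,)."""
    return features.mean(axis=0)


class ProjectedSpace:
    """EMA of task embeddings plus a window of the last m averages.

    Assumptions not fixed by the text: d(., .) is the Euclidean distance,
    and the average is initialised with the first task embedding (O_1 = o_1).
    """

    def __init__(self, alpha: float, m: int):
        self.alpha = alpha
        self.ema: Optional[np.ndarray] = None             # O_{t-1}
        self.window: Deque[np.ndarray] = deque(maxlen=m)  # O_{t-1}, ..., O_{t-m}

    def step(self, o_t: np.ndarray) -> Optional[np.ndarray]:
        """Return z_t with z_t[i-1] = d(o_t, O_{t-i}), or None until m averages exist."""
        z_t = None
        if len(self.window) == self.window.maxlen:
            z_t = np.array([np.linalg.norm(o_t - O) for O in self.window])
        # O_t = alpha * o_t + (1 - alpha) * O_{t-1}
        if self.ema is None:
            self.ema = o_t.copy()
        else:
            self.ema = self.alpha * o_t + (1 - self.alpha) * self.ema
        self.window.appendleft(self.ema)  # newest first; the oldest O drops off
        return z_t


# Toy usage with random "CNN features" (K = 5 examples, D = 64 dimensions).
rng = np.random.default_rng(0)
proj = ProjectedSpace(alpha=0.3, m=4)
for t in range(10):
    z = proj.step(task_embedding(rng.normal(size=(5, 64))))
print(z.shape)  # (4,)
```

Keeping the window newest-first means the $i$-th entry of $z_t$ lines up with $O_{t-i}$, matching the definition in the text.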
Online domain change detection
At each time $t$, we use the projected space constructed above for online domain change detection. Assume we have two windows of projected embeddings of previous tasks, $U_B = \{z_{t-2B}, z_{t-2B+1}, \ldots, z_{t-B-1}\}$ with distribution $Q$ and $V_B = \{z_{t-B}, z_{t-B+1}, \ldots, z_t\}$ with distribution $R$, where $B$ is the window size. In other words, $V_B$ represents the most recent window of the projected space (the test window) and $U_B$ represents the projected space of the previous window (the reference window). $U_B$ and $V_B$ are non-overlapping windows. For notational clarity and convenience, we also write $U_B = \{u_1, u_2, \ldots, u_B\}$ and $V_B = \{v_1, v_2, \ldots, v_B\}$, i.e., $u_i = z_{t-2B+i-1}$ and $v_i = z_{t-B+i-1}$. Our general framework is to first measure the distance $d(Q, R)$ between the two distributions $Q$ and $R$; then, given a threshold $b$, a domain change is detected when $d(Q, R) > b$. Here, we use the Maximum Mean Discrepancy (MMD) to measure the distribution distance. Following [38], the MMD distance between $Q$ and $R$ is defined as:
[Image 3: definition of the MMD distance between Q and R]


where $k(\cdot, \cdot)$ is an RKHS kernel. In this paper, we assume the RBF kernel $k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$ is used. The detection statistic at time $t$ is $W_t^B$. If $Q$ and $R$ are close, $W_t^B$ is expected to be small, implying a small probability that a domain change exists. If $Q$ and $R$ are significantly different distributions, $W_t^B$ is expected to be large, implying a higher chance of domain shift. Thus, $W_t^B$ characterizes the chance of domain shift at time $t$. We then test the condition $W_t^B > b$ to determine whether a domain change occurs, where $b$ is a threshold. Each task $\mathcal{T}_t$ is associated with a latent domain label $L_t$, with $L_0 = 0$. If $W_t^B > b$, then $L_t = L_{t-1} + 1$, i.e., a new domain arrives (note that the actual domain change could have happened a few steps earlier, but for simplicity we assume the domain change occurs at time $t$); otherwise, $L_t = L_{t-1}$, i.e., the current domain continues. We leave the more general case with domain revisiting as future work. Setting the threshold is non-trivial and is described below.
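The detection rule can be sketched as follows. Because the exact form of the paper's $W_t^B$ appears only as an image in this post, the sketch substitutes the standard unbiased MMD² estimator with the RBF kernel as the statistic; it may differ in detail from the paper's definition. The threshold $b$ is fixed here, whereas the paper argues for an adaptive one, and the function names and the 2B-step cooldown after a detection are hypothetical additions.

```python
from typing import List, Sequence

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) between rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(u: np.ndarray, v: np.ndarray, sigma: float) -> float:
    """Standard unbiased MMD^2 estimate between windows u and v, each of shape (B, m)."""
    B = u.shape[0]
    k_uu = rbf_kernel(u, u, sigma)
    k_vv = rbf_kernel(v, v, sigma)
    k_uv = rbf_kernel(u, v, sigma)
    # Within-window terms exclude the diagonal k(u_i, u_i) = k(v_i, v_i) = 1.
    within_u = (k_uu.sum() - np.trace(k_uu)) / (B * (B - 1))
    within_v = (k_vv.sum() - np.trace(k_vv)) / (B * (B - 1))
    return float(within_u + within_v - 2.0 * k_uv.mean())


def latent_domain_labels(z: Sequence[np.ndarray], B: int, b: float,
                         sigma: float) -> List[int]:
    """L_t = L_{t-1} + 1 if the statistic exceeds b, else L_{t-1} (with L_0 = 0).

    Windows follow u_i = z_{t-2B+i-1} and v_i = z_{t-B+i-1}. The 2B-step
    cooldown after a detection is an added assumption: it stops a single
    shift from being counted again while the windows still straddle it.
    """
    labels, L, next_check = [], 0, 2 * B
    for t in range(len(z)):
        if t >= next_check:
            u = np.stack(z[t - 2 * B:t - B])  # reference window U_B
            v = np.stack(z[t - B:t])          # test window V_B
            if mmd2_unbiased(u, v, sigma) > b:
                L += 1
                next_check = t + 2 * B
        labels.append(L)
    return labels


# Toy usage: 30 identical projections from one domain, then 30 from another.
z = [np.zeros(3)] * 30 + [np.full(3, 5.0)] * 30
labels = latent_domain_labels(z, B=5, b=0.5, sigma=1.0)
print(labels.index(1))  # 33
```

In the toy run the true change happens at index 30 but is flagged at index 33, once enough of the test window comes from the new domain; this is the "actual domain change could have happened a few steps earlier" effect mentioned in the text.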
Setting the threshold
Clearly, setting the threshold $b$ involves a trade-off between two aspects: (1) the probability that $W_t^B > b$ when there is no domain change (a false alarm); (2) the probability that $W_t^B > b$ when there is a domain change (a correct detection). As a result, if domain similarity and difficulty vary significantly, simply using a fixed threshold throughout the entire training process is insufficient. In other words, an adaptive threshold $b$ is necessary. Before presenting the adaptive threshold method, we first state the theorem that characterizes the properties of the detection statistic $W_t^B$.





