【Active Learning - 01】 2013_CVPR_Adaptive Active Learning for Image Classification: Paper Notes


Paper link: 2013_CVPR_Adaptive Active Learning for Image Classification



We aim to develop an effective active learning method to build a competitive classifier with a limited amount of labeled training instances.


These works however merely evaluate the informativeness of instances with most uncertainty measures, which assume an instance with higher classification uncertainty is more critical to label. Although the most uncertainty measures are effective on selecting informative instances in many scenarios, they only capture the relationship of the candidate instance with the current classification model and fail to take the data distribution information contained in the unlabeled data into account. This may lead to selecting non-useful instances to label. For example, an outlier can be most uncertain to classify, but useless to label. This suggests representativeness of the candidate instance in addition to the classification uncertainty should be considered in developing an active learning strategy.


We propose a novel adaptive active learning strategy that exploits information provided by both the labeled instances and the unlabeled instances for query selection. Our new query selection measure is an adaptive combination of two terms: an uncertainty term based on the current classifier trained on the labeled instances; and an information density term that measures the mutual information between the candidate instance and the remaining unlabeled instances.

Related Work: 

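This section mainly reviews traditional query selection methods. Most of them are already covered in 【Active Learning - 00】 Active Learning Concepts / Key Paper List / Key Code & Resources, so they are not repeated here.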


Therefore these methods have the drawback of ignoring the distributional information contained in the large number of unlabeled instances, as we discussed above. In this paper, we develop a new active learning method for image classification tasks, which overcomes the inherent limitation of uncertainty sampling.



(1)Uncertainty Measure:
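
The formula for this measure is not reproduced in these notes. As a minimal sketch, assuming the common entropy-based form of uncertainty (the entropy of the classifier's predicted class distribution), the score could be computed as follows. The function name `entropy_uncertainty` and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def entropy_uncertainty(probs, eps=1e-12):
    """Shannon entropy of each row of an (n_samples, n_classes) matrix.

    Assumption: each row holds the class probabilities predicted by the
    current classifier for one unlabeled instance.
    """
    probs = np.asarray(probs, dtype=float)
    # Clip only inside the log so that 0 * log(0) is treated as 0.
    clipped = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(clipped), axis=1)

# Toy predictions for three unlabeled instances (made-up numbers).
probs = np.array([
    [0.5, 0.5],   # maximally uncertain for two classes
    [0.9, 0.1],
    [1.0, 0.0],   # fully confident
])
scores = entropy_uncertainty(probs)
most_uncertain = int(np.argmax(scores))
```

Under this assumption, a higher score means the classifier is less sure about the instance, which is exactly the quantity that pure uncertainty sampling would maximize.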


(2)Information Density Measure:

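The goal is to address the weakness of uncertainty sampling by taking the unlabeled pool into account when selecting queries. The motivation is that instances which are representative of the input distribution are very informative for improving the generalization performance of the target classifier. Although the input distribution is not given, a large number of unlabeled instances can approximate the input space. Earlier work on semi-supervised learning has shown that the distribution of unlabeled instances is very helpful for training a classifier.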

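Intuitively, we should select instances located in dense regions (an instance will be much more informative about other unlabeled instances) rather than the ones located in a sparse region. Information density expresses how informative a candidate instance is about the remaining unlabeled instances. The paper uses a Gaussian Process framework to define information density as the mutual information between the candidate instance and the remaining unlabeled instances.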

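Mutual information measures the mutual dependence between two variables. It is a more straightforward measure of representativeness than the marginal density p(x), and more principled than an information density based on cosine distance. It is defined in terms of the entropy H(·).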
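
For reference, the standard entropy form of mutual information, applied to a candidate x_i and the rest of the unlabeled pool, can be written as below. The notation U_i for the indices of the unlabeled instances with i removed, and the symbol d(x_i) for the resulting density score, are assumptions of this note.

```latex
d(x_i) \;=\; I\big(x_i ;\, X_{U_i}\big) \;=\; H(x_i) \;-\; H\big(x_i \mid X_{U_i}\big)
```

Read this as: how much the uncertainty about x_i drops once the remaining unlabeled instances are known.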


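How is the entropy H(·) computed? The paper uses a Gaussian process, which describes a joint distribution over a (possibly infinite) set of random variables. Each candidate instance x is associated with a random variable X(x), and a symmetric positive definite kernel function K(·, ·) generates the covariance matrix, so that σ_i^2 = K(x_i, x_i).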
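
A minimal sketch of how this could be computed, under stated assumptions: for a Gaussian variable with variance σ², the differential entropy is 0.5·ln(2πe·σ²), so the mutual information between x_i and the rest of the pool reduces to 0.5·ln(σ_i² / σ²_{i|rest}). The conditional variance given all other variables equals 1 / (K⁻¹)_ii, a standard Gaussian identity. The RBF kernel, the `gamma` value, and the `jitter` term are illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def information_density(K, jitter=1e-6):
    """Mutual information between each instance and all remaining ones.

    d_i = 0.5 * ln(sigma_i^2 / sigma_{i|rest}^2), where the conditional
    variance is read off the inverse covariance: 1 / (K^-1)_ii.
    """
    n = K.shape[0]
    K = K + jitter * np.eye(n)          # jitter keeps the inverse stable
    precision = np.linalg.inv(K)
    prior_var = np.diag(K)
    cond_var = 1.0 / np.diag(precision)
    return 0.5 * np.log(prior_var / cond_var)

# Three points in a tight cluster and one far-away outlier (toy data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
density = information_density(rbf_kernel(X, gamma=1.0))
```

In this toy pool the outlier is nearly independent of everything else, so its score is close to zero, while the clustered points score much higher. Inverting the full kernel matrix costs O(n³), so for a large pool a subsample or a low-rank approximation would be needed.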

(3)A Combination Framework:

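This part combines measures (1) and (2): an instance is selected when it is both "most uncertain for the current classifier" and "highly informative about the remaining unlabeled instances". Adding such instances to the labeled set lets the classifier reach higher accuracy on the unlabeled pool. The combined measure therefore weights the two terms with a trade-off parameter β.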

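The uncertainty term f(x) is a discriminative measure, whereas the information density term d(x_i)^(1-β) is computed in the input space and has no direct connection to the target discriminative classifier. Even so, the combination can still select informative instances and reduce the generalization error without adding extra computational cost.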
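
A minimal sketch of the combination, assuming the multiplicative form f(x_i)^β · d(x_i)^(1-β) with 0 ≤ β ≤ 1, which matches the d(x_i)^(1-β) term mentioned above. The function name `combined_score` and the toy scores are illustrative assumptions; both inputs are assumed to be non-negative.

```python
import numpy as np

def combined_score(uncertainty, density, beta):
    """Score f^beta * d^(1 - beta) for every candidate (assumed form).

    beta = 1 reduces to pure uncertainty sampling,
    beta = 0 reduces to pure information density.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    uncertainty = np.asarray(uncertainty, dtype=float)
    density = np.asarray(density, dtype=float)
    return uncertainty**beta * density**(1.0 - beta)

# Toy scores: candidate 0 is an uncertain outlier, candidate 2 is dense
# but easy to classify, candidate 1 is reasonably good on both.
uncertainty = np.array([0.69, 0.60, 0.30])
density = np.array([0.01, 2.00, 2.50])
query_index = int(np.argmax(combined_score(uncertainty, density, beta=0.5)))
```

With β = 0.5 the balanced candidate wins, whereas β = 1 would pick the uncertain outlier and β = 0 would pick the dense but easy instance.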

(4)Adaptive Combination
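
The adaptive rule itself is not reproduced in these notes. The sketch below only illustrates the general idea of choosing β per iteration: for each β in a candidate set, take the top-scoring instance under f^β · d^(1-β), then pick among those candidates using a separate criterion. The `candidate_cost` callback is a hypothetical placeholder (for example, an estimate of the classifier's error after labeling the candidate), and the β grid is an assumption.

```python
import numpy as np

def adaptive_query(uncertainty, density, betas, candidate_cost):
    """Pick one query by trying several beta values (illustrative sketch).

    candidate_cost: hypothetical callable mapping a candidate index to a
    cost, lower is better. It stands in for whatever criterion is used to
    compare the per-beta winners.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    density = np.asarray(density, dtype=float)
    candidates = []
    for beta in betas:
        scores = uncertainty**beta * density**(1.0 - beta)
        idx = int(np.argmax(scores))
        if idx not in candidates:       # keep each winner once
            candidates.append(idx)
    return min(candidates, key=candidate_cost)

uncertainty = np.array([0.69, 0.60, 0.30])
density = np.array([0.01, 2.00, 2.50])
toy_cost = {0: 0.4, 1: 0.1, 2: 0.3}     # made-up costs for the demo
chosen = adaptive_query(uncertainty, density, [0.0, 0.5, 1.0], toy_cost.get)
```

The point of the design is that no single β has to be fixed in advance: each iteration can fall back to a more uncertainty-driven or more density-driven choice, depending on which candidate looks best under the comparison criterion.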



Experimental Results:


1. Scene classification dataset (3859 images in total)

2. Object recognition: (1) Caltech-101 (2) Pascal VOC 2007


(1) Random Sampling

(2) Most Uncertainty

(3) Near Optimal

(4) Fixed Combination: uses cosine distance to measure information density, with a fixed parameter β taken from {0.25, 0.5, 0.75, 1}

(5) Proposed Approach
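
For the Fixed Combination baseline, the notes only say that cosine distance is used for information density. A minimal sketch, assuming the density of an instance is its average cosine similarity to the other instances in the unlabeled pool (this averaging is an assumption, as is the function name):

```python
import numpy as np

def cosine_information_density(X):
    """Average cosine similarity of each row of X to all other rows.

    Assumes X has at least two rows; zero vectors are guarded by clipping
    the norm before dividing.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    n = X.shape[0]
    # Remove the self-similarity on the diagonal before averaging.
    return (sim.sum(axis=1) - np.diag(sim)) / (n - 1)

# Three feature vectors pointing roughly the same way and one orthogonal.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
cos_density = cosine_information_density(X)
```

The resulting score can be plugged into the same f^β · d^(1-β) form with one of the fixed β values listed for this baseline.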



(1)Uncertainty measure

(2)Information density


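Summary: the paper makes two main contributions. The first is the information density term, which brings in the distribution of the unlabeled pool. The second is an adaptive rule for choosing β, so that each iteration can pick the best query instance.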





