Robust Object Tracking via Sparsity-based Collaborative Model
Abstract
In this paper we propose a robust object tracking algorithm using a collaborative model.
As the main challenge for object tracking is to account for drastic appearance change, we propose a robust appearance model that exploits both holistic templates and local representations.
We develop a sparsity-based discriminative classifier (SDC) and a sparsity-based generative model (SGM).
In the SDC module, we introduce an effective method to compute the confidence value that assigns more weight to the foreground than the background.
In the SGM module, we propose a novel histogram-based method that takes the spatial information of each patch into consideration with an occlusion handling scheme.
Furthermore, the update scheme considers both the latest observations and the original template, thereby enabling the tracker to deal with appearance change effectively and alleviate the drift problem.
Numerous experiments on various challenging videos demonstrate that the proposed tracker performs favorably against several state-of-the-art algorithms.
1. Introduction
Paragraph 1: definition, applications, and open problems of object tracking
The goal of object tracking is to estimate the states of the target in image sequences.
It plays a critical role in numerous vision applications such as motion analysis, activity recognition, video surveillance and traffic monitoring.
While much progress has been made in recent years, it is still a challenging problem to develop a robust algorithm for complex and dynamic scenes due to large appearance change caused by varying illumination, camera motion, occlusions, pose variation and shape deformation (see Figure 1).
References: common tracking methods Frag [1], IVT [21], MIL [4], L1 [19], PN [12], and VTD [13].
Figure 1: tracking in challenging environments that include heavy occlusion (caviar), rotation (panda), illumination change (shaking), and cluttered background. The results of the Frag [1], IVT [21], MIL [4], L1 [19], PN [12], and VTD [13] trackers and our method are shown with cyan, blue, magenta, green, black, yellow, and red bounding boxes, respectively.
Paragraph 2: appearance models for object tracking
In a given frame, an appearance model is used to represent the object with proper features and to verify predictions using the object representation.
In the successive frames, a motion model is applied to predict the likely state of an object (e.g., Kalman filter [6] and particle filter [20, 14]).
References: tracking with Kalman filters [6] and particle filters [20, 14].
In this paper, we focus on the appearance model since it is usually the most crucial component of any tracking algorithm.
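Since both modules described later ultimately score candidates inside a particle filter, a minimal sketch of that outer loop may help fix the interface. This is not the authors' implementation; the random-walk motion model, the noise scale, and all names here are illustrative assumptions.

```python
import numpy as np

def track_frame(particles, weights, likelihood, motion_std=4.0,
                rng=np.random.default_rng(0)):
    """One particle-filter step: resample, predict with a random-walk
    motion model, then re-weight every state by the appearance model."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)                     # resample
    particles = particles[idx] + rng.normal(0.0, motion_std,
                                            particles.shape)   # predict
    weights = np.array([likelihood(s) for s in particles])     # appearance model
    weights = weights / weights.sum()                          # normalize
    return particles, weights, particles[np.argmax(weights)]   # tracked state
```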
Paragraph 3: feature choices for the appearance model
Several factors need to be considered for an effective appearance model.
First, an object can be represented by different features such as intensity [21], color [20], texture [3], superpixels [25], and Haar-like features [10, 11, 4, 12].
References: different features for target representation (intensity [21], color [20], texture [3], superpixels [25], Haar-like [10, 11, 4, 12]).
Meanwhile, the representation schemes can be based on holistic templates [6] or local histograms [1, 28].
References: holistic templates [6] vs. local histograms [1, 28].
In this work, we use intensity values for representation because of their simplicity and efficiency.
Furthermore, our approach exploits both the strength of holistic templates to distinguish the target from the background, and the effectiveness of local patches in handling partial occlusion.
Paragraph 4: generative vs. discriminative tracking
Second, a model needs to be developed to verify any state prediction, which can be either generative or discriminative.
For generative methods, tracking is formulated as searching for the most similar region to the target object within a neighborhood [6, 1, 21, 19, 16, 15].
References: generative trackers [6, 1, 21, 19, 16, 15].
For discriminative methods, tracking is treated as a binary classification problem which aims at designing a classifier to distinguish the target object from the background [2, 10, 3, 23, 11, 4, 12].
References: discriminative trackers [2, 10, 3, 23, 11, 4, 12].
Furthermore, several algorithms have been proposed to exploit the advantages of both generative and discriminative models [31, 17, 22, 18, 7].
References: trackers combining both models [31, 17, 22, 18, 7].
We develop a simple yet robust model that makes use of the generative model to account for appearance change and the discriminative classifier to effectively separate the foreground target from the background.
Paragraph 5: online update schemes
The third issue is concerned with online update schemes so that the tracker can adapt to appearance variations of the target object and the background.
Numerous successful update approaches have been proposed [6, 10, 3, 21, 19].
References: template update methods [6, 10, 3, 21, 19].
However, straightforward and frequent updates of tracking results may gradually result in drifts due to accumulated errors, especially when occlusion occurs.
To address this problem, Babenko et al. [4] devise a strategy for choosing positive and negative samples during update and introduce multiple instance learning (MIL) to learn the true target object which is included in the positive bag.
References: multiple instance learning [4].
Kalal et al. [12] propose a bootstrapping classifier. They explore the structure of unlabeled data via positive and negative constraints, which helps to select potential samples for update.
References: P-N learning [12].
In order to capture appearance variations as well as reduce tracking drifts, we propose a method that takes occlusions into consideration for updating the appearance model.
Paragraph 6: our work
In this paper, we propose a robust object tracking algorithm with an effective and adaptive appearance model.
We use intensity to generate holistic templates and local representations in each frame.
Within our tracking scheme, the collaboration of generative models and discriminative classifiers contributes to a more flexible and robust likelihood function for the particle filter.
The appearance model is adaptively updated with the consideration of occlusions to account for variations and alleviate drifts.
Numerous experiments on various challenging sequences show that the proposed algorithm performs favorably against the state-of-the-art methods.
We select the discriminative features via

$$\min_{\mathbf{s}}\ \|\mathbf{A}^{\top}\mathbf{s} - \mathbf{p}\|_2^2 + \lambda\,\|\mathbf{s}\|_1 \qquad (1)$$

where $\mathbf{A} = [\mathbf{A}_+,\ \mathbf{A}_-] \in \mathbb{R}^{K \times (N_p + N_n)}$ is composed of $N_p$ positive templates $\mathbf{A}_+$ and $N_n$ negative templates $\mathbf{A}_-$, and $K$ is the feature dimension before feature selection.
Each element of the vector $\mathbf{p}$ represents the property of each template in the training set A, i.e., +1 for positive templates and -1 for negative templates.
The solution of Eq. 1 is the sparse vector s, whose nonzero elements correspond to discriminative features selected from the original K-dimension feature space.
Note that the feature selection scheme adaptively chooses a suitable number of discriminative features in the dynamic environment.
We project the original feature space onto the selected feature space via a projection matrix S.
It is formed by removing the all-zero rows from a diagonal matrix $\mathbf{S}'$ whose elements are determined by

$$\mathbf{S}'_{ii} = \begin{cases} 1, & s_i \neq 0 \\ 0, & s_i = 0 \end{cases} \qquad (2)$$

i.e., the diagonal element $\mathbf{S}'_{ii}$ is zero when the corresponding element $s_i$ of $\mathbf{s}$ is zero.
Both the training template set and the candidates sampled by a particle filter are projected to the selected and discriminative feature space.
Thus, the training template set and the candidates in the projected space are A′ = SA and x′ = Sx.
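A minimal sketch of this feature selection and projection step (Eqs. 1 and 2), assuming the gray-scale templates are stacked as columns of a K x (Np + Nn) matrix; the function names are ours, and sklearn's Lasso stands in for whichever l1 solver the authors used.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_features(A_pos, A_neg, lam=0.001):
    """Solve min_s ||A^T s - p||^2 + lam * |s|_1 (Eq. 1) and build the
    projection matrix S (Eq. 2) from the nonzero entries of s."""
    A = np.hstack([A_pos, A_neg])                 # K x (Np + Nn) templates
    p = np.hstack([np.ones(A_pos.shape[1]),       # +1 for positive templates
                   -np.ones(A_neg.shape[1])])     # -1 for negative templates
    # alpha plays the role of lam only up to sklearn's internal 1/(2n) scaling
    s = Lasso(alpha=lam, max_iter=5000).fit(A.T, p).coef_   # sparse, length K
    keep = np.flatnonzero(s)                      # selected feature indices
    S = np.eye(A.shape[0])[keep]                  # drop the all-zero rows of S'
    return S

# Projected templates and candidates: A_prime = S @ A, x_prime = S @ x.
```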
3.2.3 Confidence Measure
The proposed SDC is developed based on the assumption that the target can be better represented by the linear combination of positive templates, while the background can be better represented by the span of negative templates.
Given a candidate, it is represented by the training template set with the coefficients $\boldsymbol{\alpha}$ computed by

$$\min_{\boldsymbol{\alpha}}\ \|\mathbf{x}' - \mathbf{A}'\boldsymbol{\alpha}\|_2^2 + \lambda\,\|\boldsymbol{\alpha}\|_1 \qquad (3)$$
A candidate with a smaller reconstruction error using the foreground template set is more likely to be the target object, and vice versa.
Thus, we formulate the confidence value $H_c$ of the candidate x by

$$H_c = \exp\big(-(\varepsilon_f - \varepsilon_b)\,/\,\sigma\big) \qquad (4)$$

where $\varepsilon_f = \|\mathbf{x}' - \mathbf{A}'_+\boldsymbol{\alpha}_+\|_2^2$ is the reconstruction error of the candidate x with the foreground template set $\mathbf{A}_+$, and $\boldsymbol{\alpha}_+$ is the corresponding sparse coefficient vector. Similarly, $\varepsilon_b = \|\mathbf{x}' - \mathbf{A}'_-\boldsymbol{\alpha}_-\|_2^2$ is the reconstruction error of the candidate x using the background template set $\mathbf{A}_-$, and $\boldsymbol{\alpha}_-$ is the related sparse coefficient vector.
The variable $\sigma$ is fixed to be a small constant that balances the weight of the discriminative classifier and the generative model presented in Section 3.3.
In [27], the authors employ the reconstruction error on the target (positive) templates.
References: reconstruction error on the target templates [27].
It is not quite appropriate for tracking, since both the negative samples and the indistinguishable samples have large reconstruction errors on the target (positive) templates.
Thus, it introduces ambiguity for the tracker.
Our confidence measure exploits the distinction between the foreground and the background; its benefit is presented in Section 3.4.
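The confidence measure of Eq. 4 can then be sketched as below: the projected candidate is coded over the full template set and the two reconstruction errors are compared. The value of sigma is only a placeholder here, since the text fixes it to an unspecified small constant.

```python
import numpy as np
from sklearn.linear_model import Lasso

def confidence(x_p, Ap_pos, Ap_neg, lam=0.01, sigma=1.0):
    """SDC confidence H_c = exp(-(eps_f - eps_b) / sigma), Eq. 4."""
    A_p = np.hstack([Ap_pos, Ap_neg])                        # projected templates
    alpha = Lasso(alpha=lam, max_iter=5000).fit(A_p, x_p).coef_
    a_pos, a_neg = alpha[:Ap_pos.shape[1]], alpha[Ap_pos.shape[1]:]
    eps_f = np.sum((x_p - Ap_pos @ a_pos) ** 2)              # foreground error
    eps_b = np.sum((x_p - Ap_neg @ a_neg) ** 2)              # background error
    return np.exp(-(eps_f - eps_b) / sigma)                  # > 1 when eps_f < eps_b
```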
3.3. Sparsity-based Generative Model (SGM)
Motivated by the success of sparse coding for image classification [30, 24, 9] as well as object tracking [15], we present a generative model for object representation that considers the location information of patches and takes occlusion into account.
References: sparse coding for image classification [30, 24, 9] and for object tracking [15].
3.3.1 Histogram Generation
For simplicity, we use the gray-scale features to represent the local information.
We use overlapped sliding windows on the normalized images to obtain M patches, and each patch is converted to a vector $\mathbf{y}_i \in \mathbb{R}^{G}$, where G denotes the size of the patch.
The sparse coefficient vector $\boldsymbol{\beta}_i$ of each patch is computed by

$$\min_{\boldsymbol{\beta}_i}\ \|\mathbf{y}_i - \mathbf{D}\boldsymbol{\beta}_i\|_2^2 + \lambda\,\|\boldsymbol{\beta}_i\|_1 \qquad (5)$$

where the dictionary $\mathbf{D} \in \mathbb{R}^{G \times J}$ is generated from k-means cluster centers (J denotes the number of cluster centers) computed over the patches belonging to the labeled target object in the first frame; it consists of the most representative patterns of the target object.
In this work, the sparse coefficient vectors of all patches are concatenated to form a histogram by

$$\boldsymbol{\rho} = \big[\boldsymbol{\beta}_1^{\top}, \boldsymbol{\beta}_2^{\top}, \ldots, \boldsymbol{\beta}_M^{\top}\big]^{\top} \qquad (6)$$

where $\boldsymbol{\rho}$ is the proposed histogram for one candidate.
The average pooling scheme for histogram generation used in [15] is efficient, yet the strategy may miss the spatial information of each patch.
References: average pooling scheme [15].
For example, if we exchange the left half and the right half of a human face image, the average pooling scheme neglects the change while our method will discover it.
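A sketch of the dictionary construction and histogram generation (Eqs. 5 and 6), assuming the patches of a candidate arrive as columns of a G x M matrix. The nonnegativity constraint on the codes is our assumption so that the concatenation behaves like a histogram, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def build_dictionary(first_frame_patches, J=50):
    """D: G x J k-means centers of the labeled target patches (first frame)."""
    km = KMeans(n_clusters=J, n_init=10).fit(first_frame_patches.T)
    return km.cluster_centers_.T

def sgm_histogram(patches, D, lam=0.01):
    """Concatenate the per-patch sparse codes in patch order (Eq. 6)."""
    lasso = Lasso(alpha=lam, max_iter=5000, positive=True)  # beta >= 0 assumed
    betas = np.stack([lasso.fit(D, y).coef_ for y in patches.T], axis=1)  # J x M
    return betas.T.ravel(), betas       # rho of length J * M, plus the raw codes
```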
3.3.2 Occlusion Handling
In order to deal with occlusions, we modify the constructed histogram to exclude the occluded patches when describing the target object.
A patch with a large reconstruction error is regarded as occluded, and its corresponding sparse coefficient vector is set to zero.
Thus, a weighted histogram is generated by

$$\boldsymbol{\varphi} = \boldsymbol{\rho} \odot \mathbf{o} \qquad (7)$$

where $\odot$ denotes element-wise multiplication.
Each element of $\mathbf{o}$ is an indicator of occlusion of the corresponding patch and is obtained by

$$o_i = \begin{cases} 1, & \varepsilon_i < \varepsilon_0 \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

where $\varepsilon_i = \|\mathbf{y}_i - \mathbf{D}\boldsymbol{\beta}_i\|_2^2$ is the reconstruction error of patch $\mathbf{y}_i$, and $\varepsilon_0$ is a predefined threshold which determines whether the patch is occluded or not.
We thus have a sparsity-based histogram φ for each candidate.
The proposed representation scheme takes spatial information of local patches and occlusion into account, thereby making it more effective and robust.
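A sketch of the occlusion handling step (Eqs. 7 and 8): patches whose reconstruction error exceeds the threshold are flagged as occluded and their sub-vectors in the histogram are zeroed. The bin layout matches the patch-ordered concatenation sketched above.

```python
import numpy as np

def weighted_histogram(patches, betas, D, eps0=0.04):
    """Return phi = rho (.) o and the occlusion indicator o (1 = visible)."""
    errs = np.sum((patches - D @ betas) ** 2, axis=0)   # per-patch error, length M
    o = (errs < eps0).astype(float)                     # Eq. 8
    phi = (betas * o).T.ravel()                         # zero the occluded sub-vectors
    return phi, o
```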
3.3.3 Similarity Function
Due to its effectiveness [9], we use the histogram intersection function to compute the similarity between the histograms of the candidate and the template:

$$L_c = \sum_{j} \min\big(\varphi_c^{(j)},\ \psi^{(j)}\big) \qquad (9)$$

where $\varphi_c$ and $\psi$ are the histograms of the c-th candidate and the template, respectively.
The histogram of the template (denoted by ψ)is generated by Eqs. 5-7.
The patches y in Eq. 5 are all from the first frame and the template histogram is computed only once for each image sequence.
It is updated every several frames and the update scheme is presented in Section 3.5.
The vector o in Eq. 8 reflects the occlusion condition of the corresponding candidate.
The comparison between the candidate and the template should be carried out under the same occlusion condition, so the template and the c-th candidate share the same vector $\mathbf{o}_c$ in Eq. 7.
For example, when the template is compared with the c-th candidate, the vector o of the template in Eq. 7 is set to oc.
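The histogram intersection of Eq. 9, sketched so that the template is masked by the candidate's occlusion vector o_c before the comparison, as the text requires; the bin layout assumed here is the patch-ordered concatenation from the sketches above.

```python
import numpy as np

def similarity(phi_c, psi, o_c):
    """L_c = sum_j min(phi_c^j, psi^j) with both sides under the same occlusion."""
    J = len(psi) // len(o_c)                  # histogram entries per patch
    mask = np.repeat(o_c, J)                  # expand the patch indicator to bins
    return np.minimum(phi_c, psi * mask).sum()
```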
3.4. Collaborative Model
We propose a collaborative model using SDC and SGM within the particle filter framework.
In our tracking algorithm, the confidence value based on the holistic templates and the similarity function based on the local patches jointly contribute to an effective and robust description of the likelihood.
The likelihood function of the c-th candidate is constructed by

$$p_c = H_c\,L_c \qquad (10)$$

and the tracking result is the candidate with the highest probability.
The multiplicative formula is more effective in our tracking scheme compared with the alternative additive scheme.
The confidence value $H_c$ gives higher weights to the candidates considered as positive samples (i.e., $\varepsilon_f$ smaller than $\varepsilon_b$) and penalizes the others.
As a result, it can be considered as the weight of the local similarity function.
Moreover, the confidence value of an indistinguishable candidate (i.e., one that can be equally well constructed by the positive and negative template sets, when $\varepsilon_f \approx \varepsilon_b$) is close to 1, so it has little effect on the likelihood function when multiplied with the local similarity function.
Consequently, in the collaborative model, the SGM module plays a more important role in object tracking.
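The collaboration itself then reduces to a product per candidate (Eq. 10); a one-line sketch:

```python
import numpy as np

def select_target(H, L):
    """H, L: SDC confidences and SGM similarities over all candidates."""
    p = H * L                        # Eq. 10: multiplicative collaboration
    return int(np.argmax(p)), p      # index of the tracking result
```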
3.5. Update Scheme
Since the appearance of an object often changes significantly during the tracking process, the update scheme is important and necessary.
We develop an update scheme in which the SDC and SGM are updated independently.
For the SDC model, we update the negative templates every several frames (5 in our experiments) from image regions away (e.g., more than 8 pixels) from the current tracking result.
The positive templates remain the same in the entire sequence.
As the SDC model aims at distinguishing the foreground from the background, it must make sure that the positive templates and the negative templates are all correct and distinct.
In this way, the SDC model is adaptive and discriminative.
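A hedged sketch of how such negative templates could be resampled every few frames: crops whose centers stay more than 8 pixels from the current result, as stated above. The outer sampling radius and all names are our assumptions, not the paper's settings.

```python
import numpy as np

def sample_negatives(frame, cx, cy, w, h, Nn=200, r_in=8, r_out=30,
                     rng=np.random.default_rng(0)):
    """Collect Nn vectorized background crops around, but not on, the target."""
    negs = []
    while len(negs) < Nn:
        dx, dy = rng.uniform(-r_out, r_out, size=2)
        if max(abs(dx), abs(dy)) <= r_in:              # too close to the target
            continue
        x, y = int(cx + dx - w / 2), int(cy + dy - h / 2)
        if x < 0 or y < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
            continue                                   # crop falls off the image
        negs.append(frame[y:y + h, x:x + w].ravel().astype(float))
    return np.stack(negs, axis=1)                      # K x Nn matrix A_-
```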
For the SGM model, the dictionary D is fixed for the same sequence.
Therefore, the dictionary is not deteriorated by updates from tracking failures or occlusions.
In order to capture new appearance and recover the object from occlusions, the template histogram is updated by

$$\psi_n = \mu\,\psi_f + (1 - \mu)\,\psi_l, \qquad \text{if } O_n < O_0 \qquad (11)$$

where the new histogram $\psi_n$ is composed of the histogram $\psi_f$ at the first frame and the last stored histogram $\psi_l$, weighted by the constant $\mu$.
The variable On denotes the occlusion condition of the tracking result in the new frame.
It is computed from the corresponding occlusion indicator vector $\mathbf{o}_n$ (by Eq. 8) using

$$O_n = 1 - \frac{1}{M}\sum_{i=1}^{M} o_n^{(i)} \qquad (12)$$

i.e., the fraction of occluded patches. The update is performed as long as the occlusion condition $O_n$ in this frame is smaller than a predefined constant $O_0$.
The update scheme preserves the first template which is usually correct and takes the newly arrived template into account.
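A sketch of the template update (Eqs. 11 and 12), assuming O_n is the fraction of occluded patches derived from the indicator o_n (1 = visible):

```python
import numpy as np

def update_template(psi_f, psi_l, o_n, mu=0.95, O0=0.8):
    """Blend the first-frame histogram with the last stored one (Eq. 11)."""
    O_n = 1.0 - o_n.mean()                  # fraction of occluded patches (Eq. 12)
    if O_n < O0:                            # skip the update under heavy occlusion
        return mu * psi_f + (1.0 - mu) * psi_l
    return psi_l                            # keep the last stored histogram
```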
4. Experimental Results
In order to evaluate the performance of our tracker, we conduct experiments on ten challenging image sequences.
These sequences cover most challenging situations in object tracking: heavy occlusion, motion blur, in-plane and out-of-plane rotation, large illumination change, scale variation and complex background (See Figure 3).
For comparison, we run six state-of-the-art algorithms with the same initial position of the target. These algorithms are the Frag [1], IVT [21], MIL [4], L1 [19], PN [12], and VTD [13] tracking methods.
We present some representative results in this section.
All the MATLAB source code and datasets are available on our websites (http://ice.dlut.edu.cn/lu/publications.html, http://faculty.ucmerced.edu/mhyang/pubs.html).
The parameters are presented as follows. Note that they are fixed for all sequences.
The numbers of positive templates Np and negative templates Nn are 50 and 200 respectively.
The variable λ in Eq. 1 is fixed to be 0.001.
The variable λ in Eqs. 3 and 5 is fixed to be 0.01.
The row number G and column number J of dictionary D in Eq. 5 are 36 and 50.
The threshold ε0 in Eq. 8 is 0.04.
The update rate μ is set to be 0.95.
The threshold O0 in Eq. 11 is 0.8.
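For reference, the fixed settings listed above, collected in one place (the values come from the text; the grouping is ours):

```python
PARAMS = {
    "Np": 50, "Nn": 200,       # positive / negative template counts
    "lambda_eq1": 0.001,       # sparsity weight in Eq. 1
    "lambda_eq3_eq5": 0.01,    # sparsity weight in Eqs. 3 and 5
    "G": 36, "J": 50,          # dictionary size in Eq. 5 (patch length x atoms)
    "eps0": 0.04,              # occlusion threshold in Eq. 8
    "mu": 0.95,                # template update rate
    "O0": 0.8,                 # occlusion condition threshold in Eq. 11
}
```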
4.1. Quantitative Comparison
We evaluate the above-mentioned algorithms using the center location error as well as the overlapping rate [8], and the results are shown in Table 1 and Table 2.
Figure 2 shows the center location errors of the evaluated algorithms on all test sequences.
Overall, the proposed tracker performs well against the other state-of-the-art algorithms.
4.2. Qualitative Comparison
Heavy occlusion: Occlusion is one of the most general yet crucial problems in object tracking.
In fact, several trackers, including the FragTrack method [1], the MIL tracking algorithm [4], the L1 tracking method [19], and our tracker, are developed to solve this problem.
In contrast, the IVT tracking method [21], the PN tracking method [12], and the VTD tracking system [13] are less effective in handling occlusions as shown in Figure 3(a), especially at frames 175, 497, and 819 of the faceocc2 sequence.
In our SGM module, we estimate the possible occluded patches and develop a robust histogram which only compares the patches that are not occluded.
Thus, the occlusion handling scheme effectively alleviates the effect of occlusions.
Aside from tracking a target object under occlusion, our method updates the appearance model correctly, especially when heavy occlusions occur.
In addition, our tracker is able to deal with in-plane rotation when the target is occluded at frame 497, owing to the appearance model we employ.
Our tracker can accurately locate the target object at frame 819 as our generated histogram takes the spatial information of local patches into consideration.
In the caviar sequence, the target is occluded by two people at times and one of them is similar in color and shape to the target.
The other trackers all fail before frame 134 due to heavy occlusion (Figure 3(a)).
Furthermore, for most template-based trackers, simple update with occluded portion often leads to drifts (frame 442 of Figure 3(a)).
In contrast, our tracker achieves stable performance in the entire sequence when there is a large scale change with heavy occlusion.
This can be attributed to our SGM model that reduces the effect of occlusions and only compares the foreground with the stored histograms.
Besides, our update scheme does not introduce heavily occluded samples, which could otherwise lead to the drift problem.
Motion blur: Fast motion of the target object or the camera leads to blurred image appearance which is difficult to account for in object tracking.
Figure 3(b) presents the tracking results on the animal sequence in which the appearance of the target object is almost indistinguishable due to the motion blur.
Most tracking algorithms fail to follow the target right at the beginning of this sequence.
At frame 42, the PN tracking method [12] mistakenly locates a similar object instead of the correct target.
The reason is that the true target is blurred and it is difficult for the detector of PN [12] to distinguish it from the background.
The proposed algorithm well handles the situation with similar objects as the SDC module selects the discriminative features to better separate the target from the background.
By updating the negative templates online, the proposed algorithm successfully tracks the target object throughout the sequence.
The appearance change caused by motion blur in the jumping sequence is so drastic that the Frag [1] and VTD [13] methods fail before frame 31.
The IVT [21] method is able to track the target in some frames (e.g., frame 100) but fails when the motion blur occurs (e.g., frame 238). Our tracker successfully keeps track of the target object with small errors.
The main reason is that we use the SDC module which separates the foreground from the background.
Meanwhile, the confidence measure by Eq. 4 assigns smaller weights to candidates from the background.
Thus, the tracking result will not drift to the background.
Rotation: The girl sequence in Figure 3(c) consists of both in-plane and out-of-plane rotations.
The PN tracking method [12] and the VTD tracking method [13] fail when the girl rotates her head.
Compared with other algorithms, our tracker is more robust and accurate as seen from frame 312 and frame 430.
In our tracking scheme, the background candidates are assigned quite small weights according to Eq. 4.
Therefore, the tracking result will not shift to the background when the girl rotates (e.g., frame 111 and frame 312).
The target object in the panda sequence experiences more and larger in-plane rotations.
As seen from frame 53, the IVT method [21] fails due to occlusion and fast movement.
Most trackers drift after the target undergoes large rotations (e.g., frame 154) whereas our method performs well throughout this sequence.
As the other trackers often account for object motion with translational or similarity transforms, they are not able to deal with complex movements.
In addition, the use of local histograms helps in accounting for appearance change due to complex motion.
Furthermore, the target object in the panda sequence also undergoes occlusions as shown in frame 53 and frame 214.
The PN tracking method [12] fails to detect occlusions and track the target object after frame 214, while our tracker still performs well.
Illumination change: Figure 3(d) presents the tracking results on sequences with dramatic illumination changes.
In the singer1 sequence, the stage light changes drastically seen from frame 121 and frame 321.
The PN tracking method [12] is not able to detect and track the target object (e.g., frame 121).
On the other hand, our tracker accurately locates the target object even when there is a large scale change at frame 321.
In the shaking sequence, the target object undergoes large appearance variation due to drastic illumination change and unpredictable motion.
Our SDC module introduces the backgrounds and the images with parts of the target as negative templates, so the confidence values of these candidates calculated by Eq. 4 are small.
Thus, the tracking result is accurately located on the true target without much offset.
For the car11 sequence, there is low contrast between the foreground and the background (frame 284) as well as illumination change.
The FragTrack method [1] fails at the beginning (at frame 19) because it only uses the local information and does not maintain a holistic representation of the target.
The IVT tracking method [21] achieves good results in this sequence, which can be attributed to the fact that the subspace learning method is robust to illumination changes.
In our SDC module, we select several discriminative features which can better separate the target from the background.
Thus, our tracker performs well in spite of the low contrast between the foreground and the background.
Complex background: The board sequence is challenging as the background is cluttered and the target object experiences out-of-plane rotations as seen from Figure 3(e).
In frame 55, most trackers fail as holistic representations inevitably include background pixels that may be considered as part of foreground object through straightforward update schemes.
Using fixed templates, the FragTrack method [1] is able to track the target as long as there is no drastic appearance change (e.g., frame 55 and frame 183), but fails when the target moves quickly or rotates (e.g., frame 78, frame 395 and frame 528).
Our tracker performs well in this sequence as the target can be differentiated from the cluttered background with the use of our SDC module.
In addition, the update scheme uses the newly arrived negative templates that facilitate separation of the foreground object and the background.
5. Conclusion
In this paper, we propose and demonstrate an effective and robust tracking method based on the collaboration of generative and discriminative modules.
In our tracker, holistic templates are incorporated to construct a discriminative classifier that can effectively deal with cluttered and complex background.
Local representations are adopted to form a robust histogram that considers the spatial information among local patches with an occlusion handling module, which enables our tracker to better handle heavy occlusion.
The contributions of these holistic discriminative and local generative modules are integrated in a unified manner.
Moreover, the online update scheme reduces drifts and enhances the proposed method to adaptively account for appearance change in dynamic scenes.
Quantitative and qualitative comparisons with six state-of-the-art algorithms on ten challenging image sequences demonstrate the robustness of our tracker.
Acknowledgements
W. Zhong and H. Lu are supported by the National Natural Science Foundation of China #61071209. M.-H. Yang is supported by the NSF CAREER Grant #1149783 and NSF IIS Grant #1152576.
References
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, 2006.
[2] S. Avidan. Support vector tracking. PAMI, 26(8):1064–1072, 2004.
[3] S. Avidan. Ensemble tracking. PAMI, 29(2):261–271, 2007.
[4] B. Babenko, M.-H. Yang, and S. Belongie. Visual tracking with on-line multiple instance learning. In CVPR, 2009.
[5] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.
[6] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. PAMI, 25(5):564–575, 2003.
[7] T. B. Dinh and G. G. Medioni. Co-training framework of generative and discriminative trackers with partial occlusion handling. In Proceedings of the IEEE Workshop on Applications of Computer Vision, pages 642–649, 2011.
[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results, 2010.
[9] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao. Local features are not lonely: Laplacian sparse coding for image classification. In CVPR, 2010.
[10] H. Grabner and H. Bischof. On-line boosting and vision. In CVPR, 2006.
[11] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised on-line boosting for robust tracking. In ECCV, 2008.
[12] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N learning: Bootstrapping binary classifiers by structural constraints. In CVPR, 2010.
[13] J. Kwon and K. M. Lee. Visual tracking decomposition. In CVPR, 2010.
[14] Y. Li, H. Ai, T. Yamashita, S. Lao, and M. Kawade. Tracking in low frame rate video: a cascade particle filter with discriminative observers of different life spans. PAMI, 30(10):1728–1740, 2008.
[15] B. Liu, J. Huang, L. Yang, and C. Kulikowski. Robust tracking using local sparse appearance model and k-selection. In CVPR, 2011.
[16] B. Liu, L. Yang, J. Huang, P. Meer, L. Gong, and C. Kulikowski. Robust and fast collaborative tracking with two stage sparse optimization. In ECCV, 2010.
[17] R. Liu, J. Cheng, and H. Lu. A robust boosting tracker with minimum error bound in a co-training framework. In ICCV, 2009.
[18] H. Lu, Q. Zhou, D. Wang, and X. Ruan. A co-training framework for visual tracking with multiple instance learning. In FG, 2011.
[19] X. Mei and H. Ling. Robust visual tracking using L1 minimization. In ICCV, 2009.
[20] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-based probabilistic tracking. In ECCV, 2002.
[21] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. IJCV, 77(1-3):125–141, 2008.
[22] J. Santner, C. Leistner, A. Saffari, T. Pock, and H. Bischof. PROST: Parallel robust online simple tracking. In CVPR, 2010.
[23] F. Tang, S. Brennan, Q. Zhao, and H. Tao. Co-tracking using semi-supervised support vector machines. In ICCV, 2007.
[24] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010.
[25] S. Wang, H. Lu, F. Yang, and M.-H. Yang. Superpixel tracking. In ICCV, 2011.
[26] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6):1031–1044, 2010.
[27] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. PAMI, 31(2):210–227, 2009.
[28] F. Yang, H. Lu, and Y.-W. Chen. Bag of features tracking. In ICPR, 2010.
[29] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. TIP, 19(11):2861–2873, 2010.
[30] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.
[31] Q. Yu, T. B. Dinh, and G. G. Medioni. Online tracking and reacquisition using co-trained generative and discriminative trackers. In ECCV, 2008.