This paper focuses on cross-camera label estimation, which can subsequently be used in feature learning to train robust re-ID models.
Specifically, we propose to construct a graph for the samples in each camera, and a graph matching scheme is then introduced for cross-camera label association.
This paper proposes a dynamic graph matching (DGM) method.
DGM iteratively updates the image graph and the label estimation process by learning a better feature space with intermediate estimated labels.
DGM is advantageous in two aspects:
The accuracy of estimated labels is improved significantly with the iterations;
DGM is robust to noisy initial training data.
Code is available at
Different from the existing unsupervised person re-ID methods, this paper is based on a more customized solution, i.e., cross-camera label estimation. In other words, we aim to mine the labels (matched or unmatched video pairs) across cameras. With the estimated labels, the remaining steps are exactly the same as in supervised learning.
In light of the above discussions, this paper proposes a dynamic graph matching (DGM) method to improve the label estimation performance for unsupervised video re-ID (the main idea is shown in Fig. 1). Specifically, our pipeline is an iterative process. In each iteration, a bipartite graph is established, labels are then estimated, and a discriminative metric is learnt. Throughout this procedure, labels gradually become more accurate, and the learnt metric more discriminative. Additionally, our method includes a label re-weighting strategy which provides soft labels instead of hard labels, a beneficial step against the noisy intermediate label estimation output from graph matching.
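The iterative process described above (build a bipartite graph, estimate labels, learn a metric, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the function names are hypothetical, both cameras are assumed to hold the same number of samples, the matching is solved by brute force over permutations (feasible only for tiny inputs), and the "metric learning" step is a simple stand-in that re-weights feature dimensions by how consistent they are across the currently estimated positive pairs.

```python
from itertools import permutations


def weighted_sq_dist(x, y, w):
    """Squared distance between two feature vectors under per-dimension weights w."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))


def match_bipartite(cam_a, cam_b, w):
    """Minimum-cost one-to-one assignment between two cameras.

    Brute force over all permutations: illustration only, tiny inputs only.
    Returns match, where match[i] is the index in cam_b assigned to cam_a[i].
    """
    n = len(cam_a)
    cost = [[weighted_sq_dist(a, b, w) for b in cam_b] for a in cam_a]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)


def update_metric(cam_a, cam_b, match):
    """Stand-in for metric learning (an assumption, not the paper's method).

    Dimensions that vary little across estimated positive pairs get larger weights.
    """
    dim = len(cam_a[0])
    eps = 1e-6  # avoids division by zero for perfectly consistent dimensions
    raw = []
    for d in range(dim):
        spread = sum((cam_a[i][d] - cam_b[j][d]) ** 2
                     for i, j in enumerate(match)) / len(match)
        raw.append(1.0 / (spread + eps))
    total = sum(raw)
    return [r * dim / total for r in raw]  # normalise so the weights sum to dim


def dynamic_matching(cam_a, cam_b, iterations=3):
    """Alternate between label estimation (matching) and metric update."""
    w = [1.0] * len(cam_a[0])
    match = None
    for _ in range(iterations):
        match = match_bipartite(cam_a, cam_b, w)  # estimate cross-camera labels
        w = update_metric(cam_a, cam_b, match)    # re-learn the feature space
    return match, w


# Toy data: dimension 0 carries identity, dimension 1 is noise.
cam_a = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.0]]
cam_b = [[2.0, 0.1], [0.0, 0.1], [1.0, 0.0]]
match, w = dynamic_matching(cam_a, cam_b)
```

In this toy run the identity-bearing dimension ends up with the larger weight, which mirrors the idea that the graph and the metric are refined together. For realistic sizes the brute-force search would need to be replaced by a proper assignment solver.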
The main contributions are summarized as follows:
We propose a dynamic graph matching (DGM) method to estimate cross-camera labels for unsupervised re-ID, which is robust to distractors and noisy initial training data.
Our experiment confirms that DGM is only slightly inferior to its supervised baselines and yields competitive re-ID accuracy compared with existing unsupervised re-ID methods on three video benchmarks.
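This figure shows the difference between video-based and image-based re-ID: a video sample corresponds to multiple images, whereas an image sample corresponds to a single image.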
However, two obvious shortcomings remain:
We need to learn a discriminative feature space to optimize the graph matching results.
Therefore, it is reasonable to re-encode the weights of labels for overall learning, especially for the uncertain estimated positive video pairs.
The label re-weighting scheme has the following advantages:
(1) for positive video pairs, it can filter out some false positives and then assign different weights to different positive sample pairs;
(2) for negative video pairs, a number of easy negatives are filtered out. The re-weighting scheme is simple but effective, as shown in the experiments.
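As a rough illustration of these two effects, the sketch below turns hard matching output into soft labels. It is a sketch under stated assumptions and not the paper's formula: `reweight_labels`, the two thresholds, and the `exp(-cost)` weighting are all hypothetical choices. Here `cost[i][j]` is the matching cost between sample `i` in one camera and sample `j` in the other, and `match[i]` is the index that graph matching assigned to sample `i`.

```python
import math


def reweight_labels(cost, match, pos_threshold, neg_threshold):
    """Soft labels from hard matching output (illustrative assumption only).

    weights[i][j] > 0  : estimated positive pair, weighted by confidence
    weights[i][j] < 0  : hard negative pair kept for learning
    weights[i][j] == 0 : filtered pair (likely false positive or easy negative)
    """
    n, m = len(cost), len(cost[0])
    weights = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = cost[i][j]
            if match[i] == j:
                # Positive pairs: drop costly (uncertain) matches,
                # give cheaper matches larger weights.
                if c <= pos_threshold:
                    weights[i][j] = math.exp(-c)
            elif c <= neg_threshold:
                # Negative pairs: keep only those close enough to be informative.
                weights[i][j] = -1.0
    return weights


cost = [[0.1, 0.5, 3.0],
        [0.4, 2.5, 0.6],
        [3.0, 0.7, 0.2]]
match = [0, 1, 2]
weights = reweight_labels(cost, match, pos_threshold=1.0, neg_threshold=1.0)
```

In this toy example the second estimated positive pair has a high cost and is filtered out, the two distant negatives receive zero weight, and the remaining positives get weights that decrease with cost.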
The proposed approach is summarized in Algorithm 1.
