[Paper note] MARS: A Video Benchmark for Large-Scale Person Re-identification

  • paper
  • dataset
  • code for evaluation in Matlab
  • Authors: Zheng, Liang and Bie, Zhi and Sun, Yifan and Wang, Jingdong and Su, Chi and Wang, Shengjin and Tian, Qi
  • Su, Chi and Zheng, Liang have worked on the re-identification problem for quite some time.

Properties

  • Tracklets: automatically generated by the DPM pedestrian detector and the GMMCP tracker.
  • Each identity in the probe has multiple ground-truth tracklets in the gallery (about 10 on average in the test set), so mAP is a more reasonable evaluation metric (see the mAP sketch after this list).
  • 1261 identities, around 20,000 video sequences.
  • Statistics
    • (dataset statistics figure, see the paper)
  • Train/test set
    • 631/630 train/test identities.
    • The paper reports 2009 test queries, although the provided query_info.mat file contains only 1980.
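
A minimal sketch of how mAP could be computed when each query has multiple ground-truth tracklets in the gallery. This is my own illustration in Python/NumPy, not the official Matlab evaluation code; function names and array shapes are assumptions.

```python
import numpy as np

def average_precision(ranked_matches):
    """ranked_matches: 0/1 array over the gallery, sorted by ascending
    distance to the query; 1 marks a ground-truth tracklet."""
    if not ranked_matches.any():
        return 0.0
    hit_ranks = np.flatnonzero(ranked_matches) + 1             # 1-indexed ranks of the hits
    precision_at_hits = np.arange(1, len(hit_ranks) + 1) / hit_ranks
    return precision_at_hits.mean()

def mean_average_precision(dist, query_ids, gallery_ids):
    """dist: (num_query, num_gallery) distance matrix between tracklet features."""
    gallery_ids = np.asarray(gallery_ids)
    aps = []
    for i, qid in enumerate(query_ids):
        order = np.argsort(dist[i])                            # closest gallery tracklets first
        matches = (gallery_ids[order] == qid).astype(int)
        # NOTE: the official protocol also excludes some gallery tracklets
        # (e.g. same-camera matches of the query); omitted here for brevity.
        aps.append(average_precision(matches))
    return float(np.mean(aps))
```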

Benchmark methods

  • Traditional features: HOG3D, GEI
  • Metric learning: XQDA, KISSME (see the KISSME sketch after this list)
  • CNN: CaffeNet + ImageNet pre-training
  • For the CNN features, metric learning is also applied.
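
Since KISSME comes up again in the experiments, here is a rough sketch of its core idea: estimate covariances of pairwise feature differences for similar and dissimilar pairs, and take the difference of their inverses as the Mahalanobis matrix. Variable names and the regularization are my own choices, not the paper's code.

```python
import numpy as np

def kissme(feat_a, feat_b, same, eps=1e-6):
    """feat_a, feat_b: (n, d) arrays of paired features; same: boolean array,
    True where a pair shares an identity. Returns a (d, d) Mahalanobis matrix."""
    diff = feat_a - feat_b
    d = diff.shape[1]
    cov_s = np.cov(diff[same], rowvar=False) + eps * np.eye(d)    # similar-pair covariance
    cov_d = np.cov(diff[~same], rowvar=False) + eps * np.eye(d)   # dissimilar-pair covariance
    M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    # clip negative eigenvalues so M defines a valid (PSD) metric
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0, None)) @ V.T

def kissme_distance(M, x, y):
    diff = x - y
    return float(diff @ M @ diff)
```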

Experiment

  • Other datasets: PRID-2011, iLIDS-VID
    • The network is trained on MARS and then fine-tuned on these datasets.
  • Four modes: v-v, v-i, i-v, i-i (i = image, v = video); v-v yields the best results.
  • Evaluation of motion features (HOG3D, GEI): accuracy is very low across the board.
  • Evaluation of CNN features
    • Training from scratch vs. pre-training on ImageNet: pre-training brings a +9.5% improvement.
    • Use metric learning or not
      • When the model trained on MARS is transferred directly, without fine-tuning on the other datasets, Euclidean distance performs poorly; metric learning improves performance a lot.
      • The trained CNN model still benefits from metric learning.
    • Transfer from MARS
      • Transferring from MARS beats ImageNet-only pre-training on PRID-2011.
      • Transferring from MARS is worse than ImageNet-only pre-training on iLIDS-VID!!!
      • iLIDS-VID's scenes differ from those of MARS and PRID-2011, which likely explains this.
    • Max pooling or mean pooling (see the pooling sketch after this list)
      • Max pooling is generally better on MARS and PRID-2011, while average pooling is better on iLIDS-VID (though, judging from the numbers, average pooling wins when using Euclidean distance).
    • Multiple queries: max-pool the features of different tracklets of the same identity within the same camera; this further improves performance (why???).
  • New state-of-the-art
    • PRID-2011: 77.3% rank-1 with the CNN transferred from MARS plus XQDA.
    • iLIDS-VID: 53.0% rank-1 with the ImageNet-pretrained CNN plus XQDA.
    • MARS: 68.3%, 82.6%, 89.4% at rank 1, 5, 20 respectively, and 49.3% mAP with CNN + KISSME + multiple queries.
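
The pooling and multiple-query steps above boil down to very little code. A minimal sketch, assuming frame-level CNN features are already extracted (shapes and function names are my assumptions, not the authors' implementation):

```python
import numpy as np

def pool_tracklet(frame_features, mode="max"):
    """frame_features: (num_frames, d) CNN features of one tracklet.
    Collapse them into a single d-dimensional tracklet feature."""
    if mode == "max":
        return frame_features.max(axis=0)
    return frame_features.mean(axis=0)        # "mean"/average pooling

def multi_query_feature(tracklet_features):
    """tracklet_features: list of (d,) pooled features of the same identity
    under the same camera; max-pool them into one query vector
    (the multiple-queries trick)."""
    return np.stack(tracklet_features).max(axis=0)
```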
