Paper Notes: A Discriminatively Learned CNN Embedding for Person Re-identification

arXiv:1611.05666v2 [cs.CV] 3 Feb 2017 by Zhedong Zheng, Liang Zheng and Yi Yang


Dataset

Market1501, CUHK03, Oxford5k.

Verification-Identification Models

The paper combines the strengths of the verification model and the identification model, so that each compensates for the other's weaknesses.

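Verification models: treat person re-ID as a binary classification task, or equivalently a similarity regression task. They take an image pair as input and decide whether the two images show the same person.

Drawback: only weak re-ID labels are used, and the relationship between the image pair and the other images in the dataset is not considered.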
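
To make the "weak label" point concrete, here is a minimal sketch of how identity labels collapse into same/different labels when pairs are formed. The function name `make_verification_pairs` and the toy labels are hypothetical, chosen only for illustration; the paper's actual pair sampling strategy is not described in these notes.

```python
from itertools import combinations


def make_verification_pairs(labels):
    """Return (i, j, same) for every unordered pair of image indices.

    `same` is 1 when both images carry the same identity label, else 0.
    The identity itself is discarded, which is why the label is "weak".
    """
    return [
        (i, j, int(labels[i] == labels[j]))
        for i, j in combinations(range(len(labels)), 2)
    ]


# Hypothetical toy labels: four images, three identities.
toy_labels = ["id_a", "id_a", "id_b", "id_c"]
pairs = make_verification_pairs(toy_labels)

for i, j, same in pairs:
    print(i, j, same)
```

Each pair only records whether the two identities match, so a verification-only model never sees which of the three identities an image belongs to.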

Identification models: to make full use of the re-ID labels, identification models treat person re-ID as a multi-class recognition task, which is used for feature learning.

Drawback: the training objective differs from the testing procedure. The identification model does not account for the similarity measurement between image pairs, which can be problematic during pedestrian retrieval.
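
The mismatch is easier to see with a sketch of the test-time procedure: retrieval ranks gallery images by the similarity of their features to the query, and no class prediction is involved. The sketch below assumes cosine similarity as the measure (the notes do not state which one is used), and the feature vectors and the name `rank_gallery` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dim features; real embeddings would be much larger.
query = [1.0, 0.0, 1.0]
gallery = [
    [0.0, 1.0, 0.0],
    [2.0, 0.1, 1.9],
    [1.0, 1.0, 0.0],
]
ranking = rank_gallery(query, gallery)
print(ranking)
```

An identification loss trains the features to separate classes, but nothing in that loss directly shapes the pairwise similarity that this ranking relies on.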

Figure: (a) Identification models treat person re-ID as a multi-class recognition task: they take one image as input and predict its identity. (b) Verification models treat person re-ID as a two-class recognition task or a similarity regression task: they take a pair of images as input and determine whether the two belong to the same person.

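Given the respective strengths and limitations of these two models, the paper proposes a Siamese Network that combines the strengths of both and offsets their weaknesses. It predicts the identity of each person and judges the similarity of the two images at the same time.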


Core Model of the Paper: Siamese Network

Figure: Siamese Network
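  • Given a pair of 227x227 images, the network outputs the ID of each image and a similarity score for the pair.
  • The network consists of two pre-trained CNN models (CaffeNet here), 3 additional convolutional layers, one Square Layer, and 3 losses (2 identification losses and 1 verification loss).
  • The final FC layer (1000-dim) of each pre-trained CNN model is removed and replaced with a convolutional layer (in order to fine-tune the network on a new dataset), and a softmax is added to normalize the output.
  • No ReLU is added after this convolution. As in most methods, the loss used here is the cross-entropy loss.
  • In this model, the high-level features f1 and f2 are compared directly to estimate similarity.
  • Square Layer: a parameter-free layer that compares the features f1 and f2. It takes two tensors as input and outputs one tensor, the element-wise square of their difference: fs = (f1 - f2)^2. (f1, f2 are the 4,096-dim embeddings and fs is the output tensor of the Square Layer.)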
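
The pieces listed above can be sketched in plain Python as follows. This is only a toy forward pass under stated assumptions, not the authors' implementation: `linear` is a hypothetical stand-in for the added convolutional classifiers, sharing `w_id` between the two branches is an assumption, treating class index 1 as "same person" is an assumption, and the three losses are returned separately because these notes do not say how they are weighted.

```python
import math


def square_layer(f1, f2):
    """Parameter-free layer: element-wise squared difference fs = (f1 - f2)^2."""
    return [(a - b) ** 2 for a, b in zip(f1, f2)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy loss for a single sample with an integer class target."""
    return -math.log(probs[target])


def linear(x, weights):
    """Hypothetical classifier stand-in: one weight row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def siamese_losses(f1, f2, id1, id2, w_id, w_ver):
    """Return (identification loss 1, identification loss 2, verification loss)."""
    id_loss_1 = cross_entropy(softmax(linear(f1, w_id)), id1)
    id_loss_2 = cross_entropy(softmax(linear(f2, w_id)), id2)
    fs = square_layer(f1, f2)
    same = int(id1 == id2)  # assumption: index 1 means "same person"
    ver_loss = cross_entropy(softmax(linear(fs, w_ver)), same)
    return id_loss_1, id_loss_2, ver_loss


# Toy 3-dim embeddings instead of 4,096-dim, 3 identities, zero weights.
f1 = [1.0, 2.0, 0.5]
f2 = [1.0, 0.0, 0.5]
w_id = [[0.0] * 3 for _ in range(3)]
w_ver = [[0.0] * 3 for _ in range(2)]
losses = siamese_losses(f1, f2, id1=0, id2=2, w_id=w_id, w_ver=w_ver)
print(square_layer(f1, f2), losses)
```

With all-zero weights every softmax is uniform, so the two identification losses equal log(3) and the verification loss equals log(2), which is a quick sanity check before plugging in real weights.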
