[Paper] [CVPR 2019 oral] Unsupervised Person Re-identification by Soft Multilabel Learning

Contents

Abstract

Introduction

Related Work

Deep Soft Multilabel Reference Learning

Problem formulation and overview

Soft multilabel-guided hard negative mining

Cross-view consistent soft multilabel learning

Reference agent learning

Model training and testing

Experiments

Datasets

Comparison

Ablation study


Abstract

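To address the scalability problem in RE-ID (person re-identification), the paper proposes a deep model for soft multilabel learning.

Idea: compare each unlabeled person with a set of reference persons from an auxiliary domain to learn a soft multilabel (a real-valued label likelihood vector).

  • Proposes soft multilabel-guided hard negative mining to learn a discriminative embedding
  • Introduces reference agent learning to represent each reference person

Results: on Market-1501 and DukeMTMC-reID, the method ranks first among the unsupervised methods compared.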

Introduction

  • To address the lack of pairwise label guidance in unsupervised RE-ID:
    • Soft multilabel learning is proposed to mine potential label information
    • Main idea:
      • Compare each unlabeled person with a set of reference persons from the auxiliary dataset to obtain a soft multilabel (a real-valued label likelihood vector)
      • [Image 1]
      • The thicker the arrow, the higher the likelihood; the resulting vector is the soft multilabel
  • Based on the soft multilabel, soft multilabel-guided hard negative mining is proposed to explore potential discriminative information, i.e. to use soft multilabels to tell apart unlabeled persons who look alike but are in fact different
    • What is a hard negative?
      • A pair of unlabeled persons that are visually similar but whose comparative characteristics (soft multilabels) are dissimilar is a hard negative pair
  • Cross-view consistent learning
  • Reference agent learning
  • Summary of contributions:
    • (1) Tackles unsupervised RE-ID through soft multilabels
    • (2) Proposes a unified model, soft multilabel reference learning (MAR), which jointly handles soft multilabel-guided hard negative mining, cross-view consistent soft multilabel learning, and reference agent learning

Related Work

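  • Unsupervised RE-ID
    • Unsupervised RE-ID means the target dataset is unlabeled; the auxiliary dataset is not necessarily unlabeled
    • A recent line of unsupervised RE-ID work is pseudo label learning. This paper differs in that a soft multilabel captures auxiliary reference information beyond visual feature similarity, whereas a pseudo label only encodes visual feature similarity. Soft multilabels can therefore mine potential label information that cannot be obtained by comparing visual features directly.
  • Unsupervised domain adaptation
    • These methods work by aligning the distributions of the source domain and the target domain.
    • The problem is that unsupervised domain adaptation assumes the classes are consistent between the source and target domains, which does not hold in real RE-ID, where the two domains contain different persons.
  • Multilabel classification
    • The multilabel here is not the same thing as the authors' soft multilabel
    • A multilabel is a binary vector of groundtruth labels
    • The authors' soft multilabel is a real-valued label likelihood vector
  • Zero-shot learning
    • Zero-shot learning recognizes targets through semantic attributes
    • Soft multilabel reference learning resembles zero-shot learning in that both describe an unknown target person relative to a consistent set of references
    • The problem with zero-shot learning is that the semantic attributes must be defined in advance, which is hard to do in unsupervised RE-ID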

Deep Soft Multilabel Reference Learning

Problem formulation and overview

  • Target RE-ID dataset
    • \mathcal{X}=\{x_i\}_{i=1}^{N_u}, \textup{ where } x_i \textup{ is an unlabeled person image}
  • Auxiliary RE-ID dataset
    • \mathcal{Z}=\{z_i, w_i\}_{i=1}^{N_a}, \textup{ where } z_i \textup{ is a person image with label } w_i \in \{1, \cdots, N_p\} \textup{ and } N_p \textup{ is the number of reference persons}
  • Goal 1: learn a soft multilabel function l(\cdot) such that y=l(x,\mathcal{Z})\in(0,1)^{N_p}
    • The entries of y sum to 1, and each entry is a likelihood
  • Goal 2: learn a discriminative deep feature embedding f(\cdot)
  • Goal 3: learn a set of reference agents \{a_i\}_{i=1}^{N_p}
    • Each a_i represents one reference person in the shared joint feature embedding
    • The shared joint feature embedding contains both the unlabeled person features f(x) and the agents \{a_i\}_{i=1}^{N_p}
  • How the soft multilabel is learned:
    • Compare f(x) with the agents \{a_i\}_{i=1}^{N_p}
    • That is, the soft multilabel function simplifies to y=l(f(x),\{a_i\}_{i=1}^{N_p})
  • Overall illustration
    • [Image 2]
    • [Image 3]
    • Red circles: unlabeled person images f(x)
    • Blue triangles: the set of reference agents \{a_i\}_{i=1}^{N_p}
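
The comparison of f(x) against the agents can be sketched in a few lines. This is a minimal illustration, assuming the softmax-over-inner-products form of l given in the next subsection; the function name `soft_multilabel` and the toy 2-D vectors are made up for the example, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math

def soft_multilabel(feat, agents):
    """Softmax over the inner products a_k^T f(x); the entries sum to 1."""
    logits = [sum(a * f for a, f in zip(agent, feat)) for agent in agents]
    top = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example (hypothetical values): one feature compared with three agents.
feat = [1.0, 0.0]
agents = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
y = soft_multilabel(feat, agents)
print(y)  # the largest entry belongs to the agent most aligned with feat
```

Each entry of `y` lies strictly between 0 and 1, matching y \in (0,1)^{N_p}.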

Soft multilabel-guided hard negative mining

  • Soft multilabel function:
    • y^{(k)}=l(f(x),\{a_i\}_{i=1}^{N_p})^{(k)}=\frac{\exp(a_k^Tf(x))}{\sum_i\exp(a_i^Tf(x))}
    • The inner product serves as the similarity measure
  • Assumption 1
    • If a pair of unlabeled person images x_i, \ x_j has high feature similarity f(x_i)^Tf(x_j), it is called a similar pair. If a similar pair also has highly similar comparative characteristics, it is probably a positive pair; otherwise it is probably a hard negative pair
  • To measure the similarity of comparative characteristics, the soft multilabel agreement A(\cdot, \cdot) is defined as:
    • A(y_i, y_j)=y_i \land y_j=\sum\nolimits_{k} \min(y_i^{(k)}, y_j^{(k)})=1-\frac{\left \| y_i-y_j \right \|_1}{2}
  • Hard negative pairs are mined by considering feature similarity and soft multilabel agreement jointly
    • Given a mining ratio p
    • The similar pairs of Assumption 1 are the pM pairs with the highest feature similarity, where M=N_u\times (N_u-1)/2 is the total number of pairs in the target dataset
    • The positive set \mathcal{P} and the hard negative set \mathcal{N} are defined as:
      • \mathcal{P}=\{(i, j) \mid f(x_i)^Tf(x_j)\geqslant S, A(y_i,y_j)\geqslant T \}
      • \mathcal{N}=\{(k,l) \mid f(x_k)^Tf(x_l) \geqslant S, A(y_k, y_l)< T\}
      • S and T are thresholds on feature similarity and soft multilabel agreement, respectively
  • Soft multilabel-guided discriminative embedding learning loss:
    • L_{MDL}=-\log\frac{\overline{P}}{\overline{P}+\overline{N}}
    • \overline{P}=\frac{1}{|\mathcal{P}|}\sum \nolimits_{(i,j)\in \mathcal{P}}\exp(-\left \| f(x_i)-f(x_j) \right \|_2^2)
    • \overline{N}=\frac{1}{|\mathcal{N}|}\sum \nolimits_{(k, l)\in \mathcal{N}}\exp(-\left \| f(x_k)-f(x_l) \right \|_2^2)
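
The mining rule and the loss above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: S is taken as the similarity of the pM-th most similar pair, T is assumed to be chosen the same way from the agreement values, features are assumed to be unit-norm, and all function names (`agreement`, `mine_pairs`, `mdl_loss`) are made up for the example.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def agreement(y_i, y_j):
    """A(y_i, y_j) = sum_k min(y_i^(k), y_j^(k))."""
    return sum(min(a, b) for a, b in zip(y_i, y_j))

def mine_pairs(feats, labels, p):
    """Split the pM most similar pairs into positives and hard negatives."""
    pairs = list(combinations(range(len(feats)), 2))      # all M pairs
    sim = {ij: dot(feats[ij[0]], feats[ij[1]]) for ij in pairs}
    agr = {ij: agreement(labels[ij[0]], labels[ij[1]]) for ij in pairs}
    top = min(len(pairs), max(1, round(p * len(pairs))))  # pM
    S = sorted(sim.values(), reverse=True)[top - 1]       # similarity threshold
    T = sorted(agr.values(), reverse=True)[top - 1]       # assumption: chosen like S
    positives = [ij for ij in pairs if sim[ij] >= S and agr[ij] >= T]
    negatives = [ij for ij in pairs if sim[ij] >= S and agr[ij] < T]
    return positives, negatives

def mdl_loss(feats, positives, negatives):
    """L_MDL = -log(P_bar / (P_bar + N_bar))."""
    if not positives or not negatives:
        raise ValueError("both pair sets must be non-empty")
    def mean_affinity(pairs):
        return sum(math.exp(-sq_dist(feats[i], feats[j])) for i, j in pairs) / len(pairs)
    p_bar = mean_affinity(positives)
    n_bar = mean_affinity(negatives)
    return -math.log(p_bar / (p_bar + n_bar))

# Toy data (hypothetical): persons 0 and 1 agree in their soft multilabels,
# person 2 looks similar to both but has a very different soft multilabel.
feats = [[1.0, 0.0], [1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
labels = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
P, N = mine_pairs(feats, labels, p=0.5)
print(P, N, mdl_loss(feats, P, N))
```

Minimizing the loss pulls the pairs in P together and pushes the pairs in N apart, since it decreases as \overline{P} grows relative to \overline{N}.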

Cross-view consistent soft multilabel learning

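  • Cross-view consistent soft multilabel learning loss:
    • L_{CML}=\sum\nolimits_vd(\mathbb{P}_v(y),\mathbb{P}(y))^2
    • \mathbb{P}(y) is the soft multilabel distribution over the dataset \mathcal{X}
    • \mathbb{P}_v(y) is the soft multilabel distribution in the v-th camera view of \mathcal{X}
    • d(\cdot, \cdot) is a distance between two distributions (the paper uses the simplified 2-Wasserstein distance), under which the loss becomes
      • L_{CML}=\sum\nolimits_v\left \| \mu_v-\mu \right \|_2^2+\left \| \sigma_v-\sigma \right \|_2^2
      • \mu / \sigma are the mean/std vectors of the log-soft multilabels
      • \mu_v / \sigma_v are the mean/std vectors of the log-soft multilabels in the v-th camera view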
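
A small sketch of this loss, assuming the simplified 2-Wasserstein distance reduces to squared differences between mean vectors plus squared differences between std vectors of the log-soft multilabels. The names `mean_std` and `cml_loss` are made up for the example, and the population standard deviation is an arbitrary choice here; in training these statistics would typically be estimated per batch.

```python
import math
from collections import defaultdict

def mean_std(vectors):
    """Per-dimension mean and population standard deviation."""
    n = len(vectors)
    dims = range(len(vectors[0]))
    mu = [sum(v[k] for v in vectors) / n for k in dims]
    sigma = [math.sqrt(sum((v[k] - mu[k]) ** 2 for v in vectors) / n) for k in dims]
    return mu, sigma

def cml_loss(soft_labels, cameras):
    """Sum over camera views of ||mu_v - mu||^2 + ||sigma_v - sigma||^2."""
    logs = [[math.log(p) for p in y] for y in soft_labels]   # log-soft multilabels
    mu, sigma = mean_std(logs)                               # statistics over all views
    by_camera = defaultdict(list)
    for log_y, cam in zip(logs, cameras):
        by_camera[cam].append(log_y)
    loss = 0.0
    for group in by_camera.values():
        mu_v, sigma_v = mean_std(group)                      # statistics of one view
        loss += sum((a - b) ** 2 for a, b in zip(mu_v, mu))
        loss += sum((a - b) ** 2 for a, b in zip(sigma_v, sigma))
    return loss

# Toy data (hypothetical): four soft multilabels captured by two cameras.
soft_labels = [[0.5, 0.5], [0.2, 0.8], [0.5, 0.5], [0.2, 0.8]]
print(cml_loss(soft_labels, cameras=[0, 0, 1, 1]))  # both views see the same distribution
print(cml_loss(soft_labels, cameras=[0, 1, 0, 1]))  # the two views differ
```

The loss vanishes when every camera view has the same soft multilabel statistics as the whole dataset and grows as the views drift apart.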

Reference agent learning

  • Reference agents must be discriminated from one another while each one represents all images of its corresponding person
  • Agent learning loss:
    • L_{AL}=\sum \nolimits_k-\log l(f(z_k), \{ a_i \} )^{(w_k)}=\sum\nolimits_k -\log\frac{\exp(a_{w_k}^Tf(z_k))}{\sum_j\exp(a_j^Tf(z_k))}
    • z_k is the k-th person image in the auxiliary dataset, with label w_k
  • The loss above only involves the auxiliary dataset. To make the soft multilabel function more effective on the unlabeled target dataset, joint embedding learning for reference comparability is proposed
    • Why joint embedding learning?
      • The main challenge in achieving reference comparability is domain shift, caused by the different person appearance distributions of the two independent domains
    • Reference agent-based joint embedding learning loss:
      • L_{RJ}=\sum\nolimits_{i}\sum\nolimits_{j\in\mathcal{M}_i}\sum \nolimits_{k:w_k=i}[m-\left \| a_i-f(x_j) \right \|_2^2]_++\left \| a_i-f(z_k) \right \|_2^2
  • Reference agent learning loss:
    • L_{RAL}=L_{AL}+\beta L_{RJ}
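
The two agent losses can be sketched as below. Several things here are assumptions rather than statements from the notes: \mathcal{M}_i is taken to be the set of unlabeled images whose squared distance to agent a_i is below the margin m, the margin defaults to 1, labels are 0-indexed, and the names `al_loss`, `rj_loss` and `ral_loss` are made up for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def al_loss(agents, aux_feats, aux_labels):
    """Cross-entropy of each auxiliary feature against its own agent."""
    loss = 0.0
    for z, w in zip(aux_feats, aux_labels):
        logits = [dot(a, z) for a in agents]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
        loss += log_norm - logits[w]               # -log softmax at the true label
    return loss

def rj_loss(agents, target_feats, aux_feats, aux_labels, m=1.0):
    """Hinge term on mined target features plus pull term on own auxiliary images."""
    loss = 0.0
    for i, a in enumerate(agents):
        mined = [x for x in target_feats if sq_dist(a, x) < m]     # assumed M_i
        own = [z for z, w in zip(aux_feats, aux_labels) if w == i]
        for x in mined:
            for z in own:
                loss += max(0.0, m - sq_dist(a, x)) + sq_dist(a, z)
    return loss

def ral_loss(agents, target_feats, aux_feats, aux_labels, beta, m=1.0):
    """L_RAL = L_AL + beta * L_RJ."""
    return (al_loss(agents, aux_feats, aux_labels)
            + beta * rj_loss(agents, target_feats, aux_feats, aux_labels, m))

# Toy data (hypothetical): two agents, one auxiliary image each, one target image.
agents = [[1.0, 0.0], [0.0, 1.0]]
aux_feats, aux_labels = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
target_feats = [[0.6, 0.8]]
print(ral_loss(agents, target_feats, aux_feats, aux_labels, beta=0.5))
```

The hinge term pushes nearby unlabeled target features away from each agent, while the second term keeps the agent close to the auxiliary images of its own person.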

Model training and testing

  • soft multilabel reference learning (MAR) loss:
    • L_{MAR}=L_{MDL}+\lambda_1L_{CML}+\lambda_2L_{RAL}
  • Train the model end to end with SGD
  • For testing, compute the cosine similarity between features
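
As a rough sketch of these two steps: the weighted sum mirrors L_{MAR}, and testing ranks gallery features by cosine similarity to a query feature. The function names and all numeric values below are hypothetical; the notes do not state the values of \lambda_1 and \lambda_2.

```python
import math

def mar_loss(l_mdl, l_cml, l_ral, lambda_1, lambda_2):
    """L_MAR = L_MDL + lambda_1 * L_CML + lambda_2 * L_RAL."""
    return l_mdl + lambda_1 * l_cml + lambda_2 * l_ral

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_gallery(query, gallery):
    """Gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy features (hypothetical values).
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
print(rank_gallery(query, gallery))
```

Because cosine similarity ignores vector length, the gallery feature [2.0, 0.1] ranks first even though its norm differs from the query's.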

Experiments

Datasets

  • Evaluation benchmarks
    • Market-1501
    • DukeMTMC-reID
  • Auxiliary dataset
    • MSMT17

Comparison

[Image 4]

Ablation study

[Image 5]

 

CONTACT INFORMATION

E-Mail: [email protected]

QQ: 46611253
