2018-ECCV-Mancs-A Multi-task Attentional Network with Curriculum Sampling

Paper link

Motivation

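  • Existing Re-ID work faces the following problems:

    • choice of loss function
    • misalignment
    • finding highly discriminative local features
    • sampling for rank-loss optimization
  • Most current methods tackle only one or two of these problems. Can a single unified framework address all of them?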

Contribution

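  • Proposes the Mancs framework to address all of the above problems jointly
  • Proposes a fully attentional block with deep supervision and curriculum sampling to improve feature extraction and training (both can be reused in other work)
  • Achieves state-of-the-art (SOTA) results on three public datasets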

1 Introduction

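  • Definition, significance, and difficulties of Re-ID

  • Research directions:

    • pedestrian feature representation
    • distance metric learning: suffers from an imbalance between positive and negative pairs, so results usually depend heavily on the sampling method
  • Motivation and contributions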


2 Related Work

  • Attention Network

    • MSCAN
    • HA-CNN
    • CAN
  • Metric Learning

    • triplet loss ==> online hard examples mining(OHEM)
    • contrastive loss
  • Multi-task learning

    • triplet loss + softmax
    • This paper: triplet loss + focal loss

3 Method

3.1 Training Architecture

  • As shown in the figure below, the network consists of three parts:
    • backbone network (ResNet50) ==> a multi-scale feature extractor
    • attention module ==> attention mask
    • loss function: attention loss + triplet loss + focal loss
[Image 1: Mancs training architecture]

3.2 Fully Attentional Block

[Image 2: fully attentional block]
  • Builds on the SE block and modifies its structure:

    • Problem with the SE block: global average pooling (GAP) discards spatial structure ==> this paper removes the pooling layer and replaces the fully connected layers with 1x1 convolutions, preserving spatial information
  • Attention map:
    $M = \mathrm{Sigmoid}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(F_i))))$

  • Output feature map computed from the attention map:
    $F_o = F_i * M + F_i$
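The two formulas can be sketched as a NumPy forward pass. This is a minimal illustration, not the paper's code: it assumes a channels-first feature map of shape (C, H, W), a hypothetical bottleneck reduction ratio `r` (the notes do not give one), random weights, and no normalization layers. A 1x1 convolution is simply a matrix product over the channel axis applied at every spatial position, which is why the mask keeps its H x W layout once pooling is removed.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    # Mix channels independently at every spatial position.
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fully_attentional_block(f_in, w1, b1, w2, b2):
    """SE-style block without global pooling.

    M   = Sigmoid(Conv(ReLU(Conv(F_i))))  -> same shape as F_i
    F_o = F_i * M + F_i                   -> residual re-weighting
    """
    hidden = np.maximum(conv1x1(f_in, w1, b1), 0.0)  # Conv + ReLU (bottleneck)
    mask = sigmoid(conv1x1(hidden, w2, b2))           # Conv + Sigmoid, values in (0, 1)
    return f_in * mask + f_in, mask

rng = np.random.default_rng(0)
C, H, W, r = 64, 16, 8, 16  # r: hypothetical reduction ratio
f_in = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)

f_out, mask = fully_attentional_block(f_in, w1, b1, w2, b2)
print(f_out.shape, mask.shape)  # (64, 16, 8) (64, 16, 8)
```

Unlike an SE block, whose mask has shape (C, 1, 1) after GAP, this mask varies over channels and spatial positions. Because M lies in (0, 1), the residual form scales each activation by a factor between 1 and 2 instead of suppressing it completely.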

3.3 ReID Task #1: Triplet loss with curriculum sampling

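  • Ranking losses generalize better than classification losses when the amount of training data is limited

  • Rank branch: shared backbone + a pooling layer + an FC layer

  • Sampling: OHEM always picks the hardest samples for each update, which easily makes the model collapse during training ==> curriculum sampling (from easy triplets to hard triplets)

    • For an anchor $I_i^a$, first randomly select a positive $I_i^p$
    • Sort the negatives by their distance to the anchor in ascending order (hard --> easy)
    • Select a negative according to a Gaussian distribution $\mathcal{N}(\mu, \sigma)$ over the sorted list, where $N_n$ is the number of negatives and $t$ is the current epoch: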

$$\mu = \left[N_n - \frac{N_n}{t_0}t\right]_+, \qquad \sigma = a \times b^{\frac{t-t_0}{t_1 - t_0}}$$
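The schedule can be sketched in plain Python. Assumptions (not stated in the notes): `t` counts epochs, the negatives for one anchor are already sorted by distance (index 0 = hardest), and the Gaussian density is evaluated at each negative's rank index and normalized to form the sampling distribution. The function names are hypothetical; the default hyperparameters are the values listed in Section 4.3.

```python
import math
import random

def curriculum_params(t, n_neg, t0=30, t1=60, a=15.0, b=0.001):
    """Mean and std of the Gaussian over negative ranks at epoch t."""
    mu = max(n_neg - n_neg / t0 * t, 0.0)    # [N_n - N_n/t0 * t]_+
    sigma = a * b ** ((t - t0) / (t1 - t0))  # a * b^((t - t0)/(t1 - t0))
    return mu, sigma

def negative_probs(t, n_neg, **kw):
    """Selection probability for each rank (0 = hardest, n_neg - 1 = easiest)."""
    mu, sigma = curriculum_params(t, n_neg, **kw)
    dens = [math.exp(-0.5 * ((k - mu) / sigma) ** 2) for k in range(n_neg)]
    total = sum(dens)
    return [d / total for d in dens]

def sample_negative(sorted_negatives, t, rng=random, **kw):
    """Draw one negative from a hard-to-easy sorted list at epoch t."""
    probs = negative_probs(t, len(sorted_negatives), **kw)
    return rng.choices(sorted_negatives, weights=probs, k=1)[0]
```

With these defaults, μ slides from $N_n$ (the easy end) to 0 (the hardest negative) over the first $t_0 = 30$ epochs, while σ shrinks from $a \cdot b^{-1} = 15000$ at $t = 0$ (nearly uniform sampling) to $a = 15$ at $t = t_0$ and $a \cdot b = 0.015$ at $t = t_1 = 60$. After epoch 60, essentially only the hardest negative is drawn.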

  • Selection probability of $I_i^n$ (the Gaussian density evaluated at its position in the sorted list); as $t$ grows, hard negatives become more likely to be selected:
    $Pr(I^{n^*}_i = I_i^n \mid I^a_i) \propto \mathcal{N}(\mu, \sigma)$
  • Final loss for the ranking branch, where $m$ is the margin and $D$ is the distance between embeddings:

$$L_{rank} = \frac{1}{P(K-1)K} \sum_{i=1}^{P(K-1)K}\left[m + D(f_{rank}(I^a_i), f_{rank}(I^p_i)) - D(f_{rank}(I^a_i), f_{rank}(I^n_i))\right]_+$$
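The loss can be written directly from the formula. This minimal sketch assumes $D$ is the Euclidean distance and that the triplets have already been formed, one per (anchor, positive) pair in a P×K batch: P·K anchors times K−1 positives gives the P(K−1)K terms in the sum. `triplet_rank_loss` is a hypothetical name; the margin 0.5 comes from Section 4.3.

```python
import math

def triplet_rank_loss(triplets, margin=0.5):
    """Mean of [m + D(a, p) - D(a, n)]_+ over (anchor, positive, negative) embeddings."""
    losses = [max(margin + math.dist(a, p) - math.dist(a, n), 0.0) for a, p, n in triplets]
    return sum(losses) / len(losses)

triplets = [
    ([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]),  # easy: D(a,n) = 3 exceeds D(a,p) + m, zero loss
    ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]),  # violates the margin: 0.5 + 1 - 1 = 0.5
]
print(triplet_rank_loss(triplets))  # 0.25
```

Triplets that already satisfy the margin contribute zero, which is why the choice of negatives (curriculum sampling) controls how much signal each batch provides.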

3.4 ReID Task #2: Person classification with focal loss

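  • Since combining classification with ranking works better, a classification branch is added. Because hard samples deserve more attention than easy ones, focal loss (a variant of the softmax loss) is used, which gives hard samples larger weights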

  • Focal loss for the classification branch, where $c_i$ is the identity label of $I_i$:
    $$L_{cls} = -\frac{1}{PK}\sum_{i=1}^{PK}(1-p_i)^\gamma \log(p_i), \qquad p_i = \mathrm{Sigmoid}_{c_i}(FC(f_{cls}(I_i)))$$
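A minimal sketch of this loss, following the formula as written in these notes, where $p_i$ is the sigmoid of the ground-truth class logit (if a softmax over classes were used instead, only the line computing `p` would change). `focal_loss` is a hypothetical name; γ = 2 comes from Section 4.3. Setting `gamma=0` recovers the plain log loss, which makes the down-weighting easy to inspect.

```python
import math

def focal_loss(logits, labels, gamma=2.0):
    """L_cls = -(1/PK) * sum_i (1 - p_i)^gamma * log(p_i).

    logits: per-class score lists (FC outputs), one per image.
    labels: ground-truth class index c_i for each image.
    """
    total = 0.0
    for scores, c in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-scores[c]))        # Sigmoid_{c_i}(FC(f_cls(I_i)))
        total += -((1.0 - p) ** gamma) * math.log(p)  # easy samples (p -> 1) are down-weighted
    return total / len(labels)

# Ratio focal / plain log loss equals (1 - p)^gamma: tiny for easy, larger for hard samples.
for name, x in [("easy", [[4.0]]), ("hard", [[-1.0]])]:
    print(name, focal_loss(x, [0]) / focal_loss(x, [0], gamma=0))
```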

3.5 ReID Task #3: Deep supervision for better attention

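  • The attended feature maps from different scales (feature maps multiplied by their attention masks) are average-pooled and concatenated into an attention feature vector $f_{att}$, which is used for identity classification; this deep supervision pushes the network toward more accurate attention maps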

  • Loss function for the attention branch, where $C$ is the number of identities and $y_i^c = 1$ if $I_i$ belongs to identity $c$, else 0:
    $$L_{att} = -\frac{1}{PKC}\sum_{i=1}^{PK}\sum_{c=1}^{C}\left[y_i^c \log(q^c_i) + (1-y_i^c)\log(1-q^c_i)\right], \qquad q^c_i = \mathrm{Sigmoid}_c(FC(f_{att}(I_i)))$$
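A NumPy sketch of the attention branch: each attended map of shape (C_s, H_s, W_s) is average-pooled to a C_s-vector, the vectors are concatenated into $f_{att}$, and a one-vs-all binary cross-entropy over the identities is applied. Assumptions: the scale shapes below are hypothetical, the FC weights are random placeholders, and `eps` is added only for numerical safety.

```python
import numpy as np

def attention_feature(attended_maps):
    """Global-average-pool each (C_s, H_s, W_s) map and concatenate -> f_att."""
    return np.concatenate([m.mean(axis=(1, 2)) for m in attended_maps])

def attention_loss(f_att_batch, labels, fc_w, fc_b, eps=1e-12):
    """L_att = -(1/(PK*C)) * sum_i sum_c [y log q + (1 - y) log(1 - q)]."""
    logits = f_att_batch @ fc_w.T + fc_b   # (PK, C)
    q = 1.0 / (1.0 + np.exp(-logits))      # per-class sigmoid q_i^c
    y = np.eye(fc_w.shape[0])[labels]      # one-hot y_i^c
    bce = y * np.log(q + eps) + (1 - y) * np.log(1 - q + eps)
    return -bce.mean()                     # mean over PK * C entries

rng = np.random.default_rng(0)
shapes = [(256, 32, 16), (512, 16, 8), (1024, 8, 4)]  # hypothetical scales
batch = np.stack([attention_feature([rng.random(s) for s in shapes]) for _ in range(4)])
num_ids = 10
fc_w, fc_b = rng.standard_normal((num_ids, batch.shape[1])) * 0.01, np.zeros(num_ids)
loss = attention_loss(batch, np.array([0, 1, 2, 3]), fc_w, fc_b)
print(batch.shape, loss)  # batch.shape is (4, 1792)
```

Because the classifier only sees pooled attended features, gradients from this loss reach the attention masks directly, which is the point of the deep supervision.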

3.6 Multi-task learning

  • The three tasks (rank + cls + att) share the backbone; the final loss is:
    $$\mathcal{L} = \lambda_{rank}L_{rank} + \lambda_{cls}L_{cls} + \lambda_{att}L_{att}$$
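As a small worked example, the objective is a plain weighted sum; the default weights below are the values listed in Section 4.3 (λ_rank = 1, λ_cls = 1, λ_att = 0.2), and the function name is hypothetical. Since the three branches share the backbone, one backward pass through this scalar updates the backbone with gradients from all three tasks.

```python
def mancs_total_loss(l_rank, l_cls, l_att, lam_rank=1.0, lam_cls=1.0, lam_att=0.2):
    """Weighted multi-task objective L = λ_rank L_rank + λ_cls L_cls + λ_att L_att."""
    return lam_rank * l_rank + lam_cls * l_cls + lam_att * l_att

# With λ_att = 0.2, the attention branch acts as an auxiliary regularizer.
print(mancs_total_loss(0.25, 1.5, 2.0))
```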

3.7 Inference

  • Features from the rank branch generalize better, so at test time they are used to represent pedestrian images, as shown in the figure below
[Image 3: inference with the rank-branch feature]
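Retrieval at test time then reduces to sorting gallery images by their distance to the query. A minimal sketch, assuming the inputs are rank-branch embeddings $f_{rank}$ compared with Euclidean distance (the same $D$ as in the triplet loss) and no re-ranking; `rank_gallery` is a hypothetical name.

```python
import math

def rank_gallery(query_feat, gallery_feats):
    """Gallery indices sorted from most to least similar (ascending Euclidean distance)."""
    dists = [math.dist(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda j: dists[j])

gallery = [[1.0, 0.0], [0.1, 0.1], [5.0, 5.0]]
print(rank_gallery([0.0, 0.0], gallery))  # [1, 0, 2]
```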

4 Experiments

4.1 Datasets

  • Market1501, CUHK03, DukeMTMC-reID

4.2 Evaluation Protocol

  • mAP, CMC

  • Market1501: both single-query and multi-query; CUHK03 and DukeMTMC-reID: single-query

  • CUHK03 splits: 1367/100 and 767/700 (training/test identities)
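Both metrics can be computed from each query's ranked gallery list. A minimal single-query sketch, assuming every query has at least one true match and that junk and same-camera gallery entries (which the standard Market-1501 protocol excludes) were already filtered out; the function names are hypothetical.

```python
def cmc_and_ap(ranked_ids, query_id, max_rank=10):
    """CMC hit vector and average precision for one query.

    ranked_ids: gallery identities sorted by ascending distance.
    """
    matches = [gid == query_id for gid in ranked_ids]
    first_hit = matches.index(True)  # assumes at least one true match exists
    cmc = [1 if k >= first_hit else 0 for k in range(max_rank)]
    hits, precisions = 0, []
    for k, m in enumerate(matches, start=1):
        if m:
            hits += 1
            precisions.append(hits / k)  # precision at each correct match
    return cmc, sum(precisions) / len(precisions)

def evaluate(all_ranked_ids, query_ids, max_rank=10):
    """Average CMC curve and mAP over all queries."""
    results = [cmc_and_ap(r, q, max_rank) for r, q in zip(all_ranked_ids, query_ids)]
    n = len(results)
    cmc = [sum(c[k] for c, _ in results) / n for k in range(max_rank)]
    mAP = sum(ap for _, ap in results) / n
    return cmc, mAP

cmc, mAP = evaluate([["a", "b", "a"], ["a", "b", "c"]], ["a", "b"], max_rank=3)
print(cmc, mAP)  # rank-1 = 0.5, rank-2 = 1.0; mAP = (5/6 + 1/2) / 2
```

CMC only looks at the first correct match, while AP rewards ranking every correct match early, which is why papers report both.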

4.3 Implementation Details

  • PyTorch

  • Pretrained ResNet-50, with a 2048-d FC layer added before the classification layer

Data Augmentation

  • Resize images to 256 x 128 ==> randomly crop with scale in [0.64, 1.0] and aspect ratio in [2, 3] ==> resize back to 256 x 128 ==> random horizontal flip with probability 0.5 ==> random erasing ==> subtract the mean and divide by the standard deviation
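The geometric part of this pipeline can be sketched by sampling the crop box and flip decision. Assumptions to note: "scale" is read as the fraction of the image area, "aspect ratio" as height / width (the full 256 x 128 image has ratio 2), out-of-bounds boxes are retried and finally fall back to the whole image, and both ranges are sampled uniformly. Resizing, random erasing, and normalization are omitted; the function names are hypothetical.

```python
import math
import random

def sample_crop_box(height=256, width=128, scale=(0.64, 1.0), ratio=(2.0, 3.0),
                    rng=random, max_tries=10):
    """Return (top, left, h, w) of a random crop; ratio is taken as h / w (assumption)."""
    area = height * width
    for _ in range(max_tries):
        target_area = rng.uniform(*scale) * area
        aspect = rng.uniform(*ratio)
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h <= height and 0 < w <= width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    return 0, 0, height, width  # fall back to the whole image

def augment_params(rng=random):
    """Crop box plus flip flag for one image; the crop is then resized back to 256 x 128."""
    box = sample_crop_box(rng=rng)
    flip = rng.random() < 0.5  # horizontal flip with probability 0.5
    return box, flip
```

Since the crop is resized back to 256 x 128, the net effect is a mild zoom and vertical stretch of the pedestrian, which simulates detector misalignment.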

Training Configurations

  • PK sampling strategy: Market1501 and DukeMTMC-reID: P = K = 16; CUHK03: P = 32, K = 8

  • 160 epochs; $t_0 = 30$, $t_1 = 60$, $a = 15$, $b = 0.001$

  • $\lambda_{rank} = 1$, $\lambda_{cls} = 1$, $\lambda_{att} = 0.2$

  • Margin $m = 0.5$, $\gamma = 2$

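  • Adam optimizer, lr = $3 \times 10^{-4}$

  • Gradient clipping to prevent model collapse

  • The ReLU after the last convolutional layer is replaced with PReLU ==> increases the representational power of the final features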
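PK batch construction can be sketched as follows: pick P identities, then K images of each, so every anchor has K−1 positives and P·K−K negatives inside the batch. Sampling with replacement for identities that have fewer than K images is an assumption (a common convention), and `pk_batch` is a hypothetical name. For CUHK03 (P = 32, K = 8) each batch holds 256 images.

```python
import random
from collections import defaultdict

def pk_batch(labels, P, K, rng=random):
    """Return P * K dataset indices: K images for each of P randomly chosen identities."""
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    pids = rng.sample(sorted(by_id), P)  # requires at least P identities
    batch = []
    for pid in pids:
        pool = by_id[pid]
        # Without replacement when possible, otherwise with replacement (assumption).
        picks = rng.sample(pool, K) if len(pool) >= K else rng.choices(pool, k=K)
        batch.extend(picks)
    return batch  # grouped by identity

labels = [i // 20 for i in range(20 * 40)]  # toy dataset: 40 IDs x 20 images
print(len(pk_batch(labels, P=32, K=8)))  # 256
```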

4.4 Comparisons with the state-of-the-art methods

Evaluation On Market-1501

[Image 4: comparison with state-of-the-art methods on Market-1501]

Evaluation On CUHK03

[Images 5-6: comparison with state-of-the-art methods on CUHK03]

Evaluation On DukeMTMC-reID

[Image 7: comparison with state-of-the-art methods on DukeMTMC-reID]

4.5 Ablation Study

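  • Ablation study of Curriculum Sampling (CS), the Fully Attentional Block, Focal Loss, and Random Erasing, as shown in the table below
[Image 8: ablation results]
  • The cls + rank baseline is already strong, so each proposed component brings a relatively small improvement

  • I did not fully understand the example in the figure below; according to the paper, it shows that random erasing and the cls branch bring large improvements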

[Image 9: qualitative example for random erasing and the cls branch]

5 Conclusions

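  • The proposed Mancs learns stable features and achieves SOTA performance on three commonly used public datasets
  • The fully attentional block with deep supervision and curriculum sampling are shown to be effective (and can be reused in other related tasks)
  • Future work: combine data sampling and augmentation to further improve the generalization of ReID features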
