[FGD] Focal and Global Knowledge Distillation for Detectors (CVPR 2022)


1. Motivation

The authors point out that, in object detection, the features of the teacher and the student differ considerably from region to region, especially between foreground and background.

  • In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background.

If they are distilled in the same way everywhere, the uneven differences across the feature map will make the distillation worse.

  • If we distill them equally, the uneven differences between feature maps will negatively affect the distillation.

The paper therefore proposes FGD, which consists of focal distillation and global distillation.

  • Thus, we propose Focal and Global Distillation (FGD).
  • Focal distillation separates the foreground and background, forcing the student to focus on the teacher’s critical pixels and channels.
  • Global distillation rebuilds the relation between different pixels and transfers it from teachers to students, compensating for missing global information in focal distillation.

Figure 1 shows that the student network’s attention map responds much more strongly to the foreground than to the background, which indicates that distillation also suffers from the foreground/background imbalance.

(Figure 1: spatial and channel attention maps of the teacher and student detectors)

Table 1 shows that distilling the foreground and background features together, without decoupling them, actually gives the worst result (38.9 AP). This motivated focal distillation, which picks out the critical pixels and channels, together with GcBlock to extract global features.

(Table 1: distillation with foreground and background coupled vs. decoupled)

For global feature extraction, the paper uses GcBlock.

(Figure: GcBlock)

2. Contribution

  • We present that the pixels and channels that teacher and student pay attention to are quite different. If we distill the pixels and channels without distinguishing them, it will result in a trivial improvement.

  • We propose focal and global distillation, which enables the student not only to focus on the teacher’s critical pixels and channels, but also to learn the relation between pixels.

  • We verify the effectiveness of our method on various detectors via extensive experiments on COCO [21], including one-stage, two-stage, and anchor-free methods, achieving state-of-the-art performance.

3. Method

(Figure: overall architecture of FGD)

The authors start from the standard feature-distillation loss:

$$L_{fea}=\frac{1}{CHW}\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(F_{k,i,j}^{T}-f\left(F_{k,i,j}^{S}\right)\right)^{2}$$

where the lowercase $f$ is an adaptation layer that reshapes the student feature $F^{S}$ to the same dimensions as the teacher feature $F^{T}$.
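
As a reference, here is a minimal PyTorch sketch of this vanilla loss; the 256-channel size and the 1x1-conv choice for $f$ are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical channel counts; f is commonly a 1x1 conv that projects the
# student feature to the teacher's channel dimension.
adapt = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1)

def plain_feature_distill_loss(feat_t: torch.Tensor, feat_s: torch.Tensor) -> torch.Tensor:
    # feat_t, feat_s: (B, C, H, W). mse_loss with the default 'mean'
    # reduction already averages over all B*C*H*W elements, matching the
    # 1/(CHW) normalization above up to the batch average.
    return F.mse_loss(adapt(feat_s), feat_t)
```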

However, this distills every part equally and lacks any modeling of the global relations between pixels.

  • However, such methods treat all the parts equally and lack the distillation of the global relations between different pixels.

3.1. Focal Distillation

First, a binary mask is used to separate foreground from background: pixels inside a ground-truth box get 1, the rest get 0.

$$M_{i,j}=\begin{cases}1, & \text{if } (i,j)\in r\\0, & \text{otherwise}\end{cases}$$

where $r$ denotes the ground-truth boxes.

Further, so that the ground-truth areas of small and large objects, as well as foreground and background, are weighted equally, the authors propose a scale mask:

$$S_{i,j}=\begin{cases}\dfrac{1}{H_{r}W_{r}}, & \text{if } (i,j)\in r\\\dfrac{1}{N_{bg}}, & \text{otherwise}\end{cases}$$

where $H_{r}$ and $W_{r}$ are the height and width of the ground-truth box $r$.

  • If a pixel belongs to multiple targets, the smallest box is chosen to compute $S$ (an extra rule).

$$N_{bg}=\sum_{i=1}^{H}\sum_{j=1}^{W}\left(1-M_{i,j}\right)$$
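
A rough sketch of how $M$ and $S$ could be built for one image, assuming the ground-truth boxes were already rescaled to the feature-map resolution (the helper name and box format are ours):

```python
import torch

def build_masks(gt_boxes: torch.Tensor, H: int, W: int):
    # gt_boxes: (N, 4) of (x1, y1, x2, y2) in feature-map coordinates.
    M = torch.zeros(H, W)
    # Area of the smallest box covering each pixel, per the rule above.
    area = torch.full((H, W), float("inf"))
    for x1, y1, x2, y2 in gt_boxes.int().tolist():
        box_area = float(max(y2 - y1, 1) * max(x2 - x1, 1))
        M[y1:y2, x1:x2] = 1.0
        area[y1:y2, x1:x2] = torch.clamp(area[y1:y2, x1:x2], max=box_area)
    S = torch.zeros(H, W)
    inside = M.bool()
    S[inside] = 1.0 / area[inside]          # 1 / (H_r * W_r)
    n_bg = (1.0 - M).sum().clamp(min=1.0)   # N_bg from the equation above
    S[~inside] = 1.0 / n_bg
    return M, S
```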

The spatial and channel attention are computed as follows:

$G^{S}$ can be read as an $H\times W\times 1$ map and $G^{C}$ as a $1\times 1\times C$ attention map.

$$G^{S}(F)=\frac{1}{C}\sum_{c=1}^{C}\left|F_{c}\right|$$

$$G^{C}(F)=\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|F_{i,j}\right|$$

The attention masks are therefore defined as:

$$A^{S}(F)=H\cdot W\cdot\mathrm{softmax}\left(G^{S}(F)/T\right)$$
$$A^{C}(F)=C\cdot\mathrm{softmax}\left(G^{C}(F)/T\right)$$

where $T$ is a temperature hyper-parameter.
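
A sketch of the attention computation for a single feature map; the function name is ours and the temperature is left to the caller:

```python
import torch

def attention_masks(feat: torch.Tensor, T: float):
    # feat: (C, H, W) feature map; T: temperature hyper-parameter.
    C, H, W = feat.shape
    g_s = feat.abs().mean(dim=0)        # G^S: (H, W), mean of |F| over channels
    g_c = feat.abs().mean(dim=(1, 2))   # G^C: (C,),  mean of |F| over pixels
    a_s = H * W * torch.softmax(g_s.flatten() / T, dim=0).view(H, W)  # A^S
    a_c = C * torch.softmax(g_c / T, dim=0)                           # A^C
    return a_s, a_c
```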

The feature loss is defined below. Its two terms are computed on the foreground and background respectively and are balanced by two hyper-parameters, and both $A^{S}$ and $A^{C}$ are taken from the teacher model during training.

$$L_{fea}=\alpha\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}M_{i,j}S_{i,j}A_{i,j}^{S}A_{k}^{C}\left(F_{k,i,j}^{T}-f\left(F_{k,i,j}^{S}\right)\right)^{2}+\beta\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(1-M_{i,j}\right)S_{i,j}A_{i,j}^{S}A_{k}^{C}\left(F_{k,i,j}^{T}-f\left(F_{k,i,j}^{S}\right)\right)^{2}$$
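
Given the masks and the teacher's attention, $L_{fea}$ can be sketched as follows (names and shapes are ours; the student feature is assumed to have already passed through the adaptation layer $f$):

```python
def focal_feature_loss(feat_t, feat_s, M, S, a_s_t, a_c_t, alpha, beta):
    # feat_t, feat_s: (C, H, W); M, S, a_s_t: (H, W); a_c_t: (C,).
    # Weight every squared difference by S, A^S and A^C (teacher's attention).
    w = S * a_s_t * a_c_t.view(-1, 1, 1)   # broadcasts to (C, H, W)
    sq = (feat_t - feat_s) ** 2
    fg = (M * w * sq).sum()                # foreground term
    bg = ((1.0 - M) * w * sq).sum()        # background term
    return alpha * fg + beta * bg
```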

  • Attention loss:
  • Besides, we use attention loss $L_{at}$ to force the student detector to mimic the spatial and channel attention masks of the teacher detector (L1 loss).

$$L_{at}=\gamma\cdot\left(l\left(A_{T}^{S},A_{S}^{S}\right)+l\left(A_{T}^{C},A_{S}^{C}\right)\right)$$

where $l$ denotes the L1 loss.

The overall focal loss is the sum of the two terms:

$$L_{focal}=L_{fea}+L_{at}$$
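
A sketch of $L_{at}$ under the same naming assumptions as the snippets above:

```python
import torch.nn.functional as F

def attention_loss(a_s_t, a_s_s, a_c_t, a_c_s, gamma):
    # L1 between teacher (_t) and student (_s) spatial/channel attention.
    return gamma * (F.l1_loss(a_s_s, a_s_t) + F.l1_loss(a_c_s, a_c_t))
```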

3.2. Global Distillation

(Figure 4: capturing global relations with GcBlock)

  • As shown in Fig. 4, we utilize GcBlock [2] to capture the global relation information in a single image and force the student detector to learn the relation from the teacher detector.

$$\mathcal{R}(F)=F+W_{v2}\,\mathrm{ReLU}\left(\mathrm{LN}\left(W_{v1}\sum_{j=1}^{N_{p}}\frac{e^{W_{k}F_{j}}}{\sum_{m=1}^{N_{p}}e^{W_{k}F_{m}}}F_{j}\right)\right)$$

$$L_{global}=\lambda\sum\left(\mathcal{R}\left(F^{T}\right)-\mathcal{R}\left(F^{S}\right)\right)^{2}$$

where $W_{k}$, $W_{v1}$ and $W_{v2}$ are convolutional layers, $\mathrm{LN}$ is layer normalization, $N_{p}$ is the number of pixels, and $\lambda$ is a balancing hyper-parameter.
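
A sketch of a GcBlock-style relation extractor $\mathcal{R}$ and the global loss; the bottleneck reduction ratio follows the common GCNet design and is an assumption, not necessarily the paper's exact setting. A single extractor is applied to both features here:

```python
import torch
import torch.nn as nn

class RelationExtractor(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.w_k = nn.Conv2d(channels, 1, 1)        # W_k: context attention
        self.w_v1 = nn.Conv2d(channels, hidden, 1)  # W_v1
        self.ln = nn.LayerNorm([hidden, 1, 1])      # LN
        self.w_v2 = nn.Conv2d(hidden, channels, 1)  # W_v2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Softmax over the N_p = H*W pixels, then pool a global context vector.
        attn = self.w_k(x).view(b, 1, h * w).softmax(dim=-1)        # (B, 1, HW)
        ctx = torch.bmm(attn, x.view(b, c, h * w).transpose(1, 2))  # (B, 1, C)
        ctx = ctx.transpose(1, 2).view(b, c, 1, 1)                  # (B, C, 1, 1)
        return x + self.w_v2(torch.relu(self.ln(self.w_v1(ctx))))   # R(F)

def global_distill_loss(r: RelationExtractor, feat_t, feat_s, lam):
    # L_global: squared difference between the rebuilt relations.
    return lam * ((r(feat_t) - r(feat_s)) ** 2).sum()
```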

The total loss for training the student model:

$$L=L_{original}+L_{fea}+L_{at}+L_{global}$$

where $L_{original}$ is the detector’s original training loss.

4. Experiments

The paper adopts the inheriting strategy from General Instance Distillation for Object Detection (CVPR 2021): when the student and the teacher have the same head structure, the student model is initialized with the teacher’s weights.
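
A rough sketch of that inheriting strategy, assuming PyTorch models whose non-backbone parameter names and shapes line up when the heads match:

```python
import torch.nn as nn

def inherit_from_teacher(student: nn.Module, teacher: nn.Module) -> None:
    # Copy every teacher parameter whose name and shape also exist in the
    # student, skipping the backbone so only neck/head weights are inherited
    # (mmdetection-style module naming is assumed here).
    s_state = student.state_dict()
    t_state = teacher.state_dict()
    inherited = {k: v for k, v in t_state.items()
                 if k in s_state
                 and v.shape == s_state[k].shape
                 and not k.startswith("backbone.")}
    s_state.update(inherited)
    student.load_state_dict(s_state)
```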

4.1 Main results

(Tables: main results on COCO for one-stage, two-stage, and anchor-free detectors)

![results table](https://img-blog.csdnimg.cn/02cd938c56db439ba7091968dcea5caa.png)

4.2 Ablation Studies

4.2.1 Sensitivity study of different losses

(Table: sensitivity of the different distillation losses)

4.2.2 Sensitivity study of focal distillation

(Table: sensitivity of focal distillation)

4.2.3 Sensitivity study of global distillation

(Table: sensitivity of global distillation)

4.2.4 Sensitivity study of T

(Table: sensitivity of the temperature T)

4.2.5 Sensitivity study of hyper-parameters

(Table: sensitivity of hyper-parameters)
