【R-FCN-3000】《R-FCN-3000 at 30fps: Decoupling Detection and Classification》

[Figure 1]
CVPR-2018


Contents

    • 1 Background and Motivation
      • 1.1 Background
      • 1.2 Motivation
      • 1.3 Notion
    • 2 Innovation
    • 3 Advantages
    • 4 Method
      • 4.1 Weakly Supervised vs. Supervised?
      • 4.2 Superclass Discovery
      • 4.3 Architecture
      • 4.4 Label Assignment
      • 4.5 Loss Function
    • 5 Experiments
      • 5.1 Dataset
      • 5.2 Comparison with Weakly Supervised Detectors
      • 5.3 Speed and Performance
    • 6 Discussion
      • 6.1 Impact of Number of Classes and Clusters
      • 6.2 Are Position-Sensitive Filters Per Class Necessary?
    • References


1 Background and Motivation

Objectness is a generic concept and a universal objectness detector can be learned.

This paper builds on R-FCN and, like YOLO9000, extends it into an architecture that can detect 3000 classes, keeping accuracy reasonable without sacrificing speed.

【R-FCN】《R-FCN: Object Detection via Region-based Fully Convolutional Networks》(NIPS-2016)

1.1 Background

Current object detection systems

  • Good performance on benchmark datasets
    (R-CNN, Fast R-CNN, Faster R-CNN, Deformable Convolutional Networks, Mask R-CNN)

    • PASCAL VOC: 33% → 88%
    • COCO: 37% → 73% (at 50% overlap)
  • Still fall short for real-life object detection, which demands both of the following (e.g. YOLO-9000 delivers them only at the cost of accuracy)

    • speed
    • thousands of classes

1.2 Motivation

  • Many object classes are visually similar and share parts.

  • Decouple objectness detection from classification of the detected object, so that the computational requirements for localization remain constant as the number of classes increases.

[Figure 2]

1.3 Notion

  • fine-grained [1]
    [Figure 3]
  • decoupling [2]

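1) Coupling

  • Coupling is the phenomenon in which two or more systems, or two forms of motion, influence each other through interaction to the point of becoming joined.

  • In software engineering, the degree of coupling between objects is how much they depend on each other. The higher the coupling, the higher the maintenance cost, so objects should be designed to minimize coupling between classes and components.

  • Types: coupling can exist between hardware and software, and between software modules. Coupling is a measure of how interrelated the modules in a program structure are. It depends on the complexity of the interfaces between modules, how modules are invoked, and what information passes through those interfaces.

2) Decoupling

  • Decoupling literally means removing a coupling relationship.

  • In software engineering, decoupling can be understood as reducing coupling. Wherever modules depend on each other there is coupling, and absolute zero coupling is unattainable in theory, but existing techniques can reduce it to a minimum.

  • Core design idea: keep code coupling as low as possible, and apply decoupling techniques wherever coupling appears. Reduce the coupling among the data model, business logic, and view layers so that dependencies are minimal and a change in one place does not ripple through everything. The rule is that code for feature A should not live inside the code for feature B; if the two must interact, they can do so through an interface, through messages, or even through a framework, but never by writing directly into each other.

  • Observer pattern: the observer pattern exists to decouple. It keeps the logic of the observers and of the observed subject from tangling together, so that they stay independent of each other. Take night mode in the NetEase News app: when the user switches to night mode, the subject notifies all observers that "the setting changed, everyone put on the dark overlay". When a QQ message push arrives, a notification has to pop up in the notification bar and a red dot has to appear on the desktop icon; this too is observers and a subject working together.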
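
As a rough illustration of the observer pattern described above, here is a minimal Python sketch. The `Settings` class, its method names, and the two observers are all made up for this example; they are not taken from any real app.

```python
class Settings:
    """Subject: holds a setting and notifies registered observers when it changes."""

    def __init__(self):
        self._observers = []   # callbacks to invoke on every change
        self.night_mode = False

    def subscribe(self, callback):
        # The subject only stores a callable; it knows nothing about the observer's internals.
        self._observers.append(callback)

    def set_night_mode(self, enabled):
        self.night_mode = enabled
        for callback in self._observers:
            callback(enabled)


log = []  # records which (hypothetical) views reacted, in order
settings = Settings()
settings.subscribe(lambda on: log.append(("news_list", on)))
settings.subscribe(lambda on: log.append(("article_view", on)))
settings.set_night_mode(True)
print(log)
```

The subject and the observers only share the callback signature, so either side can change without touching the other.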

2 Innovation

  • modification of the R-FCN architecture

3 Advantages

  • It outperforms YOLO-9000 by 18% while processing 30 images per second.
  • Reasonable performance on zero-shot (unseen) classes

4 Method

4.1 Weakly Supervised vs. Supervised?

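The gap between the two is large.
Weakly supervised detection uses classification data (image-level labels only) to do object detection.
For example, to classify a dog, recognizing the body alone is enough to tell it apart from other classes; the legs and tail often contribute little as features, yet they matter a great deal for object detection (placing the bounding box).

For 3000-class object detection:

  • supervised: no data (no dataset like COCO or VOC with that many classes)
  • weakly supervised: poor performance

As a compromise, the authors use the ImageNet dataset.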

A potential downside of using ImageNet for training object detectors is the loss of variation in scale and context around objects available in detection datasets, but we do have access to the bounding boxes of the objects.

Better than nothing.

4.2 Superclass Discovery

  • ResNet-101

  • Average the 2048-dimensional features per class, then run K-means on

    $\{ x_j : j \in \{ 1, 2, \dots, C \} \}$

    $C$ is the number of fine-grained object classes; K-means groups them into $K$ superclasses.

    $x_j$ is the average of the ResNet-101 final-layer features over the images of class $j$; $x_j^i$ is its $i$-th component, with $i$ from 1 to 2048.
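
The clustering step above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the features are random stand-ins (8 dimensions instead of 2048, 6 classes), and `class_mean_features` and `kmeans` are hypothetical helpers written for this note, not the authors' code.

```python
import numpy as np


def class_mean_features(features, labels, num_classes):
    """Average the per-image feature vectors of each class -> array of shape (C, D)."""
    return np.stack([features[labels == j].mean(axis=0) for j in range(num_classes)])


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns the cluster index of every row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return assign


# Toy data: 6 classes, 5 images each; classes 0-2 and 3-5 form two visually similar groups.
rng = np.random.default_rng(0)
num_classes, dim = 6, 8  # dim stands in for 2048
labels = np.repeat(np.arange(num_classes), 5)
group_offset = np.where(labels < 3, 0.0, 10.0)[:, None]
features = group_offset + 0.1 * rng.standard_normal((len(labels), dim))

x = class_mean_features(features, labels, num_classes)  # one x_j per class
superclass = kmeans(x, k=2)                             # superclass id per fine-grained class
print(superclass)
```

The resulting array maps each fine-grained class to its superclass, which is the lookup the rest of the method relies on.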

4.3 Architecture

[Figure 4]

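Upper branch

  • Superclasses: position-sensitive filters
    classification (K superclasses + 1 background) + regression (box offsets)

Lower branch

  • Fine-grained: no position-sensitive filters
    classification (C classes)

The two branches are merged at the end:
superclass score × fine-grained class score (the clustering tells us in advance which fine-grained classes belong to which superclass)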
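
A small NumPy sketch of how the two scores could be combined for one RoI. The layout is an assumption made for illustration: index 0 of the superclass output is background, the fine-grained branch is a softmax over all C classes, and `class_to_super` is the lookup produced by the clustering. `final_scores` is a hypothetical name.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def final_scores(super_logits, fine_logits, class_to_super):
    """Per-class detection score for one RoI.

    super_logits:   (K+1,) superclass logits, index 0 = background (assumed layout)
    fine_logits:    (C,)   fine-grained class logits
    class_to_super: (C,)   superclass index (1..K) of every fine-grained class
    """
    p_super = softmax(super_logits)
    p_fine = softmax(fine_logits)
    # Each fine-grained class inherits the probability of its own superclass.
    return p_super[class_to_super] * p_fine


class_to_super = np.array([1, 1, 2, 2])        # K = 2 superclasses, C = 4 classes
super_logits = np.array([0.0, 2.0, 0.0])       # RoI looks like superclass 1
fine_logits = np.array([3.0, 0.0, 0.0, 0.0])   # and like fine-grained class 0
scores = final_scores(super_logits, fine_logits, class_to_super)
print(scores)
```

Classes 1 and 2 have the same fine-grained logit here, but class 1 ends up with the higher score because it sits in the superclass the detector favors.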

4.4 Label Assignment


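k: superclass index, 0 … K
c: fine-grained class index, 0 … C

Each superclass is made up of many fine-grained classes.

  • Detection (K+1 classes: the K clusters from K-means + background)
    • positive RoI for superclass $k_i$: IoU with a ground-truth box of that superclass is greater than 0.5
    • background: otherwise
  • Classification (C classes)
    • trained on positive RoIs only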
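
The assignment rule above, sketched in NumPy. Assumptions for this illustration: boxes are `[x1, y1, x2, y2]`, label 0 means background for the detection branch, and -1 marks RoIs that the classification branch skips. `iou` and `assign_labels` are hypothetical helpers.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def assign_labels(rois, gt_boxes, gt_classes, class_to_super, thresh=0.5):
    det = np.zeros(len(rois), dtype=int)      # detection label: 0 = background
    cls = np.full(len(rois), -1, dtype=int)   # classification label: -1 = not trained on
    for i, roi in enumerate(rois):
        overlaps = iou(roi, gt_boxes)
        j = int(overlaps.argmax())
        if overlaps[j] > thresh:              # positive RoI
            cls[i] = gt_classes[j]
            det[i] = class_to_super[gt_classes[j]]
    return det, cls


class_to_super = np.array([1, 1, 2, 2])
gt_boxes = np.array([[0.0, 0.0, 10.0, 10.0]])
gt_classes = np.array([2])
rois = np.array([[0.0, 0.0, 10.0, 9.0],       # IoU 0.9 with the ground truth
                 [20.0, 20.0, 30.0, 30.0]])   # no overlap
det, cls = assign_labels(rois, gt_boxes, gt_classes, class_to_super)
print(det, cls)
```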

4.5 Loss Function

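See the Fast R-CNN loss for reference: 【Fast RCNN】《Fast-RCNN》

0.05 × Smooth L1 (localization) + L (classification: softmax output vs. ground truth)

The 0.05 factor is there because classification uses only positive RoIs while localization uses all proposals; the counts differ greatly, so the factor is set to balance the losses.

The structure of the loss itself is in fact unchanged.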

5 Experiments

5.1 Dataset

[Figure 5]

  • Trained for 7 epochs
  • Training is performed on 2 Nvidia P6000 GPUs
  • Image resolution: 375×500
  • Three anchor scales (64, 128, 256) and three aspect ratios (1:2, 1:1, 2:1) for the anchor boxes
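
Three scales times three aspect ratios gives 9 anchor shapes per location. A quick sketch of enumerating them, assuming (the notes do not say) that each ratio is height:width and that every anchor keeps an area of scale²; `anchor_shapes` is a hypothetical helper.

```python
import math


def anchor_shapes(scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) for every scale/ratio pair, with ratio = height / width."""
    shapes = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # area stays s * s
            h = s * math.sqrt(r)
            shapes.append((w, h))
    return shapes


shapes = anchor_shapes()
for w, h in shapes:
    print(f"{w:7.1f} x {h:7.1f}")
```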

5.2 Comparison with Weakly Supervised Detectors

[Figure 6]

5.3 Speed and Performance

[Figure 7]

As the number of superclasses (clusters) grows, performance improves and inference time increases.
All the speed results are on a P6000 GPU.

NMS is performed for a group of visually similar classes together, instead of each class separately.
(I do not fully understand how this NMS is applied.)
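
One plausible reading of group-wise NMS, sketched in NumPy: detections whose classes fall in the same group compete with each other in a single NMS pass, instead of one pass per class. This is a guess at the idea, not the authors' implementation; the grouping (`class_to_group`), the 0.3 threshold, and the helper names are all assumptions.

```python
import numpy as np


def nms(boxes, scores, thresh=0.3):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + area_r - inter)
        order = rest[overlap <= thresh]   # drop boxes that overlap the kept one too much
    return keep


def grouped_nms(boxes, scores, classes, class_to_group, thresh=0.3):
    """Run one NMS pass per group of similar classes rather than per class."""
    groups = class_to_group[classes]
    keep = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        keep.extend(idx[nms(boxes[idx], scores[idx], thresh)].tolist())
    return sorted(keep)


class_to_group = np.array([0, 0, 1])           # classes 0 and 1 are visually similar
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [0.0, 0.0, 10.0, 10.0]])
scores = np.array([0.9, 0.8, 0.7])
classes = np.array([0, 1, 2])
kept = grouped_nms(boxes, scores, classes, class_to_group)
print(kept)
```

In this toy case the second box is suppressed by the first even though their classes differ, because both classes share a group; per-class NMS would have kept all three boxes.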

6 Discussion

6.1 Impact of Number of Classes and Clusters

[Figure 8]

In (a) and (b), the performance gap between 5 clusters and class-specific filters is not that large.

In light of these observations, we can conclude that more crucial to R-FCN
is learning an objectness measure instead of class-specific objectness.

[Figure 9]

6.2 Are Position-Sensitive Filters Per Class Necessary?

PASCAL VOC
[Figure 10]
Baseline: Deformable R-FCN detector (ResNet-50)
After decoupling, performance does not drop much.

[Figure 11]

Design details
[Figure 12]

[3] [4]

References


  1. "Seeing the Whole from the Small": A Survey of Progress in Fine-Grained Image Analysis

  2. What Are Coupling and Decoupling

  3. R-FCN-3000 Algorithm Notes

  4. R-FCN-3000 at 30fps: Decoupling Detection and Classification
