[OLN] Learning Open-World Object Proposals without Learning to Classify

1. Motivation

  • Our main insight is that the classifiers in existing object proposers and class-agnostic detectors impede such generalization, because the model tends to overfit to the labeled objects and treat unlabeled objects in the training set as background.
  • This matches the ability of humans to detect novel objects in new environments without naming their categories.

2. Contribution

  • To our knowledge, we are the first to show the value of pure localization-based objectness learning for novel object proposals, and propose a simple-yet-effective classifier-free Object Localization Network (OLN).
  • Our approach outperforms state-of-the-art methods in the cross-category setting on COCO and improves over the standard approach in cross-dataset settings on RoboNet and Object365, long-tail detection (LVIS), and egocentric videos (EpicKitchens).
  • We carefully annotated the RoboNet dataset for the presence of all objects in an exhaustive fashion. We perform open-world class-agnostic object detection and evaluate Average Precision, which improves on the existing AR-based evaluation of proposals over partially annotated data.
  • Extensive ablation and analysis on OLN modeling choices reveal the benefits of each localization cue and the overfitting of existing classifier-based methods.

3. Method

3.1 Baselines

Region Proposal Network (RPN) + Faster R-CNN

The foreground/background classification used by the RPN should be replaced with a measure of how much a region overlaps with the ground truth, because the foreground/background split causes novel classes to be treated as background when detecting them.

3.2 Pure localization-based objectness

  • The question "how much does this region look like a foreground object?" is replaced with "how well does this region overlap with any ground-truth object?"
  • Our intuition is that every object can be characterized by its location and shape, regardless of its category.

IoU + centerness scores

  • We adopt centerness [56] and IoU score [29] as the location and shape quality measures respectively, while not restricting other choices such as the Dice coefficient [40] and generalized IoU [46] (both measures are sketched below).
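As a concrete reference, here is a minimal PyTorch sketch of the two localization-quality measures. The function names and tensor shapes are my own assumptions, not the paper's code.

```python
import torch

def centerness_target(ltrb):
    """FCOS-style centerness for per-location (l, t, r, b) distances to a GT box.

    ltrb: Tensor of shape (N, 4) with left/top/right/bottom distances.
    Returns values in [0, 1]; 1 at the box center, decaying toward the edges.
    """
    l, t, r, b = ltrb.unbind(dim=-1)
    return torch.sqrt(
        (torch.minimum(l, r) / torch.maximum(l, r).clamp(min=1e-6))
        * (torch.minimum(t, b) / torch.maximum(t, b).clamp(min=1e-6))
    )

def box_iou_pairs(boxes1, boxes2):
    """Element-wise IoU between matched pairs of (N, 4) boxes in (x1, y1, x2, y2) format."""
    lt = torch.maximum(boxes1[:, :2], boxes2[:, :2])
    rb = torch.minimum(boxes1[:, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    return inter / (area1 + area2 - inter).clamp(min=1e-6)
```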

3.3 Object Localization Network (OLN)

  • The goal of OLN is to learn localization for objects and enable better generalization to new and unseen categories.
  • the classifiers in both FPN and ROI stages are replaced with localization quality predictions

3.3.1 OLN-RPN

The original objectness classifier in the RPN is replaced with a localization-quality prediction that uses centerness as its target; both the regression head and the localization-quality head are trained with L1 losses.

  • We choose centerness [56] as the localization quality target and train both heads with L1 losses.

To train the localization-quality branch, anchors with IoU greater than 0.3 with a ground-truth box are used as positives. In the regression branch, the standard box-delta offsets are replaced with the ltrb distances from FCOS, and instead of predicting 3 anchors per location, only a single pre-defined anchor is used.

  • For the box regression, we replace the standard box-delta targets (xyhw) with distances from the location to four sides of the ground-truth box (lrtb) as in [56].
  • We choose to use one anchor per feature location, as opposed to 3 in the standard RPN (see the sketch below).
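Below is a minimal, hedged sketch of what such an RPN head could look like in PyTorch: the classifier branch is replaced by a single-channel centerness prediction, the regression branch predicts ltrb distances, and both are trained with L1 losses on positive anchors. The module and function names (OLNRPNHead, oln_rpn_loss) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OLNRPNHead(nn.Module):
    """RPN head with the fg/bg classifier replaced by a centerness prediction.

    One anchor per feature location, so each location predicts 4 ltrb
    distances and 1 localization-quality (centerness) score.
    """
    def __init__(self, in_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.loc_quality = nn.Conv2d(in_channels, 1, 1)  # replaces objectness logits
        self.ltrb_reg = nn.Conv2d(in_channels, 4, 1)     # distances to the 4 box sides

    def forward(self, feat):
        x = F.relu(self.conv(feat))
        return self.loc_quality(x).sigmoid(), F.relu(self.ltrb_reg(x))

def oln_rpn_loss(pred_quality, pred_ltrb, target_centerness, target_ltrb, pos_mask):
    """L1 losses on positive locations (e.g. anchors with IoU > 0.3 to a GT box).

    All inputs are assumed flattened per-anchor tensors: (N,) scores and
    (N, 4) ltrb distances, with pos_mask a boolean mask of shape (N,).
    """
    quality_loss = F.l1_loss(pred_quality[pos_mask], target_centerness[pos_mask])
    reg_loss = F.l1_loss(pred_ltrb[pos_mask], target_ltrb[pos_mask])
    return quality_loss + reg_loss
```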

3.3.2 OLN-Box

The top-scoring proposals by RPN localization quality are fed to the RoI stage as inputs, and RoIAlign is used to extract features for each proposal.

The target used in the OLN-Box stage is the IoU between the OLN proposals and the ground-truth boxes.

Likewise, the final classification layer is replaced with a localization-quality prediction.

Again, the IoU between the ltrb-decoded proposals from OLN-RPN and the ground-truth boxes is used as the localization-quality target (note that since this is Faster R-CNN, the features are already cropped per proposal; OLN-RPN instead uses the centerness between anchors and ground-truth boxes as its target). Both branches are trained with L1 losses.

Note that, according to the original paper, the box regression here still uses standard regression deltas (presumably offsets relative to the proposals produced from the ltrb predictions).
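A hedged sketch of the RoI-stage targets and losses, assuming flattened per-proposal tensors; oln_box_targets, oln_box_loss, and the matched_gt_deltas argument are illustrative names, not the paper's code.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou  # pairwise IoU matrix, shape (num_proposals, num_gt)

def oln_box_targets(proposals, gt_boxes):
    """IoU-score target for each proposal: its best overlap with any GT box."""
    if gt_boxes.numel() == 0:
        return proposals.new_zeros(proposals.shape[0])
    return box_iou(proposals, gt_boxes).max(dim=1).values

def oln_box_loss(pred_iou_score, pred_deltas, proposals, gt_boxes, matched_gt_deltas):
    """L1 losses for the RoI head: IoU-score branch plus standard box-delta regression."""
    iou_target = oln_box_targets(proposals, gt_boxes)
    score_loss = F.l1_loss(pred_iou_score, iou_target)
    reg_loss = F.l1_loss(pred_deltas, matched_gt_deltas)
    return score_loss + reg_loss
```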

3.3.3 OLN-Mask

Following Mask Scoring R-CNN, the mask branch regresses the IoU between the predicted and ground-truth masks.
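A small sketch, under my own naming and shape assumptions, of how a per-instance mask-IoU target could be computed; this is not the paper's implementation.

```python
import torch

def mask_iou_target(pred_masks, gt_masks, thresh=0.5):
    """IoU between thresholded predicted masks and GT masks, per instance.

    pred_masks: (N, H, W) probabilities; gt_masks: (N, H, W) binary float masks.
    """
    pred_bin = (pred_masks > thresh).float()
    inter = (pred_bin * gt_masks).flatten(1).sum(dim=1)
    union = (pred_bin + gt_masks).clamp(max=1).flatten(1).sum(dim=1)
    return inter / union.clamp(min=1e-6)
```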

3.3.4 Inference

For OLN-Box, the final score is the geometric mean of the centerness score $c$ from OLN-RPN and the IoU score $b$ from OLN-Box: $s = \sqrt{c \cdot b}$.
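As a one-liner, the inference-time scoring could look like this (illustrative function name; assumes both scores are already in [0, 1]):

```python
import torch

def oln_final_score(rpn_centerness, roi_iou_score):
    """Final objectness score: geometric mean of the two localization-quality estimates."""
    return torch.sqrt(rpn_centerness.clamp(min=0) * roi_iou_score.clamp(min=0))
```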

4. Experiment

  • cross-category generalization on COCO dataset
  • open-world class-agnostic detection
  • cross-dataset generalization
  • impact on long-tail object detection

4.1 Cross-category generalization

  • We split the COCO dataset into 20 seen (VOC) classes and 60 unseen (non-VOC) classes.
  • We train a model with box annotations of only the seen classes, and evaluate recall on the unseen non-VOC classes only (a sketch of this split follows below).
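A hedged sketch of building this split with pycocotools; the VOC-to-COCO class-name mapping and the annotation path are my assumptions, not details from the paper.

```python
from pycocotools.coco import COCO

# The 20 PASCAL VOC categories, spelled with COCO's category names (assumed mapping).
VOC_CLASSES = [
    "airplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "dining table", "dog", "horse", "motorcycle", "person",
    "potted plant", "sheep", "couch", "train", "tv",
]

coco = COCO("annotations/instances_train2017.json")  # hypothetical path
seen_cat_ids = set(coco.getCatIds(catNms=VOC_CLASSES))
unseen_cat_ids = [c for c in coco.getCatIds() if c not in seen_cat_ids]

# Train only on annotations of the 20 seen (VOC) classes ...
train_ann_ids = coco.getAnnIds(catIds=list(seen_cat_ids))
# ... and evaluate recall only on the 60 unseen (non-VOC) classes.
eval_ann_ids = coco.getAnnIds(catIds=unseen_cat_ids)
```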

4.2 Comparison with learning-free methods

4.3 Ablation: modeling choices
