[Paper Reading][2D Object Detection] RepPoints: Point Set Representation for Object Detection

Contents

  • RepPoints: Point Set Representation for Object Detection
    • RepPoints
    • RPDet
    • Experiments
  • Thoughts

Keywords: ICCV2019, anchor free, two-stage

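At the time of writing, this paper has the best single-model results among anchor-free detectors. I cover the other anchor-free papers in another blog post. Compared with them, this paper differs explicitly in the following ways:

  • The other papers mostly use single-stage models
  • Apart from GA-RPN, the other papers basically do not use Deformable Convolution
  • The other papers that represent boxes with keypoints turn object detection into a keypoint estimation problem, whereas this paper keeps the object detection formulation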

RepPoints: Point Set Representation for Object Detection

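The authors argue that bounding boxes are convenient to use and to evaluate in object detection, but they are a fairly coarse way to localize an object, because a bounding box still contains a lot of background. RepPoints is proposed to represent object localization at a finer granularity.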

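Also worth noting: the paper names the boxes at each stage of a multi-stage pipeline very clearly and intuitively: "from anchors and proposals to final predictions". When I read Cascade R-CNN earlier, its use of *"hypothesis"* made my head spin.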

One point I did not understand: *"In contrast, RepPoints are learned in a top-down fashion from the input image / object features, allowing for end-to-end training and producing fine-grained localization without additional supervision."* Why does this make RepPoints top-down?

RepPoints

RepPoints represents an object's location with 9 representative points. Why 9? My guess is that a 3×3 deformable convolution produces 9 offsets, which gives 9 points.
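Here is a minimal plain-Python sketch of how 9 points and a 3×3 deformable convolution could fit together at a single feature-map location. It rests on two assumptions of mine that this post does not state. First, the first stage predicts 18 numbers per location, one (dx, dy) per point. Second, the deformable-conv offsets are obtained by subtracting the regular 3×3 kernel grid from the points. The function names and the offset layout are hypothetical, and coordinates are in feature-map cells.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kernel_grid_3x3() -> List[Point]:
    """Tap positions of an ordinary 3x3 convolution relative to its center, row-major."""
    return [(float(dx), float(dy)) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def reppoints_from_offsets(center: Point, offsets: List[float]) -> List[Point]:
    """Turn 18 predicted numbers into 9 points: (cx + dx_k, cy + dy_k).

    Hypothetical layout (assumption): offsets = [dx_0, dy_0, dx_1, dy_1, ..., dx_8, dy_8].
    """
    if len(offsets) != 18:
        raise ValueError("expected 9 points x (dx, dy) = 18 values")
    cx, cy = center
    return [(cx + offsets[2 * k], cy + offsets[2 * k + 1]) for k in range(9)]


def dcn_offsets_from_points(center: Point, points: List[Point]) -> List[Point]:
    """Offsets a 3x3 deformable conv would need so that its k-th tap lands on points[k]."""
    cx, cy = center
    return [
        (px - cx - gx, py - cy - gy)
        for (px, py), (gx, gy) in zip(points, kernel_grid_3x3())
    ]


if __name__ == "__main__":
    center = (10.0, 20.0)
    # Feed the regular grid itself in as "predicted" offsets: the 9 points then form that grid.
    grid_as_offsets = [v for g in kernel_grid_3x3() for v in g]
    pts = reppoints_from_offsets(center, grid_as_offsets)
    print(dcn_offsets_from_points(center, pts))
```

When the predicted points happen to form the regular grid, every offset is zero and the layer reduces to an ordinary 3×3 conv. Any other set of 9 points simply becomes a different offset field, which is one way to read why 9 is a natural number here.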

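The center-point-based initial object representation is a special case of RepPoints that uses a single point to represent an object's location. Compared with anchors, one point per object turns the representation into a problem in a 2-d space. A 2-d space is easy to cover completely: every pixel simply decides whether it is an object center. Anchors, by contrast, must cover a 4-d space, which is hard to do completely.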
"An important benefit of the center point representation lies in its much tighter hypothesis space compared to the anchor based counterparts. While anchor based approaches usually rely on a large number of multi-ratio and multi-scale anchors to ensure dense coverage of the large 4-d bounding box hypothesis space, a center point based approach can more easily cover its 2-d space. In fact, all objects will have center points located within the image. "

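The example below is a toy illustration of the coverage argument in the quote. It is my own example, not from the paper. The anchor setup (one 32-pixel scale, aspect ratios 0.5/1/2) and the 8-pixel stride are assumptions. A thin 8×128 box has the same area as these anchors. Even a perfectly centered anchor overlaps it poorly, while its center point always falls into exactly one grid cell.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def anchors_at(cx: float, cy: float, size: float = 32.0,
               ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Assumed anchor set: one anchor per aspect ratio (h/w), each of area size**2."""
    boxes = []
    for r in ratios:
        w, h = size / math.sqrt(r), size * math.sqrt(r)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def center_cell(box: Box, stride: int) -> Tuple[int, int]:
    """Grid cell containing the box center; it exists for every box centered inside the image."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return int(cx // stride), int(cy // stride)


if __name__ == "__main__":
    thin = (96.0, 36.0, 104.0, 164.0)  # 8 x 128 box centered at (100, 100)
    # Best case for anchors: the anchor center coincides exactly with the box center.
    best = max(iou(thin, a) for a in anchors_at(100.0, 100.0))
    print(f"best anchor IoU = {best:.3f}")  # far below a typical 0.5 positive threshold
    print("center cell at stride 8:", center_cell(thin, 8))
```

Anchor-based detectors compensate by adding more scales and ratios. That is exactly the "large number of multi-ratio and multi-scale anchors" the quote refers to.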
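Refining RepPoints is also easier than refining anchors. Refining a point only means predicting a position offset, and all of these offsets are on the same scale. Box regression, in contrast, mixes center shifts with log-scale width/height factors.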
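The sketch below contrasts the two kinds of refinement. The point update follows the description above: add Δx_k, Δy_k to each point. For the box side I assume the common R-CNN-style parameterization, in which the center shift is scaled by width/height and the size change is a log ratio. That parameterization is my choice for comparison, not something this post specifies. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def refine_points(points: List[Point], deltas: List[Point]) -> List[Point]:
    """Point refinement: every point moves by (dx, dy); all deltas share the same pixel units."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, deltas)]


def decode_box(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Assumed R-CNN-style decoding: shifts are relative to w/h, sizes change by exp(log-ratio)."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    ncx, ncy = cx + dx * w, cy + dy * h          # center shift, scaled by box size
    nw, nh = w * math.exp(dw), h * math.exp(dh)  # size change, on a log scale
    return (ncx - nw / 2, ncy - nh / 2, ncx + nw / 2, ncy + nh / 2)


if __name__ == "__main__":
    print(refine_points([(1.0, 2.0), (3.0, 4.0)], [(0.5, -1.0), (0.0, 2.0)]))
    print(decode_box((0.0, 0.0, 10.0, 10.0), (0.1, 0.0, math.log(2.0), 0.0)))
```

In `decode_box` the four numbers mean different things (relative shifts versus log scale factors), so their losses have to be balanced. In `refine_points` every number is the same kind of displacement.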

RepPoints can be converted into a bounding box, so they can still be compared using IoU as the criterion.
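The paper discusses several ways to turn a point set into a pseudo box. The three I remember are:

  • min-max
  • partial min-max, computed over a subset of the points
  • moment-based, using the mean and standard deviation with learnable multipliers

Below is a plain-Python sketch of those three, plus the IoU they feed into. Two choices here are assumptions of this sketch rather than the paper's exact formulation. The scaled standard deviation is treated as the half width/height, and the multipliers are fixed to constants.

```python
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def minmax_box(points: Sequence[Point]) -> Box:
    """Min-max conversion: the tightest axis-aligned box around all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def partial_minmax_box(points: Sequence[Point], subset: Sequence[int]) -> Box:
    """Partial min-max: the same, but only over a chosen subset of the points."""
    return minmax_box([points[i] for i in subset])


def moment_box(points: Sequence[Point], lam_x: float = 1.0, lam_y: float = 1.0) -> Box:
    """Moment-based conversion: center = mean of the points, half-size = lambda * std.

    Assumption: in the paper the multipliers are learnable; here they are plain numbers,
    and using lambda * std as the *half* width/height is this sketch's own choice.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    hw, hh = lam_x * statistics.pstdev(xs), lam_y * statistics.pstdev(ys)
    return (mx - hw, my - hh, mx + hw, my + hh)


def iou(a: Box, b: Box) -> float:
    """IoU between a converted pseudo box and another box (e.g. the ground truth)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


if __name__ == "__main__":
    pts = [(float(x), float(y)) for y in range(3) for x in range(3)]  # 3x3 grid of points
    print(minmax_box(pts), partial_minmax_box(pts, [0, 4]), moment_box(pts))
```

Min-max is the most literal conversion but depends on the two extreme points. The moment-based version uses all points, and its learnable multipliers let the network calibrate how spread-out points map to box size.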

RPDet

[Figure: RPDet architecture, image 1 of 2]
[Figure: RPDet architecture, image 2 of 2]

The two figures above show the structure of RPDet well. The classification part is already decided in the first stage, and the second stage refines the RepPoints. The target assignment is as follows:
[Figure: RPDet target assignment]
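Since the figure is not reproduced here, this is a hedged sketch of the two assignment rules as I understand them from the paper and the open-source config:

  • First stage: a ground-truth object is assigned to pyramid level s(B) = ⌊log2(√(w_B·h_B)/4)⌋, and the location at its center is positive.
  • Second stage: the pseudo box converted from the RepPoints is positive if its IoU with the ground truth is at least 0.5, and negative if it is below 0.4.

Several details are my own assumptions: the clamping to levels P3-P7, the "bin containing the center" simplification, and all function names.

```python
import math
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def fpn_level(box: Box, min_level: int = 3, max_level: int = 7) -> int:
    """s(B) = floor(log2(sqrt(w * h) / 4)), clamped to the (assumed) levels P3-P7."""
    w, h = box[2] - box[0], box[3] - box[1]
    s = math.floor(math.log2(math.sqrt(w * h) / 4))
    return min(max(s, min_level), max_level)


def init_stage_positive(box: Box) -> Tuple[int, Tuple[int, int]]:
    """Stage 1 (simplified assumption): the bin on level s(B) containing the GT center is positive."""
    level = fpn_level(box)
    stride = 2 ** level
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return level, (int(cx // stride), int(cy // stride))


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def refine_stage_label(pseudo_box: Box, gt_box: Box,
                       pos_thr: float = 0.5, neg_thr: float = 0.4) -> Optional[int]:
    """Stage 2: compare the box converted from RepPoints with the GT.

    Returns 1 (positive), 0 (negative) or None (ignored, between the two thresholds).
    """
    v = iou(pseudo_box, gt_box)
    if v >= pos_thr:
        return 1
    if v < neg_thr:
        return 0
    return None


if __name__ == "__main__":
    gt = (0.0, 0.0, 64.0, 64.0)
    print(init_stage_positive(gt))
    print(refine_stage_label((4.0, 4.0, 60.0, 60.0), gt))
```

The first stage only needs the GT center and size, which matches the center-point initialization. The second stage can use IoU only because the refined points are first converted into a pseudo box.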

Experiments

  • RepPoints vs. bounding box: plugging RepPoints and bounding boxes, respectively, into the same baseline demonstrates the effectiveness of RepPoints.

  • Supervision source for RepPoints learning: interestingly, using the recognition loss to supervise RepPoints learning also effectively improves performance, whereas the same supervision brings no improvement in the bounding-box counterpart. "The use of the object recognition loss can drive the RepPoints to locate themselves at semantically meaningful positions on an object, which leads to fine-grained localization and improves object feature extraction for the following recognition stage."

  • Anchor-free vs. anchor-based: "For both detectors using bounding boxes and RepPoints, the center point based method surpass the anchor based method by +1.1 mAP and +1.4 mAP, respectively, likely because of its better coverage of ground-truth objects."

  • RepPoints act complementary to deformable RoI pooling: the gain here is only 0.1, which does not show much. The Appendix's argument that deformable RoI pooling cannot effectively provide a geometric representation of objects is interesting. However, its explanation of why RepPoints can provide a geometric representation of objects is just as rough.

Thoughts

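1. Comparison with AlignDet: Revisiting Feature Alignment for One-stage Object Detection
On Zhihu, under a discussion of AlignDet, the RepPoints authors commented:
"Hello, we are the authors of RepPoints, and thank you for sharing your work on arXiv. We read the paper carefully, but we feel AlignDet is exactly the baseline method in our RepPoints ablation (the first row, Bounding Box, of Tables 1 and 2), except that AlignDet uses 7x7 RoIAlign/RoIConv while we use 3x3. Perhaps we did not describe the method clearly enough, and the AlignDet authors may not have caught the finer details of our approach; sorry about that."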

Reading the paper closely, I found the following passage in it:
"The two sets of RepPoints are replaced by bounding box representation, where the geometric refinement is achieved by the standard bounding box regression method, and the feature extraction is replaced by the RoIAlign [13] method using 3 × 3 grid points."
along with its footnote in the main text (footnote 2):
"It can be also implemented by a deformable convolution operator with an unlearnable input offset field induced by the 3×3 grid points."
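Here is how I read the footnote, as a minimal plain-Python sketch. First, take the 3×3 grid of bin centers that RoIAlign would sample inside a box, assuming one sample per bin. Then compute the offsets that would make a 3×3 deformable conv centered at the box center sample exactly those points. The offsets depend only on the box and nothing in them is learned, which is what "an unlearnable input offset field" means. Names are hypothetical, and coordinates are in feature-map units with kernel taps one unit apart (assumption).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map units


def roi_grid_points(box: Box, k: int = 3) -> List[Point]:
    """Bin centers of a k x k RoIAlign-style grid, assuming one sample per bin, row-major."""
    x1, y1, x2, y2 = box
    bw, bh = (x2 - x1) / k, (y2 - y1) / k
    return [(x1 + (i + 0.5) * bw, y1 + (j + 0.5) * bh) for j in range(k) for i in range(k)]


def fixed_dcn_offsets(box: Box) -> List[Point]:
    """Offsets that make a 3x3 deformable conv at the box center sample exactly the grid points.

    Nothing here is learned: the offset field is fully determined by the box.
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    taps = [(float(dx), float(dy)) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # regular 3x3 kernel
    return [(px - cx - tx, py - cy - ty)
            for (px, py), (tx, ty) in zip(roi_grid_points(box), taps)]


if __name__ == "__main__":
    print(fixed_dcn_offsets((0.0, 0.0, 3.0, 3.0)))  # grid coincides with the kernel: all zeros
    print(fixed_dcn_offsets((0.0, 0.0, 6.0, 6.0)))  # grid is twice as spread out
```

For a 3×3 box the grid coincides with the regular kernel, so every offset is zero. For a 6×6 box the points sit two units apart, so each offset equals the kernel tap itself. A 7×7 RoIAlign, as used by AlignDet, would instead sample a 7×7 grid of bin centers.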

That is really slick. Thinking it through carefully, the two really are the same.

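2. Understanding Table 3:
The second row can probably be explained as a CenterNet-like setup.
But the setting with a box as the initial representation and RepPoints as the proposal is not explained in detail either. The paper does not describe how a box is turned into RepPoints, so I could not understand this part.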
