[Paper] [Instance Segmentation] YOLACT

YOLACT: Real-time Instance Segmentation

https://arxiv.org/abs/1904.02689
https://zhuanlan.zhihu.com/p/62652145
https://blog.csdn.net/sinat_37532065/article/details/89415374

We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process doesn't depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation variant manner, despite being fully-convolutional. Finally, we also propose Fast NMS, a drop-in 12 ms faster replacement for standard NMS that only has a marginal performance penalty.

Introduction

The paper proposes a real-time instance segmentation model built on a single-stage network. Instance segmentation is split into two tasks:

  1. Generate prototype masks over the entire image. These serve as mask templates for instance segmentation, effectively a set of basis functions.
  2. Predict a vector of linear-combination coefficients for each instance; combining these coefficients with the prototype masks produces the final instance masks.

In practice, the prototype masks learn to encode the objects themselves, object boundaries, and position-sensitive information, which is then used for prediction.
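The core idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the shapes (138×138 prototypes, k = 32) follow the paper's default configuration, and the variable names are my own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes from the paper's default config: k = 32 prototypes
# at 138 x 138 resolution, one k-dim coefficient vector per instance.
H, W, k = 138, 138, 32
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((H, W, k))  # P: H x W x k basis masks
coeffs = rng.standard_normal((k,))           # c: coefficients for one instance

# Instance mask = sigmoid of the linear combination of the prototypes.
mask = sigmoid(prototypes @ coeffs)          # shape (H, W)
print(mask.shape)  # (138, 138)
```

Because the combination is a single matrix product per instance, this assembly step is essentially free at inference time.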

Content

Model

The model has two parts:

Prototype masks: a protonet generates the prototype masks. It is attached to the highest-resolution FPN level and applies one upsampling step.

Design principles:

  • FPN's P3 level both carries the deepest features (having traversed the U-shaped path) and has the highest resolution.
  • Upsampling helps with detecting small objects.
  • Leaving the output unbounded works best, so ReLU is used.

We note two important design choices: taking protonet from deeper backbone features produces more robust masks, and higher resolution prototypes result in both higher quality masks and better performance on smaller objects. Thus, we use FPN [24] because its largest feature layers (P3 in our case; see Figure 2) are the deepest. Then, we upsample it to one fourth the dimensions of the input image to increase performance on small objects.

Prediction head: produces the mask coefficients along with the class and bounding box. For each anchor it outputs a k-dimensional vector, which holds the coefficients for the linear combination of prototypes.
Finally:

  • The protonet produces the prototypes.
  • The head produces bounding boxes with their labels and coefficients; NMS filters these to obtain the candidate detections.

This yields P (prototypes) of shape H×W×k and C (instance mask coefficients) of shape N×k, where N is the number of detections surviving NMS.
Take one row of C, call it c, and crop the bbox region out of P, call it p; linearly combine the channels of p according to c and apply a sigmoid to obtain the final mask.
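A NumPy sketch of this assembly step for all N detections at once is below. Note it follows the paper's operation order (linear combination, sigmoid, then crop to the box); the function and variable names are illustrative, not from the official code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(P, C, boxes):
    """Combine prototypes P (H x W x k) with coefficients C (N x k),
    then zero out everything outside each predicted box.
    A sketch of the assembly step; names are illustrative."""
    H, W, k = P.shape
    # sigma(P C^T): one H x W mask per detection.
    masks = sigmoid(np.einsum('hwk,nk->nhw', P, C))  # N x H x W
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        crop = np.zeros((H, W))
        crop[y1:y2, x1:x2] = 1.0
        masks[i] *= crop                             # keep only the box region
    return masks

H, W, k, N = 138, 138, 32, 2
rng = np.random.default_rng(0)
P = rng.standard_normal((H, W, k))
C = rng.standard_normal((N, k))
boxes = [(10, 10, 60, 60), (40, 40, 120, 120)]
masks = assemble_masks(P, C, boxes)
print(masks.shape)  # (2, 138, 138)
```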
Another improvement: Fast NMS, which replaces sequential suppression with a vectorized, matrix-based computation.
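The Fast NMS idea can be sketched as follows: sort detections by score, compute the pairwise IoU matrix, keep only its upper triangle (each box compared against higher-scoring boxes), and drop any box whose maximum such IoU exceeds the threshold. Unlike standard NMS, already-suppressed boxes are still allowed to suppress others, which is what makes the whole thing a single matrix operation (and causes the small accuracy penalty). A minimal NumPy sketch, with illustrative names:

```python
import numpy as np

def iou_matrix(boxes):
    """Pairwise IoU for boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def fast_nms(boxes, scores, iou_thresh=0.5):
    """Fast NMS: upper-triangular IoU matrix + per-column max,
    instead of the usual sequential suppression loop."""
    order = np.argsort(-scores)             # sort by descending score
    iou = np.triu(iou_matrix(boxes[order]), k=1)
    keep = iou.max(axis=0) <= iou_thresh    # max IoU with any higher-scoring box
    return order[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(fast_nms(boxes, scores))  # [0 2]: box 1 overlaps box 0 too much
```

The second box (IoU 0.81 with the top-scoring box) is suppressed; the far-away third box survives. Everything here is batched matrix arithmetic, which is why it runs well on a GPU.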

Experiments

Accuracy is middling, but the model is very fast and quite robust.

Summary

The paper proposes a bottom-up, single-stage instance segmentation algorithm.
