[Deep Learning Paper Notes][Instance Segmentation] Instance-aware Semantic Segmentation via Multi-task Network Cascades

Dai, Jifeng, Kaiming He, and Jian Sun. “Instance-aware semantic segmentation via multitask network cascades.” arXiv preprint arXiv:1512.04412 (2015). (Citations: 40).


1 Motivation

Previous methods all rely on external segmentation proposals, which are slow to compute at test time.


2 Architecture

See the figure below. We divide the task into three sub-tasks.
1. Differentiating instances. A region proposal network (RPN) represents each instance by a class-agnostic bounding box.
2. Estimating masks. For each bounding box, a pixel-level mask is predicted by logistic regression. The masks are still class-agnostic.
3. Categorizing objects. A category label is predicted for each mask-level instance.
We expect each sub-task to be simpler than the original instance segmentation task, and easier for CNNs to address.
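
The data flow through stages 2 and 3 can be sketched in a few lines of plain Python. This is only a toy illustration, not the paper's network: the function name cascade_forward, the flat feature vector per box, and the weight layouts are all assumptions made for the example. It shows a per-pixel logistic regression producing a class-agnostic mask, and a softmax classifier that consumes both the box feature and the predicted mask.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cascade_forward(box_feature, mask_weights, cls_weights):
    """Toy stages 2 and 3 for one box proposed by stage 1 (hypothetical layout).

    box_feature:  list of d floats describing the box
    mask_weights: one weight vector of length d per mask pixel
    cls_weights:  one weight vector of length d + num_pixels per category
    """
    box_feature = list(box_feature)
    # Stage 2: class-agnostic mask, one logistic regression per pixel.
    mask = [sigmoid(dot(w, box_feature)) for w in mask_weights]
    # Stage 3: the classifier sees the box feature and the predicted mask.
    cls_input = box_feature + mask
    class_probs = softmax([dot(w, cls_input) for w in cls_weights])
    return mask, class_probs
```

The point of the sketch is the dependency chain: the classifier input is built from the stage-2 output, so stage 3 cannot run before stage 2.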

[Figure 1]


3 Cascades with More Stages
In Fast R-CNN, the classification head is trained jointly with a class-wise bounding box regression head. Inspired by this practice, we add a bounding box regression head on stage 3, as a sibling layer of the classifier head. At test time, we first run the entire 3-stage network and obtain the regressed boxes from stage 3. These boxes are then treated as new proposals, and stages 2 and 3 are run a second time on them. This is in effect 5-stage inference.
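
The order of calls in 5-stage inference can be written down as a short sketch. The three stage callables and their signatures are hypothetical placeholders, not an actual API. Returning the regressed boxes together with the second-pass masks and scores is also an assumption of this example.

```python
def five_stage_inference(image, stage1, stage2, stage3):
    """Run stages 1-2-3, then repeat stages 2-3 on the regressed boxes.

    Assumed (hypothetical) signatures:
      stage1(image)               -> boxes
      stage2(image, boxes)        -> masks
      stage3(image, boxes, masks) -> (scores, regressed_boxes)
    """
    boxes = stage1(image)                            # stage 1: box proposals
    masks = stage2(image, boxes)                     # stage 2: masks
    _, new_boxes = stage3(image, boxes, masks)       # stage 3: regressed boxes
    masks = stage2(image, new_boxes)                 # stage 4: stage 2 again
    scores, _ = stage3(image, new_boxes, masks)      # stage 5: stage 3 again
    return new_boxes, masks, scores
```

Stage 1 runs only once. The second pass reuses the same stage 2 and stage 3 functions with the regressed boxes as input.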

4 Training Details
Each stage involves a loss term, but a later stage’s loss relies on the output of an earlier stage. We train the entire network cascade end-to-end with a unified loss function.
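
To see why the dependency between stages matters for training, here is a deliberately tiny sketch. It assumes one scalar parameter per stage and squared-error losses, none of which comes from the paper. Each stage's output feeds the next, the unified loss is the sum of the three stage losses, and a finite-difference check shows that the stage-1 parameter receives gradient from all three terms.

```python
def stage_losses(theta, targets):
    """Toy cascade: each stage's output is the input of the next stage."""
    a, b, c = theta                  # one hypothetical parameter per stage
    t_box, t_mask, t_cls = targets
    box = a                          # stage 1 output
    mask = b * box                   # stage 2 output depends on stage 1
    cls = c * mask                   # stage 3 output depends on stage 2
    return ((box - t_box) ** 2, (mask - t_mask) ** 2, (cls - t_cls) ** 2)

def unified_loss(theta, targets):
    # Unified loss: the sum of the three per-stage loss terms.
    return sum(stage_losses(theta, targets))

def grad_wrt_stage1(theta, targets, eps=1e-4):
    """Central finite difference of the unified loss w.r.t. the stage-1 parameter."""
    a, b, c = theta
    up = unified_loss((a + eps, b, c), targets)
    down = unified_loss((a - eps, b, c), targets)
    return (up - down) / (2 * eps)
```

With theta = (1, 2, 3) and all targets set to 0, the three loss terms are 1, 4 and 36. The gradient with respect to the stage-1 parameter is 82, not the 2 that the stage-1 loss alone would give. The later losses therefore also shape the earlier stage.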

5 References
[1]. https://www.youtube.com/watch?v=bUjyXASy_Jo.
