Mask R-CNN Reading Notes


Background
Main contributions

The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
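A mask-prediction branch is added on top of Faster R-CNN, which gives efficient object detection together with fine-grained instance segmentation.
In instance segmentation, bounding-box object detection, and person keypoint detection, Mask R-CNN outperforms all existing single-model entries.
In short: on all three tasks it beats the single-model results available at the time.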

Mask R-CNN
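  • Loss
    L = L_cls + L_box + L_mask
    For each RoI, the mask branch predicts a K·m^2-dimensional output: K binary masks of resolution m × m, one per class, where K is the number of classes.
    When computing the mask loss, only the k-th mask contributes, where k is the ground-truth class of the RoI.
    At test time, the class predicted by the cls branch selects which mask channel is output.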
    This is different from common practice when applying FCNs to semantic segmentation, which typically uses a per-pixel softmax and a multinomial cross-entropy loss. In that case, masks across classes compete; in our case, with a per-pixel sigmoid and a binary loss, they do not.
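    Typical segmentation networks apply a softmax loss at every pixel, which makes the masks of different classes compete with each other; this method does not.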
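The decoupling described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions, not the authors' implementation: the function names `mask_loss` and `select_mask`, the (K, m, m) array layout, and the 0.5 binarization threshold are all chosen here for the example.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class channel.

    mask_logits: (K, m, m) raw scores, one m x m mask per class (assumed layout)
    gt_mask:     (m, m) binary target mask with values 0 or 1
    gt_class:    integer k in [0, K), the ground-truth class of the RoI
    """
    logits = mask_logits[gt_class]  # the other K-1 channels are ignored
    # Numerically stable sigmoid + BCE: max(x, 0) - x*z + log(1 + exp(-|x|))
    bce = np.maximum(logits, 0) - logits * gt_mask + np.log1p(np.exp(-np.abs(logits)))
    return bce.mean()

def select_mask(mask_logits, cls_scores, threshold=0.5):
    """At test time, the classification branch picks which mask channel to output."""
    k = int(np.argmax(cls_scores))
    prob = 1.0 / (1.0 + np.exp(-mask_logits[k]))  # per-pixel sigmoid, no softmax over classes
    return k, prob > threshold
```

Because only channel k enters the loss, changing the logits of any other class leaves the loss for this RoI unchanged, which is what "masks do not compete" means in practice.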

  • RoIAlign
    This pixel-to-pixel behavior requires our RoI features, which themselves are small feature maps, to be well aligned to faithfully preserve the explicit per-pixel spatial correspondence. This motivated us to develop the following RoIAlign layer that plays a key role in mask prediction.
    Because masks are predicted pixel to pixel, the RoI features must preserve the spatial location of every pixel faithfully.
    RoIPool first quantizes a floating-number RoI to the discrete granularity of the feature map, this quantized RoI is then subdivided into spatial bins which are themselves quantized, and finally feature values covered by each bin are aggregated (usually by max pooling).
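    RoIPool first quantizes an RoI with floating-point size to the discrete grid of the feature map (for example, an RoI of length 30.2 and width 25.6 is first quantized to 30 and 26). The quantized RoI is then divided into 7×7 bins; each bin's size is again a floating-point number and is likewise quantized to an integer.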
    These quantizations introduce misalignments between the RoI and the extracted features.
    These quantization steps cause a spatial mismatch between the RoI and the features extracted for it.
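    We avoid any quantization of the RoI boundaries or bins (i.e., we use x/16 instead of [x/16]). We use bilinear interpolation to compute the exact values of the input features at four regularly sampled locations in each RoI bin, and aggregate the result (using max or average).
    Neither the RoI boundaries nor the bin boundaries are quantized. Instead, bilinear interpolation computes the feature value at four regularly spaced sample points inside each bin, and these four values are then combined by average or max pooling.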
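The contrast between the two layers can be sketched as follows. This is a minimal single-channel NumPy sketch under stated assumptions, not the paper's implementation: the helper names (`quantized_bin_edges`, `bilinear`, `roi_align`), the use of rounding for the first quantization, the convention that integer coordinates are pixel centers, and the 2×2 sample layout inside each bin are all choices made for this example.

```python
import math
import numpy as np

def quantized_bin_edges(start, end, num_bins):
    """RoIPool-style integer bin edges along one axis (two quantization steps)."""
    q_start, q_end = round(start), round(end)  # step 1: snap the RoI to the grid
    length = max(q_end - q_start, 1)
    # step 2: floor/ceil every bin edge so bins cover whole feature cells
    return [(q_start + math.floor(i * length / num_bins),
             q_start + math.ceil((i + 1) * length / num_bins))
            for i in range(num_bins)]

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a continuous location (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bottom = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=7, samples=2, reduce=np.mean):
    """roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats throughout."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size  # bin sizes stay fractional, no rounding
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size), dtype=float)
    for i in range(out_size):
        for j in range(out_size):
            # samples x samples regularly spaced points inside bin (i, j)
            vals = [bilinear(feat,
                             y1 + (i + (a + 0.5) / samples) * bin_h,
                             x1 + (j + (b + 0.5) / samples) * bin_w)
                    for a in range(samples) for b in range(samples)]
            out[i, j] = reduce(vals)  # average by default; pass np.max for max
    return out
```

For an axis running from 3.4 to 33.6 split into 7 bins, `quantized_bin_edges` snaps the first bin to the integer cells 3 to 8, while the exact first bin starts at 3.4; `roi_align` keeps that fractional offset and reads the features at the true sub-pixel positions.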
