Contents
Mask R-CNN Overview:
D) Loss Components Explained
1. RPN classification loss (cross-entropy)
2. RPN regression loss (Smooth-L1)
3. MRCNN classification loss (cross-entropy)
4. MRCNN regression loss (Smooth-L1)
5. Mask loss (per-pixel binary cross-entropy)
Smooth-L1
Mask R-CNN is a compact, flexible framework for general object instance segmentation. It not only detects the objects in an image, but also produces a high-quality segmentation mask for each of them. It extends Faster R-CNN [1] by adding, in parallel with the bounding-box recognition branch, a new branch that predicts an object mask. The framework also extends easily to other tasks, such as estimating human pose, i.e. person keypoint detection. It achieved the best results across a series of COCO challenge tasks, including instance segmentation, bounding-box object detection, and person keypoint detection.
Source code under analysis:
https://github.com/matterport/Mask_RCNN
Mask RCNN has five loss functions in total: two for the RPN, two for the MRCNN head, and one for the mask branch.
The first four are the same as in Faster R-CNN. For the mask loss, the mask branch outputs K*m^2 values per RoI: K binary masks (one per class) at resolution m x m (e.g., with K = 80 COCO classes and m = 28, each RoI yields 80 x 28 x 28 outputs). Lmask is the average binary cross-entropy loss. For an RoI whose ground-truth class is k, Lmask is defined only on the k-th mask; the other mask outputs contribute nothing to the loss. This definition lets the network generate a mask for every class without any competition among classes.
The full loss-construction code is shown below:
# ************************* 8. Compute the losses ******************************************************************
# Mask RCNN has five loss functions in total: two for the RPN, two for the MRCNN head,
# and one for the mask branch.
# The first four are the same as in Faster R-CNN. The mask branch outputs K*m^2 values
# per RoI: K binary masks (one per class) at resolution m x m.
# The authors therefore apply a per-pixel sigmoid and define Lmask as the average
# binary cross-entropy loss.
# For an RoI whose ground-truth class is k, Lmask is defined only on the k-th mask
# (the other mask outputs contribute nothing to the loss).
# This lets the network generate a mask for every class without inter-class competition.
# Losses
# RPN classification loss
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
[input_rpn_match, rpn_class_logits])
# RPN bounding-box regression loss
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
[input_rpn_bbox, input_rpn_match, rpn_bbox])
# MRCNN classification loss
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
[target_class_ids, mrcnn_class_logits, active_class_ids])
# MRCNN bounding-box regression loss
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
[target_bbox, target_class_ids, mrcnn_bbox])
# Mask loss
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
[target_mask, target_class_ids, mrcnn_mask])
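These five Lambda layers only compute the per-part losses; they are attached to the model later, in compile() elsewhere in model.py, where each one is scaled by its entry in config.LOSS_WEIGHTS. A simplified sketch of that step (condensed from the repository, so treat details as approximate):

# Inside MaskRCNN.compile(), simplified:
loss_names = ["rpn_class_loss", "rpn_bbox_loss",
              "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    # Weight each loss by config.LOSS_WEIGHTS (default weight 1.0)
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)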
rpn_match comes from the GT anchor matching: 1 = positive (foreground), -1 = negative (background), 0 = neutral (ignored);
rpn_class_logits is produced in rpn_graph: the feature map reshaped to [batch, anchors, 2], before any softmax activation.
# RPN classification loss: cross-entropy
def rpn_class_loss_graph(rpn_match, rpn_class_logits):
"""RPN anchor classifier loss.
rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
-1=negative, 0=neutral anchor.
rpn_class_logits: [batch, anchors, 2]. RPN classifier logits for BG/FG.
"""
# Squeeze last dim to simplify
rpn_match = tf.squeeze(rpn_match, -1)
    # Get anchor classes. Convert the -1/+1 match to 0/1 values (positive -> 1, negative/neutral -> 0).
anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
# Positive and Negative anchors contribute to the loss, but neutral anchors (match value = 0) don't.
    indices = tf.where(K.not_equal(rpn_match, 0))  # keep anchors that are not neutral, i.e. both positive AND negative samples
# Pick rows that contribute to the loss and filter out the rest.
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)  # keep only the rows that contribute to the loss
anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross entropy loss
loss = K.sparse_categorical_crossentropy(target=anchor_class,
output=rpn_class_logits,
from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))  # mean over contributing anchors; 0 if there are none
return loss
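A quick toy illustration (hypothetical anchor labels, runnable in eager TF2) of how neutral anchors are filtered out before the cross-entropy is computed:

import tensorflow as tf

# Hypothetical: 1 image, 4 anchors. Anchor 0 is positive, anchor 1
# is negative, anchors 2-3 are neutral.
rpn_match = tf.constant([[[1], [-1], [0], [0]]], dtype=tf.int32)
rpn_match = tf.squeeze(rpn_match, -1)                     # [1, 4]
anchor_class = tf.cast(tf.equal(rpn_match, 1), tf.int32)  # [[1, 0, 0, 0]]
indices = tf.where(tf.not_equal(rpn_match, 0))
print(indices.numpy())                              # [[0 0] [0 1]]
print(tf.gather_nd(anchor_class, indices).numpy())  # [1 0] -> neutrals dropped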
target_bbox is the GT bbox deltas;
rpn_match comes from the GT anchor matching: 1 = positive, -1 = negative, 0 = neutral;
rpn_bbox is produced in rpn_graph: the feature map reshaped to [batch, anchors, 4].
The Smooth-L1 loss was covered when analyzing other detectors, so it is only repeated briefly at the end of this section.
# RPN bounding-box regression loss: Smooth-L1
def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
"""Return the RPN bounding box loss graph.
config: the model config object.
target_bbox: [batch, max positive anchors, (dy, dx, log(dh), log(dw))].
        Uses 0 padding to fill in unused bbox deltas.
rpn_match: [batch, anchors, 1]. Anchor match type. 1=positive,
-1=negative, 0=neutral anchor.
rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
"""
# Positive anchors contribute to the loss, but negative and
# neutral anchors (match value of 0 or -1) don't.
    rpn_match = K.squeeze(rpn_match, -1)  # remove the trailing dimension of size 1
    indices = tf.where(K.equal(rpn_match, 1))  # indices of positive anchors only
# Pick bbox deltas that contribute to the loss
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)  # pick the predicted deltas of the positive anchors
# Trim target bounding box deltas to the same length as rpn_bbox.
batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
target_bbox = batch_pack_graph(target_bbox, batch_counts, config.IMAGES_PER_GPU)
loss = smooth_l1_loss(target_bbox, rpn_bbox)
loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
return loss
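batch_pack_graph is a small helper defined elsewhere in model.py; for reference, it selects the first counts[i] rows from each image in the batch and concatenates them (paraphrased from the repository):

def batch_pack_graph(x, counts, num_rows):
    """Picks a different number of values from each row in x,
    depending on the values in counts."""
    outputs = []
    for i in range(num_rows):
        outputs.append(x[i, :counts[i]])
    return tf.concat(outputs, axis=0)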
target_class_ids: the GT class IDs;
pred_class_logits: the predicted class logits, produced by a fully connected layer on top of the head network;
active_class_ids: marks which class IDs are valid for the image's dataset (e.g. the 80 COCO classes);
the cross-entropy is actually computed from target_class_ids and pred_class_logits;
active_class_ids is used to erase the loss of predictions for classes that are not in the image's dataset.
# MRCNN classification loss: cross-entropy
def mrcnn_class_loss_graph(target_class_ids, pred_class_logits, active_class_ids):
"""Loss for the classifier head of Mask RCNN.
target_class_ids: [batch, num_rois]. Integer class IDs. Uses zero
padding to fill in the array.
pred_class_logits: [batch, num_rois, num_classes]
active_class_ids: [batch, num_classes]. Has a value of 1 for
classes that are in the dataset of the image, and 0
for classes that are not in the dataset.
"""
# During model building, Keras calls this function with
# target_class_ids of type float32. Unclear why. Cast it
# to int to get around it.
target_class_ids = tf.cast(target_class_ids, 'int64')
# Find predictions of classes that are not in the dataset.
pred_class_ids = tf.argmax(pred_class_logits, axis=2)
# TODO: Update this line to work with batch > 1. Right now it assumes all
# images in a batch have the same active_class_ids
pred_active = tf.gather(active_class_ids[0], pred_class_ids)
# Loss
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=target_class_ids, logits=pred_class_logits)
# Erase losses of predictions of classes that are not in the active
# classes of the image.
loss = loss * pred_active
    # Compute the loss mean. Use only predictions that contribute
    # to the loss to get a correct mean.
loss = tf.reduce_sum(loss) / tf.reduce_sum(pred_active)
return loss
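A toy illustration (hypothetical numbers, eager TF2) of how active_class_ids zeroes out the loss of predictions for inactive classes:

import tensorflow as tf

# Hypothetical: 1 image, 3 RoIs, 4 classes; class 3 is not in this
# image's dataset.
active_class_ids = tf.constant([[1., 1., 1., 0.]])
pred_class_logits = tf.constant([[[4., 1., 0., 0.],    # predicts class 0
                                  [0., 1., 0., 5.],    # predicts class 3 (inactive)
                                  [0., 0., 3., 1.]]])  # predicts class 2
pred_class_ids = tf.argmax(pred_class_logits, axis=2)        # [[0, 3, 2]]
pred_active = tf.gather(active_class_ids[0], pred_class_ids)
print(pred_active.numpy())  # [[1. 0. 1.]] -> RoI 1's loss is erased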
target_bbox: the GT box deltas;
target_class_ids: the class IDs of the GT boxes;
pred_bbox: the predicted boxes, produced by the head network's convolutions on the feature map.
# MRCNN bounding-box regression loss: Smooth-L1
def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
"""Loss for Mask R-CNN bounding box refinement.
target_bbox: [batch, num_rois, (dy, dx, log(dh), log(dw))]
target_class_ids: [batch, num_rois]. Integer class IDs.
pred_bbox: [batch, num_rois, num_classes, (dy, dx, log(dh), log(dw))]
"""
# Reshape to merge batch and roi dimensions for simplicity.
target_class_ids = K.reshape(target_class_ids, (-1,))
target_bbox = K.reshape(target_bbox, (-1, 4))
pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))
# Only positive ROIs contribute to the loss. And only
# the right class_id of each ROI. Get their indices.
positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
positive_roi_class_ids = tf.cast(
tf.gather(target_class_ids, positive_roi_ix), tf.int64)
indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)
# Gather the deltas (predicted and true) that contribute to loss
target_bbox = tf.gather(target_bbox, positive_roi_ix)
pred_bbox = tf.gather_nd(pred_bbox, indices)
# Smooth-L1 Loss
loss = K.switch(tf.size(target_bbox) > 0,
smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
tf.constant(0.0))
loss = K.mean(loss)
return loss
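The key step above is the [roi, class] indexing: for each positive RoI, tf.gather_nd picks only the 4 deltas predicted for that RoI's GT class. A toy illustration (hypothetical numbers, eager TF2):

import tensorflow as tf

# Hypothetical: 2 RoIs, 3 classes, 4 deltas per class.
pred_bbox = tf.reshape(tf.range(24, dtype=tf.float32), (2, 3, 4))
target_class_ids = tf.constant([2, 0], dtype=tf.int64)   # RoI 1 is background
positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]   # [0]
positive_roi_class_ids = tf.gather(target_class_ids, positive_roi_ix)
indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)  # [[0 2]]
print(tf.gather_nd(pred_bbox, indices).numpy())  # [[ 8.  9. 10. 11.]]
# Only RoI 0's class-2 deltas survive; the background RoI is ignored.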
Lmask is the loss on the mask branch. The branch's output has size K*m*m, encoding K binary masks of resolution m x m, one per class. A sigmoid is applied to every pixel, and Lmask is the average binary cross-entropy. For an RoI with ground-truth class k, Lmask is defined only on the k-th mask; the remaining masks have no effect on it. In other words, during training every RoI produces K binary masks, but only the class-k mask contributes to the loss, where k is the RoI's ground-truth class (at inference time, the class predicted by the classification branch selects which mask to output).
Mask R-CNN therefore has no inter-class competition, because the other classes contribute no loss. The mask branch predicts a mask for every class and relies on the classification branch to select the output mask (at that point the output is a single m x m mask: only one class was predicted, so only that class's mask needs to be emitted). The usual FCN approach applies a per-pixel softmax with a multinomial cross-entropy loss, which makes the classes compete; the per-pixel sigmoid with binary cross-entropy lets each class's mask be learned on its own, without competing against the masks of other classes.
target_mask: the GT masks;
target_class_ids: the class IDs of the GT boxes;
mrcnn_mask: the predicted masks produced by the mask head.
# Mask loss: per-pixel binary cross-entropy
def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
"""Mask binary cross-entropy loss for the masks head.
target_masks: [batch, num_rois, height, width].
A float32 tensor of values 0 or 1. Uses zero padding to fill array.
target_class_ids: [batch, num_rois]. Integer class IDs. Zero padded.
pred_masks: [batch, proposals, height, width, num_classes] float32 tensor
with values from 0 to 1.
"""
# Reshape for simplicity. Merge first two dimensions into one.
target_class_ids = K.reshape(target_class_ids, (-1,))
mask_shape = tf.shape(target_masks)
target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
pred_shape = tf.shape(pred_masks)
pred_masks = K.reshape(pred_masks,
(-1, pred_shape[2], pred_shape[3], pred_shape[4]))
# Permute predicted masks to [N, num_classes, height, width]
pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])
# Only positive ROIs contribute to the loss. And only
# the class specific mask of each ROI.
positive_ix = tf.where(target_class_ids > 0)[:, 0]
positive_class_ids = tf.cast(
tf.gather(target_class_ids, positive_ix), tf.int64)
indices = tf.stack([positive_ix, positive_class_ids], axis=1)
# Gather the masks (predicted and true) that contribute to loss
y_true = tf.gather(target_masks, positive_ix)
y_pred = tf.gather_nd(pred_masks, indices)
# Compute binary cross entropy. If no positive ROIs, then return 0.
# shape: [batch, roi, num_classes]
loss = K.switch(tf.size(y_true) > 0,
K.binary_crossentropy(target=y_true, output=y_pred),
tf.constant(0.0))
loss = K.mean(loss)
return loss
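As with the box loss, the transpose-plus-gather_nd pattern selects one class-specific mask per positive RoI. A toy illustration (hypothetical shapes, eager TF2):

import tensorflow as tf

# Hypothetical: 2 RoIs, 2x2 masks, 3 classes.
pred_masks = tf.random.uniform((2, 2, 2, 3))             # [N, h, w, K]
pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])      # [N, K, h, w]
target_class_ids = tf.constant([2, 0], dtype=tf.int64)   # RoI 1 is background
positive_ix = tf.where(target_class_ids > 0)[:, 0]       # [0]
positive_class_ids = tf.gather(target_class_ids, positive_ix)
indices = tf.stack([positive_ix, positive_class_ids], axis=1)  # [[0 2]]
y_pred = tf.gather_nd(pred_masks, indices)  # class-2 mask of RoI 0
print(y_pred.shape)                         # (1, 2, 2)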
In the code below, x = K.abs(y_true - y_pred).
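For reference, the standard piecewise definition that the code implements is:

\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}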
"""
Implements Smooth-L1 loss.
y_true and y_pred are typically: [N, 4], but could be any shape.
"""
def smooth_l1_loss(y_true, y_pred):
diff = K.abs(y_true - y_pred)
less_than_one = K.cast(K.less(diff, 1.0), "float32")
loss = (less_than_one * 0.5 * diff**2) + (1 - less_than_one) * (diff - 0.5)
return loss