[Loss roundup] What does paddle sigmoid_focal_loss actually do?

Background

The official documentation explains this more clearly than I can, so do read it. I just don't feel at ease using this API without working through it once myself.


The original paper:
Focal Loss for Dense Object Detection

In that paper the authors also propose RetinaNet to demonstrate the effectiveness of Focal Loss.

Function signature:

paddle.nn.functional.sigmoid_focal_loss(logit, 
                                        label, 
                                        normalizer=None, 
                                        alpha=0.25, 
                                        gamma=2.0, 
                                        reduction='sum', 
                                        name=None)
  • alpha (int|float, optional) - hyperparameter for balancing positive and negative samples, in the range [0, 1]. Defaults to 0.25.
  • gamma (int|float, optional) - hyperparameter for balancing easy and hard samples. Defaults to 2.0.

Leaving aside the normalizer parameter, this is essentially all that Focal Loss needs. The formula:

$$
\begin{aligned}
Out = &-Labels * alpha * (1 - \sigma(Logit))^{gamma}\log(\sigma(Logit)) \\
&- (1 - Labels) * (1 - alpha) * \sigma(Logit)^{gamma}\log(1 - \sigma(Logit))
\end{aligned}
$$
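To get a feel for how gamma suppresses easy examples, here is a small sketch (plain Python, the probabilities are picked purely for illustration) evaluating the positive-class term of the formula above:

import math

alpha, gamma = 0.25, 2.0

def focal_pos_term(p):
    # positive-class term of the formula above: -alpha * (1 - p)^gamma * log(p)
    return -alpha * (1 - p) ** gamma * math.log(p)

print(focal_pos_term(0.9))   # easy positive  -> ~0.00026, heavily down-weighted
print(focal_pos_term(0.3))   # hard positive  -> ~0.147, kept large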

Suppose the logits from some experiment look like this:

[[0.97, -0.91, 0.03],
 [-0.55, -0.43, 0.71]]

Each row is one 3-class prediction, and each column is the logit for the corresponding class. For a multi-label problem, simply pass the logits through sigmoid; for a single-label problem, pass them through softmax with axis=1 (by default axis=-1, i.e. softmax over the last dimension), which turns the logits into per-class probabilities that sum to 1. A multi-label problem only requires each class probability to lie between 0 and 1, so sigmoid alone suffices; see the sketch below.
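A minimal sketch contrasting the two on these logits (just for intuition; the focal loss itself only ever uses sigmoid):

import paddle
import paddle.nn.functional as F

logit = paddle.to_tensor([[0.97, -0.91, 0.03], [-0.55, -0.43, 0.71]])

# sigmoid: each entry is squashed to (0, 1) independently; rows need not sum to 1
print(F.sigmoid(logit))

# softmax over classes: each row becomes a probability distribution that sums to 1
print(F.softmax(logit, axis=1))
print(F.softmax(logit, axis=1).sum(axis=1))   # -> [1., 1.]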

The function name contains sigmoid, so every logit is passed through sigmoid before the rest of the computation. Following the formula above, let's compute it by hand once:

import paddle
import paddle.nn.functional as F

alpha = 0.25
gamma = 2
logit = paddle.to_tensor([[0.97, -0.91, 0.03], [-0.55, -0.43, 0.71]])
# a label consistent with the printed output below: positives at (0, 0) and (1, 1)
label = paddle.to_tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

out = -label     * alpha       * (1 - F.sigmoid(logit)) ** gamma * paddle.log(F.sigmoid(logit)) \
      -(1-label) * (1 - alpha) * F.sigmoid(logit) ** gamma       * paddle.log(1 - F.sigmoid(logit))

print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.00607155, 0.02089742, 0.13681224],
        [0.04572807, 0.08544625, 0.37411609]])

Now plug the same inputs into sigmoid_focal_loss; the results match up to floating-point rounding:

out = paddle.nn.functional.sigmoid_focal_loss(logit, label, reduction='none')
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.00607155, 0.02089742, 0.13681222],
        [0.04572806, 0.08544625, 0.37411612]])
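As a quick sanity check (a small sketch reusing the variables defined above), paddle.allclose confirms the manual computation and the API agree up to rounding:

out_manual = -label * alpha * (1 - F.sigmoid(logit)) ** gamma * paddle.log(F.sigmoid(logit)) \
             - (1 - label) * (1 - alpha) * F.sigmoid(logit) ** gamma * paddle.log(1 - F.sigmoid(logit))
out_api = F.sigmoid_focal_loss(logit, label, reduction='none')
print(paddle.allclose(out_manual, out_api, atol=1e-6))   # -> True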

Focal Loss in PaddleDetection

class FocalLoss(nn.Layer):
    """A wrapper around paddle.nn.functional.sigmoid_focal_loss.
    Args:
        use_sigmoid (bool): currently only support use_sigmoid=True
        alpha (float): parameter alpha in Focal Loss
        gamma (float): parameter gamma in Focal Loss
        loss_weight (float): final loss will be multiplied by this
    """
    def __init__(self,
                 use_sigmoid=True,
                 alpha=0.25,
                 gamma=2.0,
                 loss_weight=1.0):
        super(FocalLoss, self).__init__()
        assert use_sigmoid == True, \
            'Focal Loss only supports sigmoid at the moment'
        self.use_sigmoid = use_sigmoid
        self.alpha = alpha
        self.gamma = gamma
        self.loss_weight = loss_weight

    def forward(self, pred, target, reduction='none'):
        """forward function.
        Args:
            pred (Tensor): logits of class prediction, of shape (N, num_classes)
            target (Tensor): target class label, of shape (N, )
            reduction (str): the way to reduce loss, one of (none, sum, mean)
        """
        num_classes = pred.shape[1]
        target = F.one_hot(target, num_classes+1).cast(pred.dtype)
        target = target[:, :-1].detach() # drop the background class here
        loss = F.sigmoid_focal_loss(
            pred, target, alpha=self.alpha, gamma=self.gamma,
            reduction=reduction)
        return loss * self.loss_weight
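A minimal usage sketch, under the assumption that target holds integer class ids in [0, num_classes] with num_classes itself reserved for background (the values here are illustrative only):

focal = FocalLoss(alpha=0.25, gamma=2.0)

pred = paddle.to_tensor([[0.97, -0.91, 0.03],
                         [-0.55, -0.43, 0.71]])     # logits for 3 foreground classes
target = paddle.to_tensor([0, 3], dtype='int64')    # 3 == num_classes, i.e. background

loss = focal(pred, target, reduction='mean')
print(loss)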

At first glance PaddleDetection just wraps F.sigmoid_focal_loss in a thin layer, but there is actually a trick in how the targets are built. The following snippet is excerpted from the RetinaNet code:

gt_class = gt_class.reshape([-1])
bg_class = paddle.to_tensor(
    [self.num_classes], dtype=gt_class.dtype)
    
# a trick to assign num_classes to negative targets (i.e. negatives are simply class index num_classes, 0-based)
gt_class = paddle.concat([gt_class, bg_class], axis=-1)

The code assigns background to class index 80 (self.num_classes; 0-based, so it is really the 81st class). When this is fed into PaddleDetection's Focal Loss, one_hot is given one extra class to hold the background class:

target = F.one_hot(target, num_classes+1).cast(pred.dtype)

and the background column is then dropped again:

target = target[:, :-1].detach() # drop the background class here

This is genuinely clever: there is no need to build an 81-class classifier. What is the background class? Anything that is none of the 80 foreground classes. So the one-hot label for background is simply

[0, 0, 0, 0, 0, 0, 0, 0, ......, 0] # 80 zeros in total

Neat!
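Here is a tiny sketch of the same trick with 4 foreground classes instead of 80 (purely illustrative numbers):

import paddle
import paddle.nn.functional as F

num_classes = 4
gt_class = paddle.to_tensor([2, 4, 0], dtype='int64')   # 4 == num_classes marks background

onehot = F.one_hot(gt_class, num_classes + 1)   # shape [3, 5], extra column for background
target = onehot[:, :-1]                         # drop the background column -> shape [3, 4]
print(target)
# [[0., 0., 1., 0.],
#  [0., 0., 0., 0.],   <- background becomes the all-zeros row
#  [1., 0., 0., 0.]]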

The normalizer parameter of sigmoid_focal_loss

The normalizer parameter rescales out. It does not necessarily normalize in the strict sense; it simply acts as a divisor, and broadcasting is supported:

normalizer = paddle.to_tensor([12]).cast("float32")
out = paddle.nn.functional.sigmoid_focal_loss(logit, label, reduction='none', normalizer=normalizer)
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.00050596, 0.00174145, 0.01140102],
        [0.00381067, 0.00712052, 0.03117634]])
normalizer = paddle.to_tensor([12, 12, 12]).cast("float32")
out = paddle.nn.functional.sigmoid_focal_loss(logit, label, reduction='none', normalizer=normalizer)
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.00050596, 0.00174145, 0.01140102],
        [0.00381067, 0.00712052, 0.03117634]])

It has to be a 1-D tensor that broadcasts against out along the class dimension: as the two examples above show, a length-1 tensor and a tensor whose length equals the number of classes both give the same result.
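In other words, passing normalizer is equivalent to dividing the unnormalized per-element loss by it (a quick sketch reusing logit and label from above):

out_raw = F.sigmoid_focal_loss(logit, label, reduction='none')
normalizer = paddle.to_tensor([12.0])
out_norm = F.sigmoid_focal_loss(logit, label, reduction='none', normalizer=normalizer)
print(paddle.allclose(out_raw / normalizer, out_norm))   # -> True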
