The official docs explain this more clearly than I can, so I recommend reading them; I just don't feel at ease with this API until I've worked through it once myself.
The original paper is *Focal Loss for Dense Object Detection*. In that paper the authors also propose RetinaNet to demonstrate the effectiveness of Focal Loss.
Function signature:
paddle.nn.functional.sigmoid_focal_loss(logit,
                                        label,
                                        normalizer=None,
                                        alpha=0.25,
                                        gamma=2.0,
                                        reduction='sum',
                                        name=None)
Setting aside the `normalizer` parameter, this covers essentially everything Focal Loss needs. The formula:
$$\begin{aligned} Out = &-Labels * alpha * (1 - \sigma(Logit))^{gamma}\log(\sigma(Logit)) \\ &- (1 - Labels) * (1 - alpha) * \sigma(Logit)^{gamma}\log(1 - \sigma(Logit)) \end{aligned}$$
Suppose the logits output from one run look like this:
[[0.97, -0.91, 0.03],
[-0.55, -0.43, 0.71]]
Each row is one 3-class prediction, and each column is the logit for the corresponding class. For a multi-label problem, the logits only need to pass through `sigmoid`; for a single-label problem (mutually exclusive classes), pass them through `softmax` with `axis=1` (by default `axis=-1`, i.e. softmax over the last dimension), which turns the logits into per-class probabilities that sum to 1. A multi-label problem only requires each class probability to lie between 0 and 1, so `sigmoid` alone is enough, as the sketch below shows.
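To make the distinction concrete, here is a minimal sketch of my own (standard Paddle calls, using the same logits as above) contrasting the two activations:

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[0.97, -0.91, 0.03],
                           [-0.55, -0.43, 0.71]])

# Multi-label view: each class gets an independent probability in (0, 1)
probs_sigmoid = F.sigmoid(logits)

# Single-label view: probabilities within each row sum to 1
probs_softmax = F.softmax(logits, axis=1)

print(probs_sigmoid.sum(axis=1))  # rows need not sum to 1
print(probs_softmax.sum(axis=1))  # every row sums to 1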
The function name contains `sigmoid`, so each logit is passed through `sigmoid` before the rest of the computation. Following the formula above, let's compute it by hand once:
import paddle
import paddle.nn.functional as F

alpha = 0.25
gamma = 2
logits = paddle.to_tensor([[0.97, -0.91, 0.03], [-0.55, -0.43, 0.71]])
# one-hot labels; this choice reproduces the printed output below
label = paddle.to_tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
out = -label * alpha * (1 - F.sigmoid(logits)) ** gamma * paddle.log(F.sigmoid(logits)) \
      - (1 - label) * (1 - alpha) * F.sigmoid(logits) ** gamma * paddle.log(1 - F.sigmoid(logits))
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0.00607155, 0.02089742, 0.13681224],
[0.04572807, 0.08544625, 0.37411609]])
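As a sanity check on a single element: for logit 0.97 with label 1, σ(0.97) ≈ 0.725, so the loss is 0.25 · (1 − 0.725)² · (−log 0.725) ≈ 0.0061, matching the first entry above.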
Plugging the same inputs into sigmoid_focal_loss, the results agree up to floating-point rounding:
out = paddle.nn.functional.sigmoid_focal_loss(logits, label, reduction='none')
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0.00607155, 0.02089742, 0.13681222],
[0.04572806, 0.08544625, 0.37411612]])
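Note that the default reduction is 'sum', not 'none'. A quick sketch of my own covering the three options (same logits and label as above):

for reduction in ('none', 'sum', 'mean'):
    out = F.sigmoid_focal_loss(logits, label, reduction=reduction)
    print(reduction, out.shape)
# 'none' keeps the element-wise shape [2, 3];
# 'sum' and 'mean' reduce to a single scalar

PaddleDetection wraps this function in a FocalLoss layer: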
import paddle.nn as nn
import paddle.nn.functional as F

class FocalLoss(nn.Layer):
    """A wrapper around paddle.nn.functional.sigmoid_focal_loss.

    Args:
        use_sigmoid (bool): currently only support use_sigmoid=True
        alpha (float): parameter alpha in Focal Loss
        gamma (float): parameter gamma in Focal Loss
        loss_weight (float): final loss will be multiplied by this
    """
    def __init__(self,
                 use_sigmoid=True,
                 alpha=0.25,
                 gamma=2.0,
                 loss_weight=1.0):
        super(FocalLoss, self).__init__()
        assert use_sigmoid == True, \
            'Focal Loss only supports sigmoid at the moment'
        self.use_sigmoid = use_sigmoid
        self.alpha = alpha
        self.gamma = gamma
        self.loss_weight = loss_weight

    def forward(self, pred, target, reduction='none'):
        """forward function.

        Args:
            pred (Tensor): logits of class prediction, of shape (N, num_classes)
            target (Tensor): target class label, of shape (N, )
            reduction (str): the way to reduce loss, one of (none, sum, mean)
        """
        num_classes = pred.shape[1]
        target = F.one_hot(target, num_classes + 1).cast(pred.dtype)
        target = target[:, :-1].detach()  # drop the background class here
        loss = F.sigmoid_focal_loss(
            pred, target, alpha=self.alpha, gamma=self.gamma,
            reduction=reduction)
        return loss * self.loss_weight
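A minimal usage sketch of my own (hypothetical shapes; integer labels in [0, num_classes], where the value num_classes marks background, matching the trick explained next):

import paddle

focal_loss = FocalLoss(alpha=0.25, gamma=2.0)
pred = paddle.randn([4, 80])                # logits for 4 samples, 80 classes
target = paddle.to_tensor([3, 80, 7, 80])  # 80 == background (num_classes)
loss = focal_loss(pred, target, reduction='mean')
print(loss)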
At first glance PaddleDetection merely wraps F.sigmoid_focal_loss, but there is actually a trick hidden here. The following part is excerpted from the RetinaNet code:
gt_class = gt_class.reshape([-1])
bg_class = paddle.to_tensor(
    [self.num_classes], dtype=gt_class.dtype)
# a trick to assign num_classes to negative targets
# (negatives are simply class num_classes, counting from 0)
gt_class = paddle.concat([gt_class, bg_class], axis=-1)
The code assigns the background class index 80 (self.num_classes); counting from 0, that makes it the 81st class. When these targets reach PaddleDetection's Focal Loss, one_hot reserves one extra slot for the background class:
target = F.one_hot(target, num_classes+1).cast(pred.dtype)
and then slices the background class off again:
target = target[:, :-1].detach()  # drop the background class here
This is genuinely clever: there is no need to build an 81-way classifier. What counts as background? Anything that is not one of the 80 foreground classes. So the one-hot label of a background sample is simply
[0, 0, 0, 0, 0, 0, 0, 0, ......, 0]  # 80 zeros in total
Neat!
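A tiny check of the trick, shrunk to 5 foreground classes for readability:

import paddle
import paddle.nn.functional as F

num_classes = 5
target = paddle.to_tensor([2, 5, 0])  # 5 == background index (num_classes)
one_hot = F.one_hot(target, num_classes + 1)[:, :-1]
print(one_hot)
# [[0., 0., 1., 0., 0.],   foreground class 2
#  [0., 0., 0., 0., 0.],   background -> all zeros
#  [1., 0., 0., 0., 0.]]   foreground class 0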
The normalizer parameter "normalizes" out, though not necessarily to a standard range; it really just acts as a divisor, and broadcasting is supported:
normalizer = paddle.to_tensor([12]).cast("float32")
out = paddle.nn.functional.sigmoid_focal_loss(logits, label, reduction='none', normalizer=normalizer)
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0.00050596, 0.00174145, 0.01140102],
[0.00381067, 0.00712052, 0.03117634]])
normalizer = paddle.to_tensor([12, 12, 12]).cast("float32")
out = paddle.nn.functional.sigmoid_focal_loss(logits, label, reduction='none', normalizer=normalizer)
print(out)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
[[0.00050596, 0.00174145, 0.01140102],
[0.00381067, 0.00712052, 0.03117634]])
It has to be a 1-D tensor, and for broadcasting to work its length must be either 1 or equal to the number of classes (the size of the last dimension).
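A quick check of my own, with the same logits and label as before, that normalizer is nothing more than element-wise division of the unreduced loss:

out_plain = paddle.nn.functional.sigmoid_focal_loss(logits, label, reduction='none')
normalizer = paddle.to_tensor([12.0])
out_norm = paddle.nn.functional.sigmoid_focal_loss(
    logits, label, reduction='none', normalizer=normalizer)
print(paddle.allclose(out_plain / normalizer, out_norm))  # True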