When torch.nn.L1Loss is used with reduction='sum', it computes the L1 loss; with reduction='mean' (the default), it computes the MAE. With reduction='none', no reduction is applied and the element-wise absolute errors are returned.
The formulas are:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y_i^p\right|$$
$$L1 = \sum_{i=1}^{n}\left|y_i - y_i^p\right|$$
A small code example:
import torch
loss = torch.nn.L1Loss(reduction='sum')
pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float)
target = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
output = loss(pred, target)
print(output)
Output:
tensor(1.0000)  # 0.1 + 0.2 + 0.3 + 0.4 = 1.0; with reduction='mean' the result would be 1.0 / 4 = 0.25
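For the other two reduction modes, a quick sketch reusing pred and target from above:
loss_none = torch.nn.L1Loss(reduction='none')
loss_mean = torch.nn.L1Loss(reduction='mean')
print(loss_none(pred, target))  # tensor([[0.1000, 0.2000], [0.3000, 0.4000]]) -- element-wise errors
print(loss_mean(pred, target))  # tensor(0.2500) -- the MAE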
L2 loss and mean squared error (MSE) are analogous to L1 loss and MAE, except that they use the squared difference between prediction and target. The reduction parameter of torch.nn.MSELoss() likewise takes three values: 'none', 'mean', and 'sum'.
The formulas are:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - y_i^p)^2$$
$$L2 = \sum_{i=1}^{n}(y_i - y_i^p)^2$$
A small code example:
import torch
loss = torch.nn.MSELoss(reduction='sum')
pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float)
target = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
output = loss(pred, target)
print(output)
Output:
tensor(0.3000)  # 0.01 + 0.04 + 0.09 + 0.16 = 0.3
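As a sanity check, the same value can be computed by hand with elementwise tensor operations:
print(torch.sum((pred - target) ** 2))  # tensor(0.3000), matches MSELoss(reduction='sum')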
torch.nn.SmoothL1Loss is a piecewise function: when the error lies in [-1, 1] it behaves like L2 loss, and elsewhere like L1 loss. This fixes both the non-differentiability of L1 loss at 0 (its curve is not smooth there) and the tendency of L2 loss toward exploding gradients on large errors. The Fast R-CNN paper notes that this loss "is less sensitive to outliers than the L2 loss".
Its formula:
$$SmoothL1Loss = \frac{1}{n}\sum_{i=1}^{n} Z_i$$
$$Z_i = \begin{cases} 0.5\,(x_i - y_i)^2 & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5 & \text{otherwise} \end{cases}$$
A small code example:
import torch
loss = torch.nn.SmoothL1Loss(reduction='mean')  # 'mean' so the output matches the hand calculation below
pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float)
target = torch.tensor([[1, 1], [3, 6]], dtype=torch.float)
output = loss(pred, target)
print(output)
Output:
tensor(0.4625)  # (0.5*0.1**2 + (1.2-0.5) + 0.5*0.3**2 + (1.6-0.5)) / 4 = 0.4625
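To verify the piecewise formula directly, a minimal sketch reusing pred and target:
diff = torch.abs(pred - target)
z = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)  # the two branches of Z_i
print(torch.mean(z))  # tensor(0.4625), matches SmoothL1Loss(reduction='mean')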
The curves of the three loss functions above are shown in the figure (adapted from https://www.cnblogs.com/wangguchangqing/p/12021638.html).
Unless otherwise noted, all losses in this post use the default reduction='mean' mode.
BCELoss and BCEWithLogitsLoss are the two binary cross-entropy losses; their formulas are as follows:
BCELoss:
$$BCELoss = \frac{1}{n}\sum_{i=1}^{n} l_i$$
$$l_i = -\left[y_i \cdot \log x_i + (1 - y_i) \cdot \log(1 - x_i)\right]$$
Note that when using BCELoss directly, each $x_i$ must lie between 0 and 1; otherwise the following error is raised:
RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.615788…
It is therefore good practice to apply sigmoid() to the predictions before BCELoss() so that they are constrained to (0, 1); combining the two steps is exactly what BCEWithLogitsLoss() does.
BCEWithLogitsLoss:
$$BCEWithLogitsLoss = \frac{1}{n}\sum_{i=1}^{n} l_i$$
$$l_i = -\left[y_i \cdot \log \sigma(x_i) + (1 - y_i) \cdot \log(1 - \sigma(x_i))\right]$$
where $\sigma(x)$ is the sigmoid function.
For a multi-class loss, the integer class labels can be converted to one-hot targets so that the network's fully-connected outputs can be fed to BCEWithLogitsLoss(), as shown in the sketch below.
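A minimal sketch of this idea, assuming a hypothetical 3-class problem (logits and labels are illustrative names, not from the original post):
import torch
import torch.nn as nn
import torch.nn.functional as F
logits = torch.randn(4, 3)                          # fully-connected outputs: 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])                 # integer class labels
one_hot = F.one_hot(labels, num_classes=3).float()  # convert to one-hot targets
loss = nn.BCEWithLogitsLoss()
print(loss(logits, one_hot))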
BCELoss, BCEWithLogitsLoss, and a hand-written implementation:
import torch
import torch.nn as nn
input = torch.randn((2, 2))
target = torch.empty((2, 2)).random_(2)
# e.g. input = torch.tensor([[ 2.4480,  0.3336],
#                            [-0.8614, -1.2634]])
#      target = torch.tensor([[1., 1.],
#                             [0., 0.]])
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()
sigmoid_input = sigmoid(input)
# via torch.nn.BCELoss()
print(loss(sigmoid_input, target))
# hand-written implementation
res = torch.sum(target * torch.log(sigmoid_input) + (1 - target) * torch.log(1 - sigmoid_input))
m, n = sigmoid_input.shape
# equivalently, with explicit loops:
# res = 0
# for i in range(m):
#     for j in range(n):
#         res += target[i, j] * torch.log(sigmoid_input[i, j]) + (1 - target[i, j]) * torch.log(1 - sigmoid_input[i, j])
print(-res / (m * n))
# via torch.nn.BCEWithLogitsLoss()
loss = nn.BCEWithLogitsLoss()
print(loss(input, target))
Output:
tensor(0.5947)
tensor(0.5947)
tensor(0.5947)
This part draws on https://blog.csdn.net/qq_22210253/article/details/85222093.
Analogous to the relationship between BCELoss and BCEWithLogitsLoss, nn.CrossEntropyLoss() can be viewed as nn.LogSoftmax() followed by nn.NLLLoss().
Note the tensor shapes here: pred has shape batch_size × channel × height × width, where channel is the number of classes in the dataset (for VOC, 21 classes including background, so channel = 21), while the label has shape batch_size × height × width. This distinguishes it from the other losses above.
Formula:
$$\text{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$
Note that nn.LogSoftmax is applied along dim=1, i.e., the channel/class dimension, which corresponds to log_soft = nn.LogSoftmax(dim=1) in the code below.
$$NLLLoss = -\frac{1}{n}\sum_{i=1}^{n} X_{i, Y_i}$$
The formula on the official PyTorch site is more involved and is omitted here; working through the example below is enough to understand the computation.
Three implementations follow: nn.NLLLoss(), cv_dreamer's hand-written version, and nn.CrossEntropyLoss():
import torch
import torch.nn as nn
x = torch.Tensor([[[1, 2, 1],
                   [2, 2, 1],
                   [0, 1, 1]],
                  [[0, 1, 3],
                   [2, 3, 1],
                   [0, 0, 1]]])
x = x.view([1, 2, 3, 3])
# via torch.nn.NLLLoss()
log_soft = nn.LogSoftmax(dim=1)
x1 = log_soft(x)
# x1: tensor([[[[-0.3133, -0.3133, -2.1269],
#               [-0.6931, -1.3133, -0.6931],
#               [-0.6931, -0.3133, -0.6931]],
#
#              [[-1.3133, -1.3133, -0.1269],
#               [-0.6931, -0.3133, -0.6931],
#               [-0.6931, -1.3133, -0.6931]]]])
y = torch.LongTensor([[1, 0, 1],
                      [0, 0, 1],
                      [1, 1, 1]])
y = y.view([1, 3, 3])
loss = nn.NLLLoss()
print(loss(x1, y))
# dreamer's hand-written implementation
mat = torch.zeros(3, 3)
for i in range(3):
    for j in range(3):
        mat[i, j] = -x1[0, int(y[0, i, j]), i, j]  # negative log-probability of the target class
# mat:
# tensor([[1.3133, 0.3133, 0.1269],
#         [0.6931, 1.3133, 0.6931],
#         [0.6931, 1.3133, 0.6931]])
print(torch.sum(mat) / (3 * 3))
# via torch.nn.CrossEntropyLoss()
loss = nn.CrossEntropyLoss()
print(loss(x, y))
Output:
tensor(0.7947)
tensor(0.7947)
tensor(0.7947)
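As an aside, the double loop above can be replaced by a single torch.gather call; a vectorized sketch reusing x1 and y:
picked = x1.gather(1, y.unsqueeze(1))  # select the log-probability of the target class at each pixel
print(-picked.mean())                  # tensor(0.7947)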
This part draws on https://blog.csdn.net/zhaowangbo/article/details/88821017.
That wraps up the most basic loss functions. Next, CV_Dreamer will go on to summarize the remaining losses; let's explore the magic of loss functions together.
References:
https://pytorch.org/docs/stable/nn.html#loss-functions
https://www.cnblogs.com/wangguchangqing/p/12021638.html
https://blog.csdn.net/qq_22210253/article/details/85222093
https://blog.csdn.net/zhaowangbo/article/details/88821017