Mean absolute error.
The default reduction parameter is 'mean', which averages the loss over all elements.
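A minimal sketch (the tensor values are made up for illustration) comparing the per-element losses with the default averaging:
import torch
import torch.nn as nn
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [2.0, 6.0]])
print(nn.L1Loss(reduction='none')(pred, target))  # per-element |pred - target|
print(nn.L1Loss(reduction='mean')(pred, target))  # (0.5 + 0.0 + 1.0 + 2.0) / 4 = 0.875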
The default reduction parameter is 'mean'. Apart from computing the loss with the mean squared error, it behaves the same as L1Loss.
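The same made-up tensors as above, but each error is squared before averaging; a minimal sketch:
import torch
import torch.nn as nn
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [2.0, 6.0]])
print(nn.MSELoss(reduction='mean')(pred, target))  # (0.25 + 0.0 + 1.0 + 4.0) / 4 = 1.3125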
Negative log likelihood loss, used for multi-class classification.
When computing the loss, for each sample n in the batch only the element of x_n that corresponds to the target class y_n is used. Decreasing the loss means increasing that element, and since the values come from a softmax layer (they sum to 1), the values for the other classes decrease accordingly.
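A minimal sketch showing that, with reduction='none', each sample's loss is just the negated log-probability of its target class:
import torch
import torch.nn as nn
log_probs = nn.LogSoftmax(dim=1)(torch.randn(3, 5))     # batch of 3 samples, 5 classes
target = torch.tensor([1, 0, 4])                        # target class index per sample
print(nn.NLLLoss(reduction='none')(log_probs, target))  # shape (3,)
print(-log_probs[torch.arange(3), target])              # identical values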
Combines LogSoftmax and NLLLoss.
The target format and parameters are the same as NLLLoss; the only difference is that it takes raw (unnormalized) scores as input, since LogSoftmax is applied internally.
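A minimal sketch checking that CrossEntropyLoss on raw scores matches LogSoftmax followed by NLLLoss:
import torch
import torch.nn as nn
logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])
ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
print(torch.allclose(ce, nll))  # True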
import torch.nn as nn
import torch

# BCELoss: trying weights of different shapes with reduction='none'
m = nn.Sigmoid()
input = torch.randn(4, 5, requires_grad=True)
target = torch.empty(4, 5).random_(2)            # random 0/1 targets
input = m(input)                                 # BCELoss expects probabilities in [0, 1]
loss1 = nn.BCELoss(reduction='none')             # no weight: plain per-element loss
output1 = loss1(input, target)
print("output1:\n", output1)
#w2 = torch.empty(4).random_(2)                  # shape (4,) cannot broadcast to (4, 5) -> error
#loss2 = nn.BCELoss(reduction='none', weight=w2)
#output2 = loss2(input, target)
#print(w2, output2)
w3 = torch.empty(4, 1).random_(2)                # shape (4, 1): one weight per row (per sample)
loss3 = nn.BCELoss(reduction='none', weight=w3)
output3 = loss3(input, target)
print("w3:\n", w3, "\noutput3:\n", output3)
w4 = torch.empty(5).random_(2)                   # shape (5,): one weight per column
loss4 = nn.BCELoss(reduction='none', weight=w4)
output4 = loss4(input, target)
print("w4:\n", w4, "\noutput4:\n", output4)
w5 = torch.empty(1, 5).random_(2)                # shape (1, 5): also one weight per column
loss5 = nn.BCELoss(reduction='none', weight=w5)
output5 = loss5(input, target)
print("w5:\n", w5, "\noutput5:\n", output5)
w6 = torch.empty(4, 5).random_(2)                # shape (4, 5): one weight per element
loss6 = nn.BCELoss(reduction='none', weight=w6)
output6 = loss6(input, target)
print("w6:\n", w6, "\noutput6:\n", output6)
Output:
output1:
tensor([[1.2874, 1.0905, 2.1070, 0.3826, 2.0119],
[0.3760, 0.3198, 1.0668, 0.8324, 1.1959],
[1.1589, 0.7380, 0.7686, 0.7132, 0.5279],
[1.0708, 1.2377, 0.3109, 0.8353, 0.4442]],
grad_fn=<BinaryCrossEntropyBackward>)
w3:
tensor([[0.],
[0.],
[1.],
[1.]])
output3:
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[1.1589, 0.7380, 0.7686, 0.7132, 0.5279],
[1.0708, 1.2377, 0.3109, 0.8353, 0.4442]],
grad_fn=<BinaryCrossEntropyBackward>)
w4:
tensor([1., 1., 0., 0., 0.])
output4:
tensor([[1.2874, 1.0905, 0.0000, 0.0000, 0.0000],
[0.3760, 0.3198, 0.0000, 0.0000, 0.0000],
[1.1589, 0.7380, 0.0000, 0.0000, 0.0000],
[1.0708, 1.2377, 0.0000, 0.0000, 0.0000]],
grad_fn=<BinaryCrossEntropyBackward>)
w5:
tensor([[1., 1., 1., 0., 1.]])
output5:
tensor([[1.2874, 1.0905, 2.1070, 0.0000, 2.0119],
[0.3760, 0.3198, 1.0668, 0.0000, 1.1959],
[1.1589, 0.7380, 0.7686, 0.0000, 0.5279],
[1.0708, 1.2377, 0.3109, 0.0000, 0.4442]],
grad_fn=<BinaryCrossEntropyBackward>)
w6:
tensor([[0., 1., 1., 1., 0.],
[1., 0., 0., 1., 0.],
[0., 1., 0., 1., 0.],
[0., 1., 1., 0., 1.]])
output6:
tensor([[0.0000, 1.0905, 2.1070, 0.3826, 0.0000],
[0.3760, 0.0000, 0.0000, 0.8324, 0.0000],
[0.0000, 0.7380, 0.0000, 0.7132, 0.0000],
[0.0000, 1.2377, 0.3109, 0.0000, 0.4442]],
grad_fn=<BinaryCrossEntropyBackward>)
The commented-out part of the code raises an error. The results show that BCELoss's weight can be set per element, as long as its shape broadcasts against the input.
Fuses Sigmoid and BCELoss into a single class, which is more numerically stable than using the two separately. Everything else is the same as BCELoss.
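A minimal sketch checking that BCEWithLogitsLoss on raw logits matches Sigmoid followed by BCELoss (for moderate logits, where the separate version is still stable):
import torch
import torch.nn as nn
logits = torch.randn(4, 5)
target = torch.empty(4, 5).random_(2)
combined = nn.BCEWithLogitsLoss()(logits, target)
separate = nn.BCELoss()(torch.sigmoid(logits), target)
print(torch.allclose(combined, separate, atol=1e-6))  # True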
Apart from how the loss is computed, it is the same as L1Loss.
If the element-wise error is below 1, a squared term is used; otherwise the L1 term is used. It is less sensitive to outliers than MSELoss, and in some cases it can prevent exploding gradients.
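A minimal sketch of the piecewise rule (made-up values; with the default threshold of 1, the loss is 0.5 * diff**2 when |diff| < 1 and |diff| - 0.5 otherwise):
import torch
import torch.nn as nn
pred = torch.tensor([0.0, 0.0, 0.0])
target = torch.tensor([0.4, 1.0, 3.0])                 # |diff| = 0.4, 1.0, 3.0
print(nn.SmoothL1Loss(reduction='none')(pred, target))
# tensor([0.0800, 0.5000, 2.5000])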