Understanding CrossEntropyLoss

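I had always applied the cross-entropy formula directly without thinking about it; recently I took a closer look at how it works.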

1 Formula

nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss (negative log-likelihood loss): it first applies LogSoftmax to the input and then computes NLLLoss on the result.
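For reference, the per-sample formula can be written out as follows. This is a sketch in my own notation (assumed, not taken from the original post): z is the vector of C raw scores (logits), y is a one-hot target, and c is the index of the true class.

```latex
\ell(z, y)
  = -\sum_{i=1}^{C} y_i \,\log\frac{\exp(z_i)}{\sum_{j=1}^{C}\exp(z_j)}
  = -z_c + \log\sum_{j=1}^{C}\exp(z_j)
```

The fraction inside the logarithm is the Softmax output, its logarithm is what LogSoftmax returns, and picking out the negative entry at the true class is what NLLLoss does. With the default reduction='mean', the loss of a batch is the average of these per-sample values.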

Manual calculation

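import torch
# Manual calculation, using torch only (the original called np without importing numpy)
y = torch.tensor([1,0,0])            # one-hot target: the true class is 0
z = torch.tensor([0.2,0.1,-0.1])     # raw scores (logits)
y_pred = torch.exp(z) / torch.exp(z).sum()   # softmax
loss = (-y * torch.log(y_pred)).sum()        # cross-entropy
print(loss) # tensor(0.9729)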
  

Using nn.LogSoftmax and nn.NLLLoss

criterion = torch.nn.LogSoftmax(dim=-1)  # pass dim explicitly; relying on the implicit dim is deprecated
z_tensor = torch.tensor([0.2, 0.1, -0.1])
z_tensor = criterion(z_tensor)
print(z_tensor) # tensor([-0.9729, -1.0729, -1.2729])

criterion = torch.nn.NLLLoss()   
y_tensor = torch.LongTensor([0])
loss = criterion(z_tensor.reshape(1,3), y_tensor)
print(loss)  # tensor(0.9729)

Using the CrossEntropyLoss function

# Use the built-in wrapper
criterion = torch.nn.CrossEntropyLoss()
y_tensor = torch.LongTensor([0])
z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)
loss = criterion(z_tensor, y_tensor)
print(loss) # tensor(0.9729)
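The equivalence shown above on a single sample can also be checked on a small batch. The following is a minimal sketch using the functional API; the batch size, the seed, and the variable names are arbitrary choices of mine, not part of the original example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                      # arbitrary seed, only for repeatability
logits = torch.randn(4, 3)                # input of shape (N, C) = (4, 3)
targets = torch.tensor([0, 2, 1, 2])      # class indices of shape (N,)

# One step: cross_entropy on the raw logits
loss_ce = F.cross_entropy(logits, targets)

# Two steps: log_softmax over the class dimension, then nll_loss
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_two_step))
```

Both paths take the raw scores as input, so no Softmax layer should be applied before CrossEntropyLoss.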

2 input and target

There is a pitfall when using this function: the shapes and dtypes that input and target are expected to have.

See the official documentation: the CrossEntropyLoss page of the PyTorch 1.12 documentation.


(1) Input is (N, C), target is class indices

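The loss computations in the code above correspond to this case. Some sources say that the labels must be 1-D and that their dtype cannot be Double; they have to be converted to long. That makes sense: a class index can only be an integer.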

z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)

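Here 3 is C. According to the official documentation, input may also have shape (C), so the reshape should not be needed. In practice, however, omitting the reshape raises an error: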

Dimension out of range (expected to be in range of [-1, 0], but got 1)

I have not figured out why.
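One possible explanation, offered as an assumption rather than a confirmed diagnosis of the setup above: the documentation pairs an input of shape (C) with a target of shape (), that is, a 0-D tensor, whereas y_tensor above has shape (1). In addition, older PyTorch releases may not accept unbatched input at all, which could produce a dimension error like the one shown. The sketch below assumes a PyTorch version that supports unbatched input.

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

z = torch.tensor([0.2, 0.1, -0.1])        # unbatched input, shape (C,) = (3,)
y_scalar = torch.tensor(0)                # 0-D target holding the class index

# Unbatched call: input (C,), target ()
loss_unbatched = criterion(z, y_scalar)

# Batched call for comparison: input (1, C), target (1,)
loss_batched = criterion(z.reshape(1, 3), torch.tensor([0]))

print(loss_unbatched, loss_batched)
```

If the unbatched call fails on a given installation, keeping the reshape to (1, C) with a target of shape (1) is the safe form.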

When the batch size is greater than 1, note that the shape of target should be (N), not (N, 1)

y = torch.LongTensor([0,1])
z = torch.Tensor([[0.2,0.1,-0.1],[4,5,6]])
print(y.shape, z.shape)
loss = criterion(z,y)
print(loss)

If target is written like this instead:

y = torch.LongTensor([[0],[1]])

it raises an error: 0D or 1D target tensor expected, multi-target not supported

For an explanation, see the CSDN blog post by 野指针小李, "PyTorch Study Notes (5): the cross-entropy error RuntimeError: 1D target tensor expected, multi-target not supported".
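When labels arrive with shape (N, 1), for example as a column read from a dataset, dropping the trailing dimension is usually enough. A minimal sketch, reusing the values from the batch example (the names y_col and y_flat are mine):

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

z = torch.tensor([[0.2, 0.1, -0.1], [4.0, 5.0, 6.0]])   # input, shape (N, C) = (2, 3)
y_col = torch.tensor([[0], [1]])                        # labels with shape (N, 1)

# Remove the trailing dimension so the target has shape (N,)
y_flat = y_col.squeeze(1)

loss = criterion(z, y_flat)
print(y_flat.shape, loss)
```

Using squeeze(1) rather than a bare squeeze() keeps the batch dimension intact even when N is 1.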

(2) Input is (N, C), target is class probabilities

y_tensor = torch.Tensor([0.2, 0.3, 0.5]).reshape(1,3)
z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)
loss = criterion(z_tensor, y_tensor)
print(loss)

In this case target holds the probability of each class.
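With probability targets, the loss for each sample is the negative sum of target probability times log-softmax over the classes. The sketch below recomputes the example by hand to check this; it assumes a PyTorch version that accepts probability targets, and the variable names are mine.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[0.2, 0.3, 0.5]])       # class probabilities, shape (N, C) = (1, 3)
z = torch.tensor([[0.2, 0.1, -0.1]])      # raw scores, same shape as the target

# Built-in loss with a floating-point target of the same shape as the input
loss_builtin = torch.nn.CrossEntropyLoss()(z, y)

# Manual version: -sum_c y_c * log_softmax(z)_c per sample, then mean over the batch
loss_manual = -(y * F.log_softmax(z, dim=1)).sum(dim=1).mean()

print(loss_builtin, loss_manual)
```

The two values should agree, which shows that the index form is the special case where the target puts all probability on one class.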

The official documentation says that target must have the same shape as input, so the following code raises an error:

y_tensor = torch.Tensor([0.2, 0.3, 0.5]).reshape(1,3)
z_tensor = torch.tensor([[0.2, 0.1, 0.4],[0.3, 0.5, 0.6]])
loss = criterion(z_tensor, y_tensor)
print(loss)

In other words, a target must be given for every sample; several samples cannot share a single target row.
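If every sample really should use the same probabilities, one option is to expand the single row to the shape of the input before calling the loss. This is a sketch of that workaround, not something the original post or the documentation prescribes; y_single and y_shared are hypothetical names.

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

y_single = torch.tensor([[0.2, 0.3, 0.5]])                # one row of probabilities, shape (1, 3)
z = torch.tensor([[0.2, 0.1, 0.4], [0.3, 0.5, 0.6]])      # input, shape (2, 3)

# Repeat the single row along the batch dimension so the shapes match
y_shared = y_single.expand_as(z)

loss = criterion(z, y_shared)
print(y_shared.shape, loss)
```

expand_as returns a view rather than a copy, so this does not duplicate the target in memory.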

The example given in the official documentation is:

import torch
import torch.nn as nn

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()

(3) Which one to choose

The official documentation recommends the class-index form: "The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive." My understanding is that if a single label per sample is not enough to describe the desired result, you provide the more detailed probabilities instead.
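To see how the two target forms relate, a class index can be converted to a one-hot probability vector, and both forms should then give the same loss. A minimal sketch with arbitrary random inputs (the seed, sizes, and names are my own choices):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                       # arbitrary seed, only for repeatability
logits = torch.randn(3, 5)                 # input of shape (N, C) = (3, 5)
idx = torch.tensor([1, 0, 4])              # class indices, shape (N,)

# The same labels expressed as probabilities: all mass on the true class
one_hot = F.one_hot(idx, num_classes=5).float()

loss_idx = F.cross_entropy(logits, idx)        # class-index target
loss_prob = F.cross_entropy(logits, one_hot)   # probability target

print(torch.allclose(loss_idx, loss_prob))
```

Since the results match, the probability form is only worth its extra cost when the targets are genuinely soft, for example with mixed or smoothed labels.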
