For cross entropy I had always just applied the formula directly; recently I dug into it a bit more.
nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss (negative log-likelihood): it first applies LogSoftmax to the logits and then computes NLLLoss.
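Written out (my own summary, not a quote from the docs), for logits z over C classes and true class index y, the combined operation is:

\[
\text{CrossEntropyLoss}(z, y) = -\log\frac{e^{z_y}}{\sum_{j=1}^{C} e^{z_j}} = \text{NLLLoss}\big(\text{LogSoftmax}(z),\, y\big)
\]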
import torch
# Manual computation
y = torch.tensor([1, 0, 0])                 # one-hot target
z = torch.tensor([0.2, 0.1, -0.1])          # raw logits
y_pred = torch.exp(z) / torch.exp(z).sum()  # softmax
loss = (-y * torch.log(y_pred)).sum()       # cross entropy
print(loss) # tensor(0.9729)
criterion = torch.nn.LogSoftmax(dim=0)  # pass dim explicitly to avoid the deprecation warning
z_tensor = torch.tensor([0.2, 0.1, -0.1])
z_tensor = criterion(z_tensor)
print(z_tensor) # tensor([-0.9729, -1.0729, -1.2729])
criterion = torch.nn.NLLLoss()
y_tensor = torch.LongTensor([0])
loss = criterion(z_tensor.reshape(1,3), y_tensor)
print(loss) # tensor(0.9729)
# Using the combined nn.CrossEntropyLoss
criterion = torch.nn.CrossEntropyLoss()
y_tensor = torch.LongTensor([0])
z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)
loss = criterion(z_tensor, y_tensor)
print(loss) # tensor(0.9729)
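As a quick cross-check (my own addition), the functional form torch.nn.functional.cross_entropy should give the same number:
import torch.nn.functional as F
loss = F.cross_entropy(torch.tensor([[0.2, 0.1, -0.1]]), torch.LongTensor([0]))  # same computation via the functional API
print(loss) # should again be tensor(0.9729)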
There is one pitfall when using this function: what exactly the input and the target are supposed to look like.
See the official documentation: CrossEntropyLoss — PyTorch 1.12 documentation.
The loss computation in the code above corresponds to this case. Some sources note that the target labels should be 1-dimensional and must not be of dtype Double; they have to be converted to long. That is easy to understand: class indices can only be integers.
z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)
Here the 3 is C, the number of classes. According to the official documentation the input shape can also be just (C), so the reshape should not be needed; in practice, though, omitting it raises an error:
Dimension out of range (expected to be in range of [-1, 0], but got 1)
I haven't figured out why.
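My guess (not something I have verified): with an unbatched input of shape (C), the target also has to be unbatched, i.e. a 0-dimensional scalar tensor rather than shape (1,), and unbatched inputs are only supported by newer PyTorch versions (around 1.11+). Assuming such a version, something like this should work without the reshape:
z_unbatched = torch.tensor([0.2, 0.1, -0.1])  # shape (C,)
y_scalar = torch.tensor(0)                    # 0-d long tensor, not shape (1,)
loss = criterion(z_unbatched, y_scalar)
print(loss)  # should again be about 0.9729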
When the batch size is greater than 1, note that the target shape should be (N), not (N, 1):
y = torch.LongTensor([0,1])
z = torch.Tensor([[0.2,0.1,-0.1],[4,5,6]])
print(y.shape, z.shape)
loss = criterion(z,y)
print(loss)
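A side note of my own: with the default reduction the printed value is the mean of the per-sample losses; passing reduction='none' exposes them individually:
criterion_none = torch.nn.CrossEntropyLoss(reduction='none')
print(criterion_none(z, y))  # one loss per sample, shape (2,)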
If the target is written like this:
y = torch.LongTensor([[0],[1]])
it raises the error: 0D or 1D target tensor expected, multi-target not supported
For an explanation, see the CSDN post “Pytorch学习笔记(5)——交叉熵报错RuntimeError: 1D target tensor expected, multi-target not supported” by 野指针小李.
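A simple fix (my own note, not from that post) is to flatten the extra dimension before passing the target:
y_bad = torch.LongTensor([[0], [1]])   # shape (2, 1): triggers the error above
y_fixed = y_bad.squeeze(1)             # shape (2,)
loss = criterion(z, y_fixed)
print(loss)  # same result as with y = torch.LongTensor([0, 1])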
The target can also be passed as floating-point class probabilities, in which case it is interpreted as a probability distribution over the C classes:
y_tensor = torch.Tensor([0.2, 0.3, 0.5]).reshape(1,3)
z_tensor = torch.tensor([0.2, 0.1, -0.1]).reshape(1,3)
loss = criterion(z_tensor, y_tensor)
print(loss)
The official documentation says that in this case the target must have the same shape as the input, so the following code raises an error:
y_tensor = torch.Tensor([0.2, 0.3, 0.5]).reshape(1,3)
z_tensor = torch.tensor([[0.2, 0.1, 0.4],[0.3, 0.5, 0.6]])
loss = criterion(z_tensor, y_tensor)
print(loss)
That is, each sample must be given its own target distribution; a single one cannot be shared across the batch.
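If every sample really should use the same distribution, one option (my own sketch) is to expand the target to the input's shape explicitly:
y_tensor = torch.Tensor([0.2, 0.3, 0.5]).reshape(1, 3)
z_tensor = torch.tensor([[0.2, 0.1, 0.4], [0.3, 0.5, 0.6]])
loss = criterion(z_tensor, y_tensor.expand(z_tensor.shape[0], -1))  # target expanded to shape (2, 3)
print(loss)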
The official example (which assumes import torch and import torch.nn as nn) is:
# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()
The documentation recommends the class-index form: "The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive." My understanding: if a single hard label per sample is not expressive enough, provide the more detailed probability distribution instead.
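To illustrate that last point with a small sketch of my own: a one-hot probability target gives exactly the same loss as the corresponding class index, so probabilities only buy you something when the distribution is genuinely soft:
ce = torch.nn.CrossEntropyLoss()
z = torch.tensor([[0.2, 0.1, -0.1]])
print(ce(z, torch.LongTensor([0])))            # index target -> tensor(0.9729)
print(ce(z, torch.tensor([[1.0, 0.0, 0.0]])))  # one-hot probability target, same value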