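My model threw an error partway through training and I couldn't find the cause, so I reproduced the error in a minimal form. Posts online all say the problem lies in the input and target passed to torch.nn.functional's cross_entropy, something about the target indices being wrong [1,2]. As shown below: targets of [1,2,3] and [2,3,0] work fine, while [1,2,100] and [99,100,101] fail. I haven't figured it out yet, so I'm writing it down for now.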
[1] https://blog.csdn.net/littlehaes/article/details/102806323
[2] https://blog.csdn.net/qq_27292549/article/details/81084782
[3] Computing multi-class cross-entropy: https://www.cnblogs.com/jclian91/p/9376117.html
[4] Illegal values in the target labels: https://www.cnblogs.com/geoffreyone/p/10653619.html
>>> import torch
>>> import torch.nn.functional as F
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(5, (3,), dtype=torch.int64)
>>> input
tensor([[-1.5579,  0.5080,  0.1069,  0.7945,  0.4689],
        [-2.9727,  0.3491, -1.2172, -0.0223,  1.2733],
        [-0.2269, -1.1830, -0.8604,  1.2835,  1.2629]], requires_grad=True)
>>> target
tensor([2, 3, 0])
>>> loss = F.cross_entropy(input, target)
>>> loss
tensor(2.0206, grad_fn=<NllLossBackward>)
>>> input2 = torch.randn(4, 5, requires_grad=True)
>>> lossloss = F.cross_entropy(input, target)
>>> loss=F.cross_entropy(input2, target)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (4) to match target batch_size (3).
>>> import numpy as np
>>> target
tensor([2, 3, 0])
>>> t=np.array([99,100,101])
>>> tt=torch.from_numpy(t)
>>> tt
tensor([ 99, 100, 101])
>>> loss = F.cross_entropy(input, tt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
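The assertion in this traceback is the range check itself: an input of shape (3, 5) has 5 classes, so every target has to be an integer in 0..4. Below is a minimal sketch of why, assuming the default mean reduction and no class weights. cross_entropy takes log_softmax over the class dimension and then uses each target as a column index, so a value like 99 has no column to pick. The tensors here are made up for illustration and are not the ones from the session above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)            # 3 samples, 5 classes -> valid targets are 0..4
target = torch.tensor([2, 3, 0])

# Mean over the batch of -log_softmax(logits)[i, target[i]]
log_probs = F.log_softmax(logits, dim=1)
rows = torch.arange(logits.size(0))
manual = -log_probs[rows, target].mean()
builtin = F.cross_entropy(logits, target)
print(manual.item(), builtin.item())  # the two values should agree

# The same range test the assertion performs, applied to the failing target
n_classes = logits.size(1)
bad_target = torch.tensor([99, 100, 101])
in_range = (bad_target >= 0) & (bad_target < n_classes)
print(in_range.tolist())
```

Since the target is only ever used as an index, [1, 2, 3] and [2, 3, 0] pass with 5 classes, while any entry of 5 or more (or a negative one other than the ignore_index value) trips the assertion.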
>>> t=np.array([1,2,3])
>>> ttt=torch.from_numpy(t)
>>> ttt
tensor([1, 2, 3])
>>> loss = F.cross_entropy(input, ttt)
>>> t=np.array([1,2,100])
>>> ttt=torch.from_numpy(t)
>>> loss = F.cross_entropy(input, ttt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
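Both failures reproduced above (the batch-size mismatch and the out-of-range target) can be caught before the call with a clearer message. The wrapper below is a hypothetical helper, not part of PyTorch: the name checked_cross_entropy is my own, and it assumes 2-D logits of shape (batch, n_classes) and does not account for ignore_index.

```python
import torch
import torch.nn.functional as F


def checked_cross_entropy(logits, target):
    """Hypothetical wrapper: validate batch size and label range, then call F.cross_entropy."""
    if logits.size(0) != target.size(0):
        raise ValueError(
            f"batch size mismatch: input {logits.size(0)} vs target {target.size(0)}"
        )
    n_classes = logits.size(1)
    bad = (target < 0) | (target >= n_classes)
    if bad.any():
        # Report the offending values instead of a bare C-level assertion
        raise ValueError(
            f"target values {target[bad].tolist()} are outside [0, {n_classes - 1}]"
        )
    return F.cross_entropy(logits, target)


logits = torch.randn(3, 5)
ok_loss = checked_cross_entropy(logits, torch.tensor([1, 2, 3]))  # passes the checks
try:
    checked_cross_entropy(logits, torch.tensor([1, 2, 100]))
except ValueError as err:
    message = str(err)
    print(message)
```

On a GPU the same mistake usually shows up as a much less readable device-side assert, so a check like this on the CPU side can save some guessing.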
I reloaded from the checkpoint where training had stopped and resumed; the error did not come back. Puzzling.
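One possible explanation, which nothing above confirms, is that only a few samples carry an invalid label, so whether the error fires depends on which batches happen to be drawn. A rough sketch for checking that is to scan all label batches once and list the offenders. find_bad_labels is a hypothetical helper, and the batches are made-up stand-ins mirroring the targets tried above; in practice they would come from the real data loader.

```python
import torch


def find_bad_labels(label_batches, n_classes):
    """Return (batch_index, offending_values) for each batch with out-of-range labels."""
    problems = []
    for i, labels in enumerate(label_batches):
        bad = labels[(labels < 0) | (labels >= n_classes)]
        if bad.numel() > 0:
            problems.append((i, bad.tolist()))
    return problems


# Stand-in batches; n_classes=5 matches the (3, 5) input used in the reproduction
batches = [
    torch.tensor([1, 2, 3]),
    torch.tensor([2, 3, 0]),
    torch.tensor([1, 2, 100]),
    torch.tensor([99, 100, 101]),
]
report = find_bad_labels(batches, n_classes=5)
print(report)  # [(2, [100]), (3, [99, 100, 101])]
```

If the scan comes back empty on the real data, the labels are probably not the cause and the mismatch would have to be looked for elsewhere, for example in the number of output units of the final layer.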