RuntimeError: CUDA error: device-side assert triggered

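Training crashed partway through with this error and I could not find the cause at first, so I reproduced it in a minimal form. Posts online say the problem lies in the input and target passed to torch.nn.functional's cross_entropy, specifically that the target indices are invalid [1,2]. As shown below, targets of [1,2,3] and [2,3,0] work fine, while [1,2,100] and [99,100,101] fail. I have not fully figured it out yet, so I am noting it down for now.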

[1] https://blog.csdn.net/littlehaes/article/details/102806323

[2] https://blog.csdn.net/qq_27292549/article/details/81084782

[3] Computing multi-class cross-entropy: https://www.cnblogs.com/jclian91/p/9376117.html

[4] Illegal values in the target labels: https://www.cnblogs.com/geoffreyone/p/10653619.html

>>> import torch
>>> import torch.nn.functional as F

>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(5, (3,), dtype=torch.int64)
>>> input
tensor([[-1.5579,  0.5080,  0.1069,  0.7945,  0.4689],
        [-2.9727,  0.3491, -1.2172, -0.0223,  1.2733],
        [-0.2269, -1.1830, -0.8604,  1.2835,  1.2629]], requires_grad=True)
>>> target
tensor([2, 3, 0])
>>> loss = F.cross_entropy(input, target)
>>> loss
tensor(2.0206, grad_fn=<NllLossBackward>)

>>> input2 = torch.randn(4, 5, requires_grad=True)
>>> loss = F.cross_entropy(input, target)
>>> loss=F.cross_entropy(input2, target)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1869, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (4) to match target batch_size (3).

>>> import numpy as np
>>> target
tensor([2, 3, 0])
>>> t=np.array([99,100,101])
>>> tt=torch.from_numpy(t)
>>> tt
tensor([ 99, 100, 101])
>>> loss = F.cross_entropy(input, tt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
>>> t=np.array([1,2,3])
>>> ttt=torch.from_numpy(t)
>>> ttt
tensor([1, 2, 3])
>>> loss = F.cross_entropy(input, ttt)
>>> t=np.array([1,2,100])
>>> ttt=torch.from_numpy(t)
>>> loss = F.cross_entropy(input, ttt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/cmz/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1871, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at ../aten/src/THNN/generic/ClassNLLCriterion.c:92
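The assertion in the tracebacks, `cur_target >= 0 && cur_target < n_classes`, suggests that every target index has to lie in [0, n_classes), where n_classes is the size of dimension 1 of input (5 here). That would explain why [1,2,3] and [2,3,0] pass while [1,2,100] and [99,100,101] fail. A minimal sketch of a check that could run before the loss call is below; the helper name `check_class_targets` is made up for illustration, and it assumes an (N, C) input and that `ignore_index` is not being used.

```python
import torch
import torch.nn.functional as F


def check_class_targets(logits, target):
    """Raise ValueError if any target index is outside [0, n_classes).

    Hypothetical helper. Assumes logits has shape (N, C) and that no
    ignore_index value (default -100) is expected among the targets.
    """
    n_classes = logits.size(1)  # class dimension of an (N, C) input
    bad = (target < 0) | (target >= n_classes)
    if bad.any():
        raise ValueError(
            "target values {} are outside [0, {})".format(
                target[bad].tolist(), n_classes
            )
        )


logits = torch.randn(3, 5)  # 3 samples, 5 classes

# In-range targets pass the check and the loss can be computed.
check_class_targets(logits, torch.tensor([2, 3, 0]))
loss = F.cross_entropy(logits, torch.tensor([2, 3, 0]))

# An out-of-range target is reported with a readable message instead of
# an assertion inside the loss kernel.
try:
    check_class_targets(logits, torch.tensor([1, 2, 100]))
    message = ""
except ValueError as err:
    message = str(err)
```

Calling the check on each batch right before `F.cross_entropy` costs one comparison per label, so it is probably cheap enough to leave on while hunting for the bad label.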

After reloading from the interrupted checkpoint and resuming training, the error did not show up again. Puzzling.
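If the error comes back on the GPU, two common debugging steps may help narrow it down. CUDA kernels run asynchronously, so the Python traceback of a device-side assert can point at a later, unrelated line; setting the `CUDA_LAUNCH_BLOCKING=1` environment variable makes launches synchronous. Recomputing the loss on CPU copies of the tensors usually gives a more readable message, like the ones in the sessions above. The sketch below assumes it runs before CUDA is initialized, and `cross_entropy_on_cpu` is a hypothetical helper name.

```python
import os

# Assumption: this is set before CUDA is initialized. It makes kernel
# launches synchronous so a GPU traceback points at the failing call.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
import torch.nn.functional as F


def cross_entropy_on_cpu(logits, target):
    """Recompute the loss on CPU copies of the tensors (debugging only)."""
    return F.cross_entropy(logits.detach().cpu(), target.detach().cpu())


logits = torch.randn(3, 5)  # 3 samples, 5 classes

# Valid targets: the CPU recomputation returns a scalar loss.
ok = cross_entropy_on_cpu(logits, torch.tensor([2, 3, 0]))

# Invalid target: the CPU path raises a Python exception right at the call.
try:
    cross_entropy_on_cpu(logits, torch.tensor([1, 2, 100]))
    failed = False
except (IndexError, RuntimeError):
    # The exception type differs between PyTorch versions.
    failed = True
```

In a real training loop the helper would be called on the batch that triggered the failure, in a fresh process, since the CUDA context is generally unusable after a device-side assert.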
