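Today, while training a model, I ran into a problem: different inputs produced exactly the same predicted values after passing through the model.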
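import torch

model.eval()
# two sequences of 10 token ids each, shape (2, 10)
input_ids = torch.tensor([[101, 403, 2033, 2011, 2151, 1003, 2017, 1005, 1040, 102],
                          [101, 102, 103, 104, 105, 106, 107, 108, 109, 112]])
# the 2nd and 3rd arguments (presumably token_type_ids and attention_mask) must match input_ids in shape
token_type_ids = torch.zeros_like(input_ids)   # every token in segment 0
attention_mask = torch.ones_like(input_ids)    # no padding: attend to every token
output = model(input_ids, token_type_ids, attention_mask)
print(output)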
...outputs = ...
tensor([[[ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         ...,
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561]],

        [[ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         ...,
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561],
         [ 9.9717, -0.1572,  0.3907,  ...,  0.0436, -36.5632, -0.1561]]],
       grad_fn=<...>)
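The printout above is truncated with "...", so "all rows look the same" is only an impression. A minimal sketch for checking it numerically follows; collapse_report is a hypothetical helper, and it assumes you pass it the (batch, seq_len, hidden_size) hidden-state tensor (if your model returns a tuple or a ModelOutput object, take the hidden-state tensor out of it first). It reports how far any token vector deviates from the first one and the mean cosine similarity between token vectors.

```python
import torch
import torch.nn.functional as F


def collapse_report(hidden: torch.Tensor) -> dict:
    """hidden: (batch, seq_len, hidden_size) tensor, e.g. the encoder output above.

    Returns the largest absolute deviation of any token vector from the first one,
    and the mean off-diagonal cosine similarity between all token vectors.
    Needs at least two token vectors in total.
    """
    # flatten batch and sequence dims: one row per token position
    flat = hidden.detach().reshape(-1, hidden.shape[-1]).float()
    max_dev = (flat - flat[0]).abs().max().item()

    # pairwise cosine similarity between every pair of token vectors
    normed = F.normalize(flat, dim=-1)
    cos = normed @ normed.T
    n = cos.shape[0]
    mean_offdiag = ((cos.sum() - cos.diagonal().sum()) / (n * (n - 1))).item()
    return {"max_abs_deviation": max_dev, "mean_cosine": mean_offdiag}


# a deliberately collapsed tensor: every position holds the same vector
collapsed = torch.randn(1, 1, 16).expand(2, 10, 16)
# a "healthy" tensor: independent random vectors per position
healthy = torch.randn(2, 10, 16)

print(collapse_report(collapsed))
print(collapse_report(healthy))
```

For a collapsed output like the one printed above, one would expect max_abs_deviation close to 0 and mean_cosine close to 1; for a healthy encoder the cosine similarity should be noticeably lower.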
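As you can see, even though the ids differ, the contents of the final output matrix outputs are identical. And after training the model, the predicted results are still all the same: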
output =
tensor([[-0.3238, -0.1439,  0.6271,  ...,  0.2854,  0.5228, -1.2008],
        [-0.3238, -0.1439,  0.6271,  ...,  0.2854,  0.5228, -1.2008],
        [-0.3238, -0.1439,  0.6271,  ...,  0.2854,  0.5228, -1.2008],
        ...,
        [-0.3238, -0.1439,  0.6271,  ...,  0.2854,  0.5228, -1.2008],
        [-0.3237, -0.1440,  0.6271,  ...,  0.2854,  0.5228, -1.2008],
        [-0.3238, -0.1439,  0.6271,  ...,  0.2854,  0.5228, -1.2008]],
       device='cuda:0')
pred =
tensor([152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152,
        152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152,
        152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152, 152,
        152, 152, 152, 152, 152, 152, 152], device='cuda:0')
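The reason is this: suppose different inputs produce identical outputs, for example the output for id=5 is exactly the same as the output for id=6. When the cross-entropy loss for id=5 takes some value and you backpropagate, the loss for id=6 is updated in lockstep with it. Because the two outputs are identical, the shared parameters receive one combined gradient (the pull toward id=5's label plus the pull toward id=6's label), and that single update moves both outputs in the same way. So after the update, the output for id=6 is still identical to the output for id=5.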
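To make this concrete, here is a minimal sketch of a single SGD step. It assumes the collapse has already happened upstream, so that id=5 and id=6 reach a hypothetical linear classification head as exactly the same vector; head, hidden_size and num_labels are illustrative names, and a random vector stands in for the real model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, num_labels = 8, 3

# Assumption: the encoder has collapsed, so id=5 and id=6 yield the same vector.
h = torch.randn(1, hidden_size).repeat(2, 1)   # row 0 = "id=5", row 1 = "id=6"
labels = torch.tensor([1, 2])                  # but their target labels differ

head = nn.Linear(hidden_size, num_labels)      # hypothetical classification head
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

logits_before = head(h).detach()
loss = nn.functional.cross_entropy(head(h), labels)
optimizer.zero_grad()
loss.backward()
# bias gradient = batch mean of (softmax - one_hot): both targets mixed into one vector
print(head.bias.grad)
optimizer.step()
logits_after = head(h).detach()

delta = logits_after - logits_before
print(delta)                                            # how each row's logits moved
print(torch.allclose(logits_after[0], logits_after[1]))  # are the rows still identical?
```

Since both rows share the same probabilities p, the bias gradient works out to (p0, p1 - 0.5, p2 - 0.5): the one update mixes "move toward label 1" and "move toward label 2" and applies the result to both ids, so their logits change by the same amount and remain equal.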
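So when the outputs are identical, training can basically only limp along on the randomness of the dropout layers in between; it can never reach the situation where id=5 is updated toward label 1 while id=6 is updated toward label 2. (If the predicted labels are the same but the underlying output values differ, the two can still be updated in different directions.)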
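Here id=5 and id=6 both produce the same output, so you can think of them as passing through one and the same function. If id=5 and id=6 produced different outputs, you could think of them as passing through different functions, and during training they would naturally move toward their respective labels.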
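The contrast between "the same function" and "different functions" can be checked with a small experiment. This is a sketch under assumptions: a hypothetical helper train_head trains a linear head on fixed random features, once with both ids collapsed onto one vector and once with a separate vector per id. The features stand in for encoder outputs and are not taken from the original model.

```python
import math

import torch
import torch.nn as nn


def train_head(h, labels, steps=1000, lr=0.1, seed=0):
    """Train a hypothetical linear head on fixed features h; return (final loss, logits)."""
    torch.manual_seed(seed)
    head = nn.Linear(h.shape[1], 3)
    optimizer = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(head(h), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        logits = head(h)
        return nn.functional.cross_entropy(logits, labels).item(), logits


torch.manual_seed(0)
labels = torch.tensor([1, 2])             # "id=5" -> label 1, "id=6" -> label 2
same = torch.randn(1, 8).repeat(2, 1)     # collapsed: both ids share one vector
different = torch.randn(2, 8)             # healthy: each id has its own vector

loss_same, logits_same = train_head(same, labels)
loss_diff, logits_diff = train_head(different, labels)

# collapsed: identical logits for both ids, loss stuck at or above log(2)
print(loss_same, math.log(2), logits_same)
# separate vectors: each id can be pushed toward its own label
print(loss_diff, logits_diff.argmax(dim=-1))
```

In the collapsed case the mean cross-entropy cannot fall below log 2 ≈ 0.693: with identical probabilities p for both ids the loss is -(log p1 + log p2)/2, and p1 + p2 ≤ 1 forces p1·p2 ≤ 1/4. With separate input vectors the same head is free to fit each id to its own label.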