BUG pytorch

November 21, 2020

RuntimeError: grad can be implicitly created only for scalar outputs

This means a gradient can only be implicitly created when the output variable out is a scalar. The mistake is here:

x = torch.randn(3, requires_grad=True)  # this tensor requires a gradient
out = x * 2
out.backward()  # the error is here: backward() may be called without arguments only when the output is a scalar

Fix:

x = torch.randn(3, requires_grad=True)  # this tensor requires a gradient
out = x * 2
v = torch.tensor([1.0, 1.0, 1.0], dtype=torch.float)
out.backward(v)  # pass a vector to backward; it must have the same shape as out
print(x.grad)  # prints the gradient of out with respect to x

In PyTorch's autograd mechanism, the chain rule produces a Jacobian matrix, i.e. the matrix obtained by differentiating out with respect to x; the vector v passed to backward is then multiplied with that Jacobian (a vector-Jacobian product), turning the result back into a vector.
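A small sketch of what v does, assuming the same x and out as above: out.backward(v) computes the vector-Jacobian product, which gives the same gradient as reducing out to the scalar (out * v).sum() first.

import torch

x = torch.randn(3, requires_grad=True)
out = x * 2

v = torch.tensor([1.0, 1.0, 1.0])
out.backward(v)                 # computes v^T * J, where J = d(out)/d(x)
print(x.grad)                   # tensor([2., 2., 2.]), since d(2x)/dx = 2 elementwise

x.grad = None                   # clear the accumulated gradient
(x * 2 * v).sum().backward()    # equivalent scalar formulation
print(x.grad)                   # tensor([2., 2., 2.]) again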
Questions:

  1. Why does autograd require the final output to be a scalar? A scalar tensor has dim=0, yet the results produced by the code above are vectors with dim=1, so it seems a vector output can work too (see the small sketch below).
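In practice the usual workaround is to reduce the vector output to a scalar yourself (e.g. with .sum() or .mean()) before calling backward(); a minimal sketch with the same x as above:

import torch

x = torch.randn(3, requires_grad=True)
out = x * 2
out.sum().backward()   # summing gives a scalar, so backward() needs no argument
print(x.grad)          # tensor([2., 2., 2.]), same as passing v = ones to backward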

November 19, 2020

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Today's bug is again caused by loss.backward().
Following the error message's suggestion, I called single_loss.backward(retain_graph=True), and then yesterday's old bug showed up again:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 6]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

According to HuiYu-Li's description of this problem, it appears when there are two losses or two outputs that need backward: the graph of the backward call written first gets freed. Therefore retain_graph=True needs to be added to backward() to keep its buffers.
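A minimal sketch of that two-output scenario (tensor names are made up for illustration): the first backward keeps the graph alive so the second call can still traverse it.

import torch

x = torch.randn(2, 6, requires_grad=True)
shared = x * 3                      # shared intermediate result
loss1 = shared.sum()
loss2 = (shared ** 2).mean()
loss1.backward(retain_graph=True)   # keep the graph so it can be walked again
loss2.backward()                    # second backward through the same graph now succeeds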

However, on the surface there are not multiple variables that need backward here; the fix turned out to be the same as yesterday's (I had commented that line out while editing the code, so the error came back). I also found a very good blog post that addresses this problem: RamBoBai

detach/repackage the hidden state in between batches. There are (at least) three ways to do this.

hidden.detach_()                                    # detach in place
hidden = hidden.detach()                            # detach out of place, returns a new tensor
hidden = Variable(hidden.data, requires_grad=True)  # legacy style; Variable (from torch.autograd) is deprecated
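A runnable sketch of where this detach belongs in a truncated-BPTT style training loop (the tiny RNN, the sizes, and the random data below are made up for illustration):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
criterion = nn.MSELoss()

hidden = torch.zeros(1, 2, 8)                # (num_layers, batch, hidden_size)
for step in range(3):
    seq = torch.randn(2, 5, 4)               # (batch, seq_len, input_size)
    target = torch.randn(2, 1)
    hidden = hidden.detach()                 # repackage: cut ties to the previous batch's graph
    output, hidden = rnn(seq, hidden)
    loss = criterion(head(output[:, -1, :]), target)
    optimizer.zero_grad()
    loss.backward()                          # without the detach, the second iteration raises the error above
    optimizer.step()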

November 18, 2020

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

For inplace operations, see the Zhihu reference.
After some digging, I found that in my case this error was caused by a wrongly defined model dimension: I used a linear layer but set both the input and output dimensions to 1, which made one variable's size blow up and then everything crashed!
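As the hint in the error message suggests, anomaly detection can be switched on so the traceback points at the operation whose in-place modification broke the graph; a minimal sketch of enabling it before training:

import torch

torch.autograd.set_detect_anomaly(True)  # slows training down, but backward() will report which forward op failed
# ... build the model and data, then run the training loop with loss.backward() as usual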

Then this problem appeared:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (13x1 and 13x1)

It means the input dimensions do not match the linear layer's parameters:

import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(in_features=13, out_features=1)  # params: 1 torch.Size([1, 13])

    def forward(self, x):
        return self.linear(x)

model = LinearRegression()
for i in range(len(list(model.parameters()))):  # inspect the model parameters
    print('params: %d' % (i + 1), list(model.parameters())[i].size())

seq = torch.randn((12, 1))
memes = torch.randn(1)
print(seq.size(), memes.size())
input_seq = torch.cat([memes.view(-1, 1), seq]).reshape(1, -1)
print(model(input_seq))
"""
Sizes of the model parameters:
params: 1 torch.Size([1, 13])
params: 2 torch.Size([1])
Sizes of the input data:
torch.Size([12, 1]) torch.Size([1])
The tensors are concatenated into a (13, 1) array, which the linear layer is supposed to map from 13 features down to 1.
When the linear layer is initialized, its first argument is the number of input features; a (13, 1) input would be read as 1 feature per sample, so the input has to be reshaped to (1, 13).
"""

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [13, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)

Error location: loss.backward()
Symptom: it crashes after processing exactly 3 samples every time.
Analysis: I added print(mems.requires_grad, seq.requires_grad, mems.is_leaf, seq.is_leaf) and got three lines of output before the crash. Given that seq is the variable that is fine, mems and seq show opposite flags from the second iteration onward. Of the two variables used in training, the harmless one has seq.requires_grad = False while the problematic one has requires_grad = True, so I tried to force it off with mems.requires_grad = False, which produced the following error.

RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn’t require differentiation use var_no_grad = var.detach().

Then I knew how to fix it: since gradients are not needed for it, add mems = mems.detach() before training.
I still have not fully understood the underlying reason, but there is plenty of time ahead…
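A small sketch of the leaf-variable restriction from the error above: requires_grad can only be changed on leaf tensors, so a computed tensor has to be detached instead.

import torch

x = torch.randn(3, requires_grad=True)     # leaf tensor: requires_grad may be flipped directly
y = x * 2                                  # non-leaf tensor computed from x
# y.requires_grad = False                  # would raise: "you can only change requires_grad flags of leaf variables"
y = y.detach()                             # detach instead: returns a new tensor cut off from the graph
print(y.requires_grad, y.is_leaf)          # False True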


Reposted from Zhihu

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

Solution:

import numpy as np
import torch

num_inputs = 2  # example value; in the original snippet this is the model's feature count
# w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)  # original line that produced the warning
w = torch.as_tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)  # replacement suggested in the post
b = torch.zeros(1, dtype=torch.float32)
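When the source is already a torch tensor rather than a NumPy array, the form recommended by the warning itself can be used instead (a small sketch, reusing w from above):

w_copy = w.clone().detach()                        # copy without tracking history, no warning
w_grad = w.clone().detach().requires_grad_(True)   # copy that should track gradients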
