November 21, 2020
RuntimeError: grad can be implicitly created only for scalar outputs
This means the gradient can only be implicitly created when the output variable out is a scalar. The offending code:
x = torch.randn(3, requires_grad=True)  # this tensor requires gradients
out = x * 2
out.backward()  # error here: backward() may omit its argument only when the output is a scalar
Fix:
x = torch.randn(3, requires_grad=True)  # this tensor requires gradients
out = x * 2
v = torch.tensor([1.0, 1.0, 1.0], dtype=torch.float)
out.backward(v)  # pass backward a vector with the same shape as out
print(x.grad)    # prints the gradient of out with respect to x, weighted by v
In PyTorch's autograd, the chain rule yields a Jacobian matrix, i.e. the matrix of derivatives of out with respect to x; backward then multiplies the vector v that was passed in with this Jacobian (a vector-Jacobian product), which collapses the result back into a vector stored in x.grad.
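A quick sanity check (a minimal sketch reusing the out = x * 2 example above): the Jacobian of out with respect to x is 2·I, so backward(v) should leave v * 2 in x.grad.
import torch
x = torch.randn(3, requires_grad=True)
out = x * 2                           # Jacobian d(out)/dx is 2 * I (3x3)
v = torch.tensor([1.0, 1.0, 1.0])
out.backward(v)                       # computes the vector-Jacobian product v @ J
print(torch.allclose(x.grad, v * 2))  # True: x.grad equals v * 2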
Open question:
November 19, 2020
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
Today's bug was, once again, caused by loss.backward().
Following the error message's own suggestion, I changed the call to single_loss.backward(retain_graph=True), and then yesterday's old bug reappeared:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 6]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
According to HuiYu-Li's write-up of this problem, it appears when two losses or two outputs both need backward: the graph used by the first backward call gets freed, so the second call has nothing to walk. The fix is to pass retain_graph=True to the first backward() so that the graph and its buffers are kept.
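A minimal sketch of that scenario (the two losses and variable names below are made up, not from my code): two losses share one intermediate node, and the first backward would otherwise free the tensors saved for it.
import torch
x = torch.randn(3, requires_grad=True)
y = x ** 2                         # shared intermediate; its backward needs the saved x
loss1 = y.sum()
loss2 = y.mean()
loss1.backward(retain_graph=True)  # keep the saved tensors so the graph can be walked again
loss2.backward()                   # without retain_graph=True above, this raises
                                   # "Trying to backward through the graph a second time"
print(x.grad)                      # gradients from both passes accumulate here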
However, on the surface there was no second variable here that needed backward; the real fix was still yesterday's one (I had commented that line out while editing the code, which is why the error came back). I also found a very good blog post that addresses this problem: RamBoBai
detach/repackage the hidden state in between batches. There are (at least) three ways to do this (a minimal training-loop sketch follows the list):
hidden.detach_()
hidden = hidden.detach()
hidden = Variable(hidden.data, requires_grad=True)
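A minimal sketch of the second option in a training loop (the RNN, data, and optimizer below are placeholders, not my actual model): detaching the hidden state at each batch boundary stops backward from reaching into the previous batch's already-freed graph.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
hidden = torch.zeros(1, 2, 8)              # (num_layers, batch, hidden_size)
for step in range(5):                      # stand-in for looping over batches
    x = torch.randn(2, 3, 4)               # (batch, seq_len, input_size)
    target = torch.randn(2, 1)
    hidden = hidden.detach()               # cut the graph at the batch boundary
    out, hidden = rnn(x, hidden)
    loss = nn.functional.mse_loss(head(out[:, -1]), target)
    optimizer.zero_grad()
    loss.backward()                        # only this batch's graph is traversed
    optimizer.step()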
November 18, 2020
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
In-place operation issue; see the related discussion on 知乎 (Zhihu).
From my own digging, this error in my case was caused by mis-defined model dimensions: I used a linear layer but set both the input and output dimensions to 1, which made one variable blow up in size until the run crashed.
That was followed by this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (13x1 and 13x1)
It means the input shape does not match the linear layer's parameter dimensions.
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(in_features=13, out_features=1)  # params: 1 torch.Size([1, 13])

    def forward(self, x):
        return self.linear(x)

model = LinearRegression()
for i in range(len(list(model.parameters()))):  # inspect the model parameters
    print('params: %d' % (i + 1), list(model.parameters())[i].size())

seq = torch.randn((12, 1))
memes = torch.randn(1)
print(seq.size(), memes.size())
input_seq = torch.cat([memes.view(-1, 1), seq]).reshape(1, -1)  # (13, 1) -> (1, 13)
print(model(input_seq))
"""
模型参数的size:
params: 1 torch.Size([1, 13])
params: 2 torch.Size([1])
所输入数据size:
torch.Size([12, 1]) torch.Size([1])
拼接成(13, 1)的数组,然后想经过linear层将维度13降到1,
在linear初始化时,第一个参数表示输入向量的特征数,在这里他会识别成1,因此需要转化输入向量的size
"""
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [13, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
Error location: loss.backward()
Symptom: the run crashes after exactly 3 samples every time.
Analysis: I added print(mems.requires_grad, seq.requires_grad, mems.is_leaf, seq.is_leaf) and looked at the three lines of output. Taking seq as the known-good baseline, from the second iteration onward mems shows the opposite flags to seq. So of the two variables used in training, the harmless one has seq.requires_grad = False, while the problematic one has requires_grad = True. Trying to force it off with mems.requires_grad = False produced the following error:
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
Then the fix was clear: since I don't need the gradient of this variable, add mems = mems.detach() before the training step.
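A minimal sketch of that fix in context (the model and loop below are placeholders; only the detach line reflects the actual change): mems is carried from one iteration to the next, so without the detach the second backward would walk back into the previous iteration's freed graph.
import torch
import torch.nn as nn

model = nn.Linear(13, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
mems = torch.randn(1)                      # value carried across iterations
for step in range(3):
    seq = torch.randn(12)
    mems = mems.detach()                   # drop the history attached to mems
    input_seq = torch.cat([mems, seq]).reshape(1, -1)  # (1, 13)
    pred = model(input_seq)
    loss = (pred ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    mems = pred.squeeze(0)                 # next iteration reuses this output as mems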
I still haven't fully understood the underlying mechanism, but there's plenty of time ahead…
From 知乎 (Zhihu):
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
Solution:
import numpy as np
import torch
num_inputs = 2  # example feature count so the snippet runs on its own
# w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)  # the line that produced the warning
w = torch.as_tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
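For the case the warning actually targets, where the source is already a tensor, the warning's own suggestion looks like this (w_src is a made-up name for illustration):
import torch
w_src = torch.randn(5, 1)                              # an existing tensor
# w = torch.tensor(w_src)                              # this form triggers the UserWarning
w = w_src.clone().detach()                             # recommended copy without gradient history
w_grad = w_src.clone().detach().requires_grad_(True)   # copy that should track gradients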