That is:

for param in Net().parameters():
    print(param.grad)

>>>None
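As a side note, here is a minimal runnable sketch of the same behaviour (the nn.Linear stand-in for Net is an assumption for illustration): parameter gradients stay None until backward() has been called on a loss built from that same model instance.

import torch
import torch.nn as nn

net = nn.Linear(4, 2)              # stand-in for Net
for param in net.parameters():
    print(param.grad)              # None: no backward() has run yet

out = net(torch.randn(3, 4)).sum()
out.backward()
for param in net.parameters():
    print(param.grad.shape)        # now populated: torch.Size([2, 4]) and torch.Size([2])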
To summarize, two situations cause a gradient to be None: the tensor is an intermediate (non-leaf) variable, or the parameter never takes part in the forward pass. The first case is illustrated by the following code:
import torch

print("Trial 1: with python float")
w = torch.randn(3, 5, requires_grad=True) * 0.01   # w is the product of a leaf tensor and a float -> non-leaf
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 2: with on-the-go torch scalar")
w = torch.randn(3, 5, requires_grad=True) * torch.tensor(0.01, requires_grad=True)
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 3: with named torch scalar")
t = torch.tensor(0.01, requires_grad=True)
w = torch.randn(3, 5, requires_grad=True) * t       # still non-leaf: w is the output of an operation
x = torch.randn(5, 4, requires_grad=True)
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)
The output is as follows:
>>>Trial 1: with python float
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[-0.0238, -0.0238, -0.0238, -0.0238],
        [ 0.0033,  0.0033,  0.0033,  0.0033],
        [ 0.0302,  0.0302,  0.0302,  0.0302],
        [-0.0024, -0.0024, -0.0024, -0.0024],
        [-0.0023, -0.0023, -0.0023, -0.0023]])
>>>Trial 2: with on-the-go torch scalar
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[-0.0171, -0.0171, -0.0171, -0.0171],
        [ 0.0017,  0.0017,  0.0017,  0.0017],
        [-0.0003, -0.0003, -0.0003, -0.0003],
        [-0.0162, -0.0162, -0.0162, -0.0162],
        [ 0.0227,  0.0227,  0.0227,  0.0227]])
>>>Trial 3: with named torch scalar
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[ 0.0154,  0.0154,  0.0154,  0.0154],
        [-0.0095, -0.0095, -0.0095, -0.0095],
        [ 0.0076,  0.0076,  0.0076,  0.0076],
        [ 0.0164,  0.0164,  0.0164,  0.0164],
        [-0.0345, -0.0345, -0.0345, -0.0345]])
In all three trials, w is an intermediate (non-leaf) variable: it is the result of a multiplication rather than a tensor created directly, so autograd does not keep its gradient and w.grad is None even though w.requires_grad is True. The fix is to call retain_grad() on the intermediate variable before calling .backward():
...
w.retain_grad()
y.backward(torch.ones(3))
...
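A complete sketch of this fix applied to Trial 1 (the numbers will differ from the output above because the tensors are re-randomized):

import torch

w = torch.randn(3, 5, requires_grad=True) * 0.01   # non-leaf
x = torch.randn(5, 4, requires_grad=True)
print(w.is_leaf)                                    # False: w is the output of an operation
w.retain_grad()                                     # ask autograd to keep w's gradient
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))
print(w.grad)   # now a (3, 5) tensor instead of None
print(x.grad)   # (5, 4) tensor, as before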
The second case is a parameter that never participates in the forward pass. The code is as follows:
import torch
import torch.nn as nn
import torch.optim as optim

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 1)
        self.fc3 = nn.Linear(4, 4)  # extra FC layer, never used in forward

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

if __name__ == "__main__":
    test = Test()
    x = torch.randn(2, 1, 4)  # data
    y = torch.randn(2, 1, 1)  # labels
    print("====== The network's parameters: {} ======".format(list(test.parameters())))
    for param in test.parameters():
        print("+++++++++ Gradient before training: {} ++++++++++".format(param.grad))
    criterion = nn.MSELoss()
    optimizer = optim.Adam(test.parameters(), lr=0.0001, weight_decay=0)
    epochs = 2
    for epoch in range(epochs):
        total_loss = 0
        for i, data in enumerate(x):
            inputs, labels = data, y[i]
            optimizer.zero_grad()
            preds = test(inputs)
            loss = criterion(preds, labels)
            loss.backward()
            optimizer.step()
    for param in test.parameters():
        print("+++++++++ Gradient after training: {} ++++++++++".format(param.grad))
The output is as follows:
====== The network's parameters: [Parameter containing:
tensor([[-0.3202, -0.4112, -0.4270, -0.0370],
        [ 0.2662, -0.1365, -0.0656,  0.4369],
        [ 0.4909, -0.3320,  0.4375,  0.0742],
        [-0.0148,  0.4801, -0.1870, -0.0913]], requires_grad=True),
Parameter containing:
tensor([ 0.2976,  0.4340, -0.3223, -0.3268], requires_grad=True),
Parameter containing:
tensor([[-0.3918,  0.4809, -0.3883,  0.0760]], requires_grad=True),
Parameter containing:
tensor([0.2708], requires_grad=True),
Parameter containing:
tensor([[ 0.4952,  0.4767,  0.1838, -0.2259],
        [-0.4951,  0.2937,  0.2473,  0.1304],
        [-0.1190,  0.4380, -0.1827,  0.4994],
        [ 0.4822, -0.2452,  0.4944, -0.3221]], requires_grad=True),
Parameter containing:
tensor([-0.0752, -0.3379, -0.0910,  0.1005], requires_grad=True)] ======
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient before training: None ++++++++++
+++++++++ Gradient after training: tensor([[ 0.0917,  0.0201,  0.0453,  0.0037],
        [-0.1126, -0.0246, -0.0556, -0.0045],
        [ 0.0909,  0.0199,  0.0449,  0.0037],
        [-0.0178, -0.0039, -0.0088, -0.0007]]) ++++++++++
+++++++++ Gradient after training: tensor([-0.0726,  0.0891, -0.0720,  0.0141]) ++++++++++
+++++++++ Gradient after training: tensor([[ 0.2009,  0.0286, -0.2089, -0.0592]]) ++++++++++
+++++++++ Gradient after training: tensor([0.1853]) ++++++++++
+++++++++ Gradient after training: None ++++++++++
+++++++++ Gradient after training: None ++++++++++
As the code above shows, fc3 never participates in the forward pass, so the gradients of its weight and bias are still None after backward().
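If fc3 is actually meant to be used, wiring it into forward() is enough for its parameters to receive gradients. A minimal sketch (placing fc3 between fc1 and fc2 is an assumption for illustration; any ordering with matching shapes works):

def forward(self, x):
    x = self.fc1(x)
    x = self.fc3(x)  # fc3 now participates in the forward pass, so its parameters get gradients
    x = self.fc2(x)
    return x

Otherwise, iterating over test.named_parameters() after loss.backward() and reporting entries whose .grad is still None is a quick way to spot layers that are defined but never used.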