Gradients of neural network parameters are None

While modifying a network, I ran into the problem that the network's parameter gradients were None, i.e.,

for param in Net().parameters():
	print(param.grad)
>>>None 

Note that the parameters of a freshly constructed network always have grad equal to None until backward() has been called at least once; the cases below concern gradients that remain None even after backward(). Two common causes are summarized here:

1. The variable whose gradient is requested is an intermediate (non-leaf) variable

Consider the following code:

import torch

print("Trial 1: with python float")
# w is the product of a leaf tensor and a Python float,
# so w itself is a non-leaf (intermediate) tensor
w = torch.randn(3, 5, requires_grad=True) * 0.01
x = torch.randn(5, 4, requires_grad=True)

y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 2: with on-the-go torch scalar")
# same as Trial 1, but the scalar is an anonymous tensor that also requires grad
w = torch.randn(3, 5, requires_grad=True) * torch.tensor(0.01, requires_grad=True)
x = torch.randn(5, 4, requires_grad=True)

y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

print("Trial 3: with named torch scalar")
# here the scalar is a named leaf tensor, but w is still the product
# of two leaves, i.e. an intermediate tensor
t = torch.tensor(0.01, requires_grad=True)
w = torch.randn(3, 5, requires_grad=True) * t
x = torch.randn(5, 4, requires_grad=True)

y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print("w.requires_grad:", w.requires_grad)
print("x.requires_grad:", x.requires_grad)
print("w.grad", w.grad)
print("x.grad", x.grad)

The output is as follows:

>>>print("Trial 1: with python float")
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[-0.0238, -0.0238, -0.0238, -0.0238],
        [ 0.0033,  0.0033,  0.0033,  0.0033],
        [ 0.0302,  0.0302,  0.0302,  0.0302],
        [-0.0024, -0.0024, -0.0024, -0.0024],
        [-0.0023, -0.0023, -0.0023, -0.0023]])
>>>Trial 2: with on-the-go torch scalar
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[-0.0171, -0.0171, -0.0171, -0.0171],
        [ 0.0017,  0.0017,  0.0017,  0.0017],
        [-0.0003, -0.0003, -0.0003, -0.0003],
        [-0.0162, -0.0162, -0.0162, -0.0162],
        [ 0.0227,  0.0227,  0.0227,  0.0227]])
>>>Trial 3: with named torch scalar
>>>w.requires_grad: True
>>>x.requires_grad: True
>>>w.grad None
>>>x.grad tensor([[ 0.0154,  0.0154,  0.0154,  0.0154],
        [-0.0095, -0.0095, -0.0095, -0.0095],
        [ 0.0076,  0.0076,  0.0076,  0.0076],
        [ 0.0164,  0.0164,  0.0164,  0.0164],
        [-0.0345, -0.0345, -0.0345, -0.0345]])

The variable w is an intermediate variable, so when gradients are computed, w.grad is None.
The fix is to call retain_grad() on the intermediate variable (here w) before calling .backward():

...
w.retain_grad()
y.backward(torch.ones(3))
...
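
For completeness, here is a minimal self-contained sketch of Trial 1 with the retain_grad() fix applied (the variable names follow the code above); after backward(), w.grad is no longer None:

import torch

w = torch.randn(3, 5, requires_grad=True) * 0.01  # non-leaf: w.is_leaf is False
x = torch.randn(5, 4, requires_grad=True)         # leaf: x.is_leaf is True

w.retain_grad()  # ask autograd to keep the gradient of this intermediate tensor
y = torch.matmul(w, x).sum(1)
y.backward(torch.ones(3))

print(w.is_leaf, x.is_leaf)  # False True
print(w.grad is None)        # False -- w.grad is now populated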

2. A layer of the network does not participate in the loss computation

The code is as follows:

import torch
import torch.nn as nn
import torch.optim as optim

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 1)
        self.fc3 = nn.Linear(4, 4)  # extra FC layer, never used in forward()
    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x

if __name__ == "__main__":
    test = Test()
    x = torch.randn(2, 1, 4)  # data
    y = torch.randn(2, 1, 1)  # labels
    print("======Parameters contained in the network:{}========".format(list(test.parameters())))
    for param in test.parameters():
        print("+++++++++Before training, the parameter gradient is {}++++++++++".format(param.grad))

    criterion = nn.MSELoss()
    # optimize the same instance that is trained below, not a fresh Test()
    optimizer = optim.Adam(test.parameters(), lr=0.0001, weight_decay=0)
    epochs = 2
    for epoch in range(epochs):
        total_loss = 0
        for i, data in enumerate(x):
            inputs, labels = data, y[i]
            optimizer.zero_grad()
            preds = test(inputs)
            loss = criterion(preds, labels)
            loss.backward()
            optimizer.step()

            for param in test.parameters():
                print("+++++++++After training, the parameter gradient is {}++++++++++".format(param.grad))

The output is as follows:

======Parameters contained in the network:
[Parameter containing:
tensor([[-0.3202, -0.4112, -0.4270, -0.0370],
        [ 0.2662, -0.1365, -0.0656,  0.4369],
        [ 0.4909, -0.3320,  0.4375,  0.0742],
        [-0.0148,  0.4801, -0.1870, -0.0913]], requires_grad=True), 
Parameter containing:
tensor([ 0.2976,  0.4340, -0.3223, -0.3268], requires_grad=True), 
Parameter containing:
tensor([[-0.3918,  0.4809, -0.3883,  0.0760]], requires_grad=True), 
Parameter containing:
tensor([0.2708], requires_grad=True),
Parameter containing:
tensor([[ 0.4952,  0.4767,  0.1838, -0.2259],
        [-0.4951,  0.2937,  0.2473,  0.1304],
        [-0.1190,  0.4380, -0.1827,  0.4994],
        [ 0.4822, -0.2452,  0.4944, -0.3221]], requires_grad=True), 
Parameter containing:
tensor([-0.0752, -0.3379, -0.0910,  0.1005], requires_grad=True)]========

+++++++++Before training, the parameter gradient is None++++++++++
+++++++++Before training, the parameter gradient is None++++++++++
+++++++++Before training, the parameter gradient is None++++++++++
+++++++++Before training, the parameter gradient is None++++++++++
+++++++++Before training, the parameter gradient is None++++++++++
+++++++++Before training, the parameter gradient is None++++++++++

+++++++++After training, the parameter gradient is tensor([[ 0.0917,  0.0201,  0.0453,  0.0037],
        [-0.1126, -0.0246, -0.0556, -0.0045],
        [ 0.0909,  0.0199,  0.0449,  0.0037],
        [-0.0178, -0.0039, -0.0088, -0.0007]])++++++++++
+++++++++After training, the parameter gradient is tensor([-0.0726,  0.0891, -0.0720,  0.0141])++++++++++
+++++++++After training, the parameter gradient is tensor([[ 0.2009,  0.0286, -0.2089, -0.0592]])++++++++++
+++++++++After training, the parameter gradient is tensor([0.1853])++++++++++
+++++++++After training, the parameter gradient is None++++++++++
+++++++++After training, the parameter gradient is None++++++++++

As the code above shows, fc3 does not take part in the forward pass, so its gradient is None when gradients are computed.
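
If fc3 is actually needed, the fix is to use it inside forward() so that it contributes to the loss; otherwise the unused layer can simply be removed. Below is a minimal sketch of the first option (placing fc3 between fc1 and fc2 is only an assumption for illustration, and torch.nn is assumed to be imported as nn as above):

class TestFixed(nn.Module):
    def __init__(self):
        super(TestFixed, self).__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 1)
        self.fc3 = nn.Linear(4, 4)
    def forward(self, x):
        x = self.fc1(x)
        x = self.fc3(x)  # fc3 now participates in the forward pass, so it receives gradients
        x = self.fc2(x)
        return x

After loss.backward(), a quick way to spot layers that are still left out of the computation is to check which parameters have no gradient:

for name, param in test.named_parameters():
    if param.grad is None:
        print(name, "did not receive a gradient")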
