Matrix differentiation falls roughly into three categories: scalar (numerator) derivatives, vector (numerator) derivatives, and matrix (numerator) derivatives. PyTorch implements all of them; the usage pattern is backward() followed by .grad.
import torch

x = torch.randn(10, 5, requires_grad=True)
w = torch.randn(5, 6, requires_grad=True)
y = (x.mm(w)).sum()  # reduce to a scalar so backward() needs no extra arguments
y.backward()         # fills in .grad for every leaf tensor that requires grad
x.grad
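As a sanity check (a minimal sketch reusing the x and w above): for y = sum(XW), the derivative ∂y/∂X_ij = Σ_k W_jk, so every row of x.grad should equal w.sum(dim=1).

expected = w.sum(dim=1).expand(10, 5)    # row sums of w, broadcast across the 10 rows of x
print(torch.allclose(x.grad, expected))  # True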
x = torch.randn(2, 3, requires_grad=True)
print(x)          # the tensor, printed with requires_grad=True
print(x.data)     # the same values, outside the autograd graph
print(x.grad_fn)  # None: x was created directly by the user (a leaf tensor)
tensor([[ 0.6669,  0.4549, -0.8281],
        [ 1.4578,  0.6001, -1.4019]], requires_grad=True)
tensor([[ 0.6669,  0.4549, -0.8281],
        [ 1.4578,  0.6001, -1.4019]])
None
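A quick check of what .data does (a minimal sketch, reusing the x just created): it returns the same storage with autograd tracking stripped.

print(x.data.requires_grad)               # False: .data is detached from the graph
print(x.data.data_ptr() == x.data_ptr())  # True: same underlying memory, no copy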
So what is grad_fn? It records the function (operation) that produced this tensor from other tensors. A tensor created directly by the user, like x above, is a leaf tensor and its grad_fn is None; a tensor produced by an operation gets a grad_fn. (Even x + 1 gets one: the addition on x is recorded as an operation, with the 1 treated as a constant.) The following example makes this clearer:
x = torch.randn(2, 3, requires_grad=True)
y = torch.randn(2, 3, requires_grad=True)
z = x + y
print(z)          # repr shows grad_fn=<AddBackward0>
print(z.data)     # raw values only
print(z.grad_fn)  # <AddBackward0 ...>: the operation that created z
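To confirm the point about x + 1 above, a minimal check (the object address in the printed grad_fn varies per run):

x = torch.randn(2, 3, requires_grad=True)
print((x + 1).grad_fn)  # <AddBackward0 ...>: recorded even though 1 is a plain number
print(x.grad_fn)        # None: x itself is a leaf tensor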
x = torch.randn(2, 3, requires_grad=True)
y = torch.randn(2, 3, requires_grad=False)  # y is excluded from the graph
z = (x + y).sum()
print('z.grad:', z.grad)  # None: z is not a leaf tensor
z.backward()
print('x.grad:', x.grad)  # all ones: dz/dx = 1 elementwise
print('y.grad:', y.grad)  # None: y has requires_grad=False
print('z.grad:', z.grad)  # still None after backward
w = torch.randn(2, 3, requires_grad=True)
print('w.grad:', w.grad)  # None: w has not taken part in any backward pass
z.grad: None
x.grad: tensor([[1., 1., 1.],
        [1., 1., 1.]])
y.grad: None
z.grad: None
w.grad: None
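requires_grad can also be switched on after a tensor is created, via the in-place requires_grad_() method (a minimal sketch; the names a and b are just for illustration):

a = torch.randn(2, 3)   # requires_grad defaults to False
a.requires_grad_(True)  # flip it in place; a remains a leaf tensor
b = a.sum()
b.backward()
print(a.grad)           # a tensor of ones, same shape as a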
Also note, in the example below, that w.grad stays None: w's grad_fn is not None, which makes w a non-leaf tensor, and by default autograd only populates .grad on leaf tensors. Gradients still flow through w to x and y.
x = torch.randn(2, 3, requires_grad=True)
y = torch.randn(2, 3, requires_grad=True)
w = x + y  # non-leaf: w has a grad_fn
print('w.requires_grad:', w.requires_grad)
z = torch.randn(2, 3, requires_grad=True)
out = (w + z).sum()
out.backward()
print('w.grad:', w.grad)  # None: non-leaf tensors do not keep .grad by default
print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('z.grad:', z.grad)
w.requires_grad: True
w.grad: None
x.grad: tensor([[1., 1., 1.],
        [1., 1., 1.]])
y.grad: tensor([[1., 1., 1.],
        [1., 1., 1.]])
z.grad: tensor([[1., 1., 1.],
        [1., 1., 1.]])
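If the gradient of a non-leaf tensor like w is actually needed, retain_grad() tells autograd to keep it (a minimal sketch):

x = torch.randn(2, 3, requires_grad=True)
y = torch.randn(2, 3, requires_grad=True)
w = x + y
w.retain_grad()           # keep .grad on this non-leaf tensor
w.sum().backward()
print('w.grad:', w.grad)  # a tensor of ones instead of None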
To summarize:
Scalar differentiation works just like the code at the start of Chapter 1: call backward(), then read .grad.
x = torch.randn(10, 5, requires_grad=True)
w = torch.randn(5, 6, requires_grad=True)
y = (x.mm(w)).sum()
y.backward()
x.grad
Vector differentiation is similar to the scalar case and also uses backward, except that backward() now needs a gradient argument: a tensor with the same shape as the tensor backward is being called on. As shown below:
x = torch.randn(5, 1, requires_grad=True)
w = torch.randn(5, 1, requires_grad=True)
y = x + w                                # y is a vector, not a scalar
y.backward(gradient=torch.ones_like(y))  # gradient must have the same shape as y
x.grad
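What backward computes here is the vector-Jacobian product vᵀJ, with the gradient argument playing the role of v; passing something other than ones makes this visible (a minimal sketch):

x = torch.randn(3, requires_grad=True)
y = 2 * x                           # the Jacobian dy/dx is 2·I
v = torch.tensor([1.0, 0.5, 0.25])
y.backward(gradient=v)
print(x.grad)                       # tensor([2.0000, 1.0000, 0.5000]) = 2 * v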