Reference: torch.Tensor.backward(gradient=None, retain_graph=None, create_graph=False)
Original documentation:
backward(gradient=None, retain_graph=None, create_graph=False)
Computes the gradient of current tensor w.r.t. graph leaves.
The graph is differentiated using the chain rule. If the tensor is non-scalar
(i.e. its data has more than one element) and requires gradient, the function
additionally requires specifying gradient. It should be a tensor of matching type
and location, that contains the gradient of the differentiated function w.r.t. self.
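As a minimal sketch of this requirement (variable names and values are illustrative, not from the session below): calling backward() on a non-scalar output without a gradient argument fails, while passing a gradient tensor of matching shape computes the corresponding vector-Jacobian product.

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                                  # non-scalar output (3 elements)

try:
    y.backward()                           # no `gradient` given for a non-scalar tensor
except RuntimeError as e:
    print(e)                               # complains that grad can only be implicitly created for scalar outputs

y.backward(gradient=torch.ones_like(y))    # v^T J with v = [1, 1, 1]
print(x.grad)                              # tensor([2., 2., 2.])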
This function accumulates gradients in the leaves - you might need to zero them
before calling it.
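A small illustration of this accumulation (a sketch with made-up values): calling backward twice doubles the stored gradient unless the leaf's .grad is zeroed in between.

import torch

w = torch.tensor(3.0, requires_grad=True)
loss = 2 * w                     # d(loss)/dw = 2

loss.backward(retain_graph=True)
print(w.grad)                    # tensor(2.)

loss.backward(retain_graph=True)
print(w.grad)                    # tensor(4.)  <- gradients accumulate, they are not overwritten

w.grad.zero_()                   # reset the leaf gradient before the next pass
loss.backward()
print(w.grad)                    # tensor(2.)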
Parameters
gradient (Tensor or None) – Gradient w.r.t. the tensor. If it is a tensor,
it will be automatically converted to a Tensor that does not require grad
unless create_graph is True. None values can be specified for scalar Tensors
or ones that don’t require grad. If a None value would be acceptable then
this argument is optional.
retain_graph (bool, optional) – If False, the graph used to compute the grads
will be freed. Note that in nearly all cases setting this option to True is
not needed and often can be worked around in a much more efficient way.
Defaults to the value of create_graph.
create_graph (bool, optional) – If True, graph of the derivative will be
constructed, allowing to compute higher order derivative products. Defaults
to False.
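To illustrate create_graph, here is a minimal sketch (using the closely related torch.autograd.grad call rather than Tensor.backward) that differentiates twice to obtain a second derivative; without create_graph=True the first derivative would not itself be differentiable.

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3                                   # y = x^3

# First derivative: dy/dx = 3*x^2 = 12. create_graph=True builds a graph
# for the derivative so that it can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)
print(dy_dx)                                 # tensor(12., grad_fn=...)

# Second derivative: d2y/dx2 = 6*x = 12
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)
print(d2y_dx2)                               # tensor(12.)

The interactive session below then walks through the gradient argument, gradient accumulation with X.grad.zero_(), and retain_graph on a concrete 3-element example.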
Microsoft Windows [Version 10.0.18363.1256]
(c) 2019 Microsoft Corporation. All rights reserved.
C:\Users\chenxuqi>conda activate ssd4pytorch1_2_0
(ssd4pytorch1_2_0) C:\Users\chenxuqi>python
Python 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> alpha = torch.tensor([10.0, 100.0, 1000.0])
>>> alpha
tensor([ 10., 100., 1000.])
>>>
>>> X = torch.tensor([4.0, 3.0, 2.0],requires_grad=True)
>>> X
tensor([4., 3., 2.], requires_grad=True)
>>>
>>> Y = torch.zeros(3)
>>> Y
tensor([0., 0., 0.])
>>> x0,x1,x2 = X[0],X[1],X[2]
>>> y0 = 3*x0+7*x1**2+6*x2**3
>>> y1 = 4*x0+8*x1**2+3*x2**3
>>> y2 = 5*x0+9*x1**2+1*x2**3
>>> Y
tensor([0., 0., 0.])
>>> Y[0],Y[1],Y[2] = y0,y1,y2
>>> Y
tensor([123., 112., 109.], grad_fn=<CopySlices>)
>>> X.grad
>>> print(X.grad)
None
>>> print(Y.grad)
None
>>>
>>> Y.backward(gradient=alpha)
>>> print(Y.grad)
None
>>> print(X.grad)
tensor([ 5430., 59220., 16320.])
>>>
>>> # params.grad.zero_()
>>>
>>> Y.backward(gradient=alpha)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\envs\ssd4pytorch1_2_0\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "D:\Anaconda3\envs\ssd4pytorch1_2_0\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
>>>
>>>
>>>
>>> alpha = torch.tensor([10.0, 100.0, 1000.0])
>>> alpha
tensor([ 10., 100., 1000.])
>>> X = torch.tensor([4.0, 3.0, 2.0],requires_grad=True)
>>> Y = torch.zeros(3)
>>> x0,x1,x2 = X[0],X[1],X[2]
>>> y0 = 3*x0+7*x1**2+6*x2**3
>>> y1 = 4*x0+8*x1**2+3*x2**3
>>> y2 = 5*x0+9*x1**2+1*x2**3
>>> Y
tensor([0., 0., 0.])
>>> Y[0],Y[1],Y[2] = y0,y1,y2
>>> Y
tensor([123., 112., 109.], grad_fn=<CopySlices>)
>>> print(X.grad)
None
>>> print(Y.grad)
None
>>> Y.backward(gradient=alpha,retain_graph=True)
>>> print(X.grad)
tensor([ 5430., 59220., 16320.])
>>> print(Y.grad)
None
>>> Y.backward(gradient=alpha,retain_graph=True)
>>> print(X.grad)
tensor([ 10860., 118440., 32640.])
>>> print(Y.grad)
None
>>> Y.backward(gradient=alpha,retain_graph=True)
>>> print(X.grad)
tensor([ 16290., 177660., 48960.])
>>> print(Y.grad)
None
>>> X.grad.zero_()
tensor([0., 0., 0.])
>>> print(X.grad)
tensor([0., 0., 0.])
>>> print(Y.grad)
None
>>> Y.backward(gradient=alpha,retain_graph=True)
>>> print(X.grad)
tensor([ 5430., 59220., 16320.])
>>> print(Y.grad)
None
>>>
>>>
>>> X.grad.zero_()
tensor([0., 0., 0.])
>>> print(X.grad)
tensor([0., 0., 0.])
>>> print(Y.grad)
None
>>> Y.backward(gradient=alpha)
>>> print(X.grad)
tensor([ 5430., 59220., 16320.])
>>> print(Y.grad)
None
>>>
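The value tensor([ 5430., 59220., 16320.]) that appears repeatedly above can be checked by hand: Y.backward(gradient=alpha) stores the vector-Jacobian product alpha^T J in X.grad, where J is the Jacobian of Y with respect to X evaluated at X = (4, 3, 2). A small verification sketch:

import torch

x0, x1, x2 = 4.0, 3.0, 2.0
alpha = torch.tensor([10.0, 100.0, 1000.0])

# Each y_i = a_i*x0 + b_i*x1**2 + c_i*x2**3, so row i of the Jacobian is
# [a_i, 2*b_i*x1, 3*c_i*x2**2].
J = torch.tensor([
    [3.0, 2 * 7 * x1, 3 * 6 * x2 ** 2],   # gradients of y0 = 3*x0 + 7*x1**2 + 6*x2**3
    [4.0, 2 * 8 * x1, 3 * 3 * x2 ** 2],   # gradients of y1 = 4*x0 + 8*x1**2 + 3*x2**3
    [5.0, 2 * 9 * x1, 3 * 1 * x2 ** 2],   # gradients of y2 = 5*x0 + 9*x1**2 + 1*x2**3
])

print(alpha @ J)                          # tensor([ 5430., 59220., 16320.])

Each additional backward(gradient=alpha, retain_graph=True) call adds the same product into X.grad again, which is why the stored gradient doubles and triples in the session until X.grad.zero_() resets it.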