对于浅层的网络,我们可以手动的书写前向传播和反向传播过程。但是当网络变得很大时,特别是在做深度学习时,网络结构变得复杂。前向传播和反向传播也随之变得复杂,手动书写这两个过程就会存在很大的困难。幸运地是在pytorch中存在了自动微分的包,可以用来解决该问题。在使用自动求导的时候,网络的前向传播会定义一个计算图(computational graph),图中的节点是张量(tensor),两个节点之间的边对应了两个张量之间变换关系的函数。有了计算图的存在,张量的梯度计算也变得容易了些。例如, x
是一个张量,其属性 x.requires_grad = True
,那么 x.grad
就是一个保存这个张量x的梯度的一些标量值。
最基础的自动求导操作在底层就是作用在两个张量上。前向传播函数是从输入张量到输出张量的计算过程;反向传播是输入输出张量的梯度(一些标量)并输出输入张量的梯度(一些标量)。在pytorch中我们可以很容易地定义自己的自动求导操作,通过继承torch.autograd.Function
并定义forward
和backward
函数。
forward()
: 前向传播操作。可以输入任意多的参数,任意的python对象都可以。
backward()
:反向传播(梯度公式)。输出的梯度个数需要与所使用的张量个数保持一致,且返回的顺序也要对应起来。
# Inherit from Function
class LinearFunction(Function):
# Note that both forward and backward are @staticmethods
@staticmethod
# bias is an optional argument
def forward(ctx, input, weight, bias=None):
# ctx在这里类似self,ctx的属性可以在backward中调用
ctx.save_for_backward(input, weight, bias)
output = input.mm(weight.t())
if bias is not None:
output += bias.unsqueeze(0).expand_as(output)
return output
# This function has only a single output, so it gets only one gradient
@staticmethod
def backward(ctx, grad_output):
# This is a pattern that is very convenient - at the top of backward
# unpack saved_tensors and initialize all gradients w.r.t. inputs to
# None. Thanks to the fact that additional trailing Nones are
# ignored, the return statement is simple even when the function has
# optional inputs.
input, weight, bias = ctx.saved_tensors
grad_input = grad_weight = grad_bias = None
# These needs_input_grad checks are optional and there only to
# improve efficiency. If you want to make your code simpler, you can
# skip them. Returning gradients for inputs that don't require it is
# not an error.
if ctx.needs_input_grad[0]:
grad_input = grad_output.mm(weight)
if ctx.needs_input_grad[1]:
grad_weight = grad_output.t().mm(input)
if bias is not None and ctx.needs_input_grad[2]:
grad_bias = grad_output.sum(0).squeeze(0)
return grad_input, grad_weight, grad_bias
#调用自定义的自动求导函数
linear = LinearFunction.apply(*args) #前向传播
linear.backward()#反向传播
linear.grad_fn.apply(*args)#反向传播
对于非参数化的张量(权重是常量,不需要更新),此时可以定义为:
class MulConstant(Function):
@staticmethod
def forward(ctx, tensor, constant):
# ctx is a context object that can be used to stash information
# for backward computation
ctx.constant = constant
return tensor * constant
@staticmethod
def backward(ctx, grad_output):
# We return as many input gradients as there were arguments.
# Gradients of non-Tensor arguments to forward must be None.
return grad_output * ctx.constant, None
grad_x =t.autograd.grad(y, x, create_graph=True)
grad_grad_x = t.autograd.grad(grad_x[0],x)
计算图和自动求导在定义复杂网络和求梯度的时候非常好用,但对于大型的网络,这个还是有点偏底层。在我们构建网络的时候,经常希望将计算限制在每个层之内(参数更新分层更新)。而且在TensorFlow等其他深度学习框架中都提供了高级抽象结构。因此,在pytorch中也提供了类似的包nn
,它定义了一组等价于层(layer)的模块(Modules)。一个Module接受输入张量并得到输出张量,同时也会包含可学习的参数。
有时候,我们希望运用一些新的且nn
包中不存在的Module。此时就需要定义自己的Module了。自定义的Module需要继承nn.Module
且自定义forward
函数。其中forward
函数可以接受输入张量并利用其它模型或者其他自动求导操作来产生输出张量。但并不需要重写backward
函数,因此nn
使用了autograd
。这也就意味着,需要自定义Module, 都必须有对应的autograd
函数以调用其中的backward
。
class Linear(nn.Module):
def __init__(self, input_features, output_features, bias=True):
super(Linear, self).__init__()
self.input_features = input_features
self.output_features = output_features
# nn.Parameter is a special kind of Tensor, that will get
# automatically registered as Module's parameter once it's assigned
# as an attribute. Parameters and buffers need to be registered, or
# they won't appear in .parameters() (doesn't apply to buffers), and
# won't be converted when e.g. .cuda() is called. You can use
# .register_buffer() to register buffers.
# (很重要!!!参数一定需要梯度!)nn.Parameters require gradients by default.
self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
if bias:
self.bias = nn.Parameter(torch.Tensor(output_features))
else:
# You should always register all possible parameters, but the
# optional ones can be None if you want.
self.register_parameter('bias', None)
# Not a very smart way to initialize weights
self.weight.data.uniform_(-0.1, 0.1)
if bias is not None:
self.bias.data.uniform_(-0.1, 0.1)
def forward(self, input):
# See the autograd section for explanation of what happens here.
return LinearFunction.apply(input, self.weight, self.bias)
def extra_repr(self):
# (Optional)Set the extra information about this module. You can test
# it by printing an object of this class.
return 'in_features={}, out_features={}, bias={}'.format(
self.in_features, self.out_features, self.bias is not None
Function与Module都可以对pytorch进行自定义拓展,使其满足网络的需求,但这两者还是有十分重要的不同:
[1] pytorch documents
[2] pytorch tutorials
[3] Pytorch 中 Function与Module的差异与应用场景
[4] pytorch 学习笔记之自定义 Module
[5] 探讨Pytorch中nn.Module与nn.autograd.Function的backward()函数
[6] [Pytorch]:自定义网络层
[7] Pytorch入门学习(八)—–自定义层的实现(甚至不可导operation的backward写法)
[8] 『PyTorch』第五弹 深入理解autograd_下:函数扩展&高阶导数
[9] 探讨pytorch中nn.Module与nn.autograd.Function的backward()函数