Under the hood, each primitive autograd operator is really two functions that operate on tensors. The forward function computes output tensors from input tensors; the backward function receives the gradient of the output tensors with respect to some scalar value and computes the gradient of the input tensors with respect to that same scalar. In PyTorch we can easily define our own autograd operator by subclassing torch.autograd.Function and implementing the forward and backward functions.
Adding operations to autograd requires implementing a new Function subclass for each operation. Function is what autograd uses to compute results and gradients, and to record the history of operations. Every new function requires you to implement two methods:

- forward() - the code that performs the operation; it can take as many arguments as you want.
- backward() - the gradient formula; it receives as many Tensor arguments as there were outputs, and returns as many gradients as there were inputs, each containing the gradient w.r.t. its corresponding input.
Below is the code for a Linear function from torch.nn:
import torch
from torch.autograd import Function

# Inherit from Function
class LinearFunction(Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        # ctx acts like self here; what is stored on ctx in forward
        # can be retrieved later in backward
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        # Retrieve the tensors that were saved on ctx in forward
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        return grad_input, grad_weight, grad_bias
To make it easier to use, we recommend giving the apply method an alias:
linear = LinearFunction.apply
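As a quick sanity check, here is a minimal usage sketch of the aliased function; the tensor shapes are illustrative assumptions, not anything prescribed by the API:

x = torch.randn(4, 3, requires_grad=True)   # batch of 4 inputs, 3 features
w = torch.randn(2, 3, requires_grad=True)   # maps 3 features to 2
b = torch.randn(2, requires_grad=True)

y = linear(x, w, b)       # calls LinearFunction.apply under the hood
y.sum().backward()        # gradients flow through our custom backward
print(x.grad.shape)       # torch.Size([4, 3])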
Here is another example, where the extra argument is not a Tensor but a constant that does not need gradient updates:
class MulConstant(Function):
    @staticmethod
    def forward(ctx, tensor, constant):
        # ctx is a context object that can be used to stash information
        # for backward computation
        ctx.constant = constant
        return tensor * constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None.
        return grad_output * ctx.constant, None
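A minimal usage sketch; the constant 5.0 and the input shape are illustrative assumptions:

x = torch.randn(3, requires_grad=True)
y = MulConstant.apply(x, 5.0)
y.sum().backward()
print(x.grad)   # every entry is 5.0, since d(5x)/dx = 5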
You will probably want to check whether the gradients computed by your backward method are correct; you can compare them against a numerical approximation:
import torch
from torch.autograd import gradcheck

# gradcheck takes a tuple of tensors as input, checks if the gradient
# evaluated with these tensors is close enough to the numerical
# approximation, and returns True if they all verify this condition.
input = (torch.randn(20, 20, dtype=torch.double, requires_grad=True),
         torch.randn(30, 20, dtype=torch.double, requires_grad=True))
test = gradcheck(linear, input, eps=1e-6, atol=1e-4)
print(test)
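The same check can be run against MulConstant. A sketch, under the assumption that the non-Tensor constant can be passed through in the input tuple (gradcheck only perturbs floating-point tensors that require grad):

x = torch.randn(10, dtype=torch.double, requires_grad=True)
print(gradcheck(MulConstant.apply, (x, 5.0), eps=1e-6, atol=1e-4))  # True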
Computational graphs and automatic differentiation are a very powerful paradigm for defining complex networks and taking gradients, but for large networks raw autograd is a bit too low-level. When building networks we often want to arrange the computation into layers, some of which hold learnable parameters that are updated during training. For this, PyTorch provides the nn package, which defines a set of Modules roughly equivalent to layers. A Module receives input tensors and computes output tensors, and may also hold internal state such as learnable parameters.
Sometimes you need a module that nn does not provide; in that case you define your own by subclassing nn.Module and implementing a forward function, which takes input tensors and produces output tensors using other modules or autograd operations. You do not need to write a backward function, because nn relies on autograd; this does mean, however, that every operation your custom module performs must itself be differentiable by autograd, i.e. backed by an autograd Function with a backward.
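For instance, here is a minimal sketch of a custom module that needs no hand-written backward because it is composed entirely of existing differentiable operations; the module name and scaling behavior are illustrative assumptions:

import torch
import torch.nn as nn

class ScaledTanh(nn.Module):
    def __init__(self, scale=2.0):
        super(ScaledTanh, self).__init__()
        self.scale = scale

    def forward(self, input):
        # Built only from differentiable primitives, so autograd
        # derives the backward pass automatically
        return self.scale * torch.tanh(input)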
Because nn makes heavy use of autograd, adding a new Module requires implementing a Function that performs the operation and can compute the gradient. For example, suppose we want to implement a Linear module, and we have already implemented its Function above. We need to implement two functions:

- __init__ (optional) - takes arguments such as the numbers of features and initializes parameters and buffers.
- forward() - instantiates the Function and uses it to perform the operation.
This is how a Linear module can be implemented:
import torch
import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute. Parameters and buffers need to be registered, or
        # they won't appear in .parameters() (doesn't apply to buffers), and
        # won't be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # nn.Parameters require gradients by default.
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        self.weight.data.uniform_(-0.1, 0.1)
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(input, self.weight, self.bias)

    def extra_repr(self):
        # (Optional) Set the extra information about this module. You can
        # test it by printing an object of this class.
        return 'input_features={}, output_features={}, bias={}'.format(
            self.input_features, self.output_features, self.bias is not None
        )
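A quick usage sketch of the module above; the shapes are illustrative assumptions:

layer = Linear(3, 2)
x = torch.randn(4, 3)
y = layer(x)     # forward pass runs through LinearFunction
print(layer)     # Linear(input_features=3, output_features=2, bias=True)
print([p.shape for p in layer.parameters()])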
Both Function and Module let you extend PyTorch to meet the needs of your network, but there are important differences between the two:

- A Function defines the forward and backward computation of a single operation; it holds no state and no learnable parameters, and is the right tool when an operation needs a custom gradient formula.
- A Module can hold learnable parameters and buffers and can be composed of other modules; it never needs a hand-written backward, since autograd derives the gradients from the operations used in its forward.