PyTorch Notes: The Official Modules Documentation

Source: https://pytorch.org/docs/stable/notes/modules.html

A Simple Custom Module

import torch
from torch import nn

class MyLinear(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    self.bias = nn.Parameter(torch.randn(out_features))

  def forward(self, input):
    return (input @ self.weight) + self.bias
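  1. Properties of modules:
    • It inherits from nn.Module, which makes it composable with other modules.
    • It defines some "state" that is used in computation. In the example above, the state consists of Parameters (weight and bias). They are registered with the Module, so they are automatically tracked and returned by calls to parameters(). Parameters can be thought of as the "learnable" aspects of a module's computation.
    • It defines a forward() function that performs the computation.
  2. PyTorch's autograd system takes care of the backward pass automatically, so there is no need to implement backward() by hand.
  3. The set of registered parameters can be accessed by calling parameters() or named_parameters(); the latter also includes each parameter's name.
  4. The parameters registered by a module are the ones that should be learned.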
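The points above can be checked with a short sketch. It repeats the MyLinear class from the notes so that it runs on its own; the variable names param_shapes and grad_shapes are only illustrative.

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    # Same layer as in the notes, repeated so this snippet is self-contained.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return (input @ self.weight) + self.bias

m = MyLinear(4, 3)

# named_parameters() yields (name, parameter) pairs for every registered Parameter.
param_shapes = {name: tuple(p.shape) for name, p in m.named_parameters()}
print(param_shapes)

# MyLinear defines no backward(); autograd derives it from forward().
out = m(torch.randn(2, 4))
out.sum().backward()

# After backward(), each parameter holds a gradient of its own shape.
grad_shapes = {name: tuple(p.grad.shape) for name, p in m.named_parameters()}
print(grad_shapes)
```

Both dictionaries list the same two entries, weight and bias, which is what an optimizer sees when it is handed m.parameters().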

Modules as Building Blocks

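  1. Although a computation can be assembled with Sequential, it is generally recommended to define a custom module, since this gives more flexibility.
  2. The following example defines a simple neural network: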
import torch.nn.functional as F

class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.l0 = MyLinear(4, 3)
    self.l1 = MyLinear(3, 1)
  def forward(self, x):
    x = self.l0(x)
    x = F.relu(x)
    x = self.l1(x)
    return x
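For comparison, here is a minimal sketch of the Sequential alternative mentioned in point 1. It swaps in the built-in nn.Linear and nn.ReLU for MyLinear and F.relu so that the snippet is self-contained; seq_net is an illustrative name.

```python
import torch
from torch import nn

# The same 4 -> 3 -> 1 architecture, expressed as a chain of modules.
seq_net = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
)

# Sequential names its children by position rather than by attribute name.
child_names = [name for name, _ in seq_net.named_children()]
out = seq_net(torch.randn(4))
print(child_names, out.shape)
```

Sequential only supports a straight chain where each output feeds the next module. A custom forward() can branch, reuse layers, or take extra arguments, which is the flexibility the notes refer to.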

The Net module defined above is composed of two "children" or "submodules" (l0 and l1). The immediate children of a module can be iterated over by calling children() or named_children():

net = Net()
for child in net.named_children():
  print(child)
: ('l0', MyLinear())
('l1', MyLinear())

To go deeper than the immediate children, use modules() or named_modules(), which iterate recursively over a module and all of its descendant modules:

class BigNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.l1 = MyLinear(5, 4)
    self.net = Net()
  def forward(self, x):
    return self.net(self.l1(x))

big_net = BigNet()
for module in big_net.named_modules():
  print(module)
: ('', BigNet(
  (l1): MyLinear()
  (net): Net(
    (l0): MyLinear()
    (l1): MyLinear()
  )
))
('l1', MyLinear())
('net', Net(
  (l0): MyLinear()
  (l1): MyLinear()
))
('net.l0', MyLinear())
('net.l1', MyLinear())
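The difference between the two traversals can be made concrete with a small sketch. The classes Inner and Outer below are hypothetical stand-ins for Net and BigNet that use nn.Linear, so the snippet runs without the earlier definitions.

```python
import torch
from torch import nn

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.l0 = nn.Linear(4, 3)
        self.l1 = nn.Linear(3, 1)

    def forward(self, x):
        return self.l1(torch.relu(self.l0(x)))

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(5, 4)
        self.net = Inner()

    def forward(self, x):
        return self.net(self.l1(x))

outer = Outer()

# named_children() stops at the first level of nesting.
child_names = [name for name, _ in outer.named_children()]

# named_modules() includes the module itself (name '') and every descendant.
module_names = [name for name, _ in outer.named_modules()]

# parameters() is recursive as well, so nested layers are counted too.
n_params = sum(p.numel() for p in outer.parameters())
print(child_names, module_names, n_params)
```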
  1. Sometimes a module's submodules need to be defined dynamically. The ModuleList and ModuleDict modules are useful here:
class DynamicNet(nn.Module):
  def __init__(self, num_layers):
    super().__init__()
    self.linears = nn.ModuleList(
      [MyLinear(4, 4) for _ in range(num_layers)])
    self.activations = nn.ModuleDict({
      'relu': nn.ReLU(),
      'lrelu': nn.LeakyReLU()
    })
    self.final = MyLinear(4, 1)
  def forward(self, x, act):
    for linear in self.linears:
      x = linear(x)
    x = self.activations[act](x)
    x = self.final(x)
    return x

dynamic_net = DynamicNet(3)
sample_input = torch.randn(4)
output = dynamic_net(sample_input, 'relu')
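The reason for using ModuleList rather than a plain Python list is registration: submodules held in an ordinary list are not seen by parameters(). The sketch below illustrates this with two hypothetical classes, PlainListNet and ModuleListNet, built from nn.Linear layers.

```python
import torch
from torch import nn

class PlainListNet(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # A plain list: the layers exist, but the module does not register them.
        self.linears = [nn.Linear(4, 4) for _ in range(num_layers)]

class ModuleListNet(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # ModuleList registers every layer as a submodule.
        self.linears = nn.ModuleList([nn.Linear(4, 4) for _ in range(num_layers)])

# Count the parameter tensors each variant exposes (weight + bias per layer).
plain_count = len(list(PlainListNet(3).parameters()))
registered_count = len(list(ModuleListNet(3).parameters()))
print(plain_count, registered_count)
```

With the plain list, an optimizer built from parameters() would receive nothing to update, and to() or state_dict() would skip those layers as well.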
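  1. All parameters can be moved to another device or converted to a different precision with to().
  2. The apply() method applies an arbitrary function to a module and its submodules, for example: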
# Define a function to initialize Linear weights.
# Note that no_grad() is used here to avoid tracking this computation in the autograd graph.
@torch.no_grad()
def init_weights(m):
  if isinstance(m, nn.Linear):
    nn.init.xavier_normal_(m.weight)
    m.bias.fill_(0.0)

# Apply the function recursively on the module and its submodules.
dynamic_net.apply(init_weights)
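One detail worth noting: init_weights only acts on nn.Linear instances, and MyLinear does not inherit from nn.Linear, so on a network built purely from MyLinear layers the call above leaves the weights as they were. The following sketch uses the built-in nn.Linear instead so that the effect of apply() and to() is visible; the network and the variable names are illustrative.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

@torch.no_grad()
def init_weights(m):
    # Only built-in Linear layers are touched; other modules are skipped.
    if isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        m.bias.fill_(0.0)

# apply() visits every submodule and finally the module itself.
net.apply(init_weights)
biases_are_zero = all(
    bool((m.bias == 0).all()) for m in net.modules() if isinstance(m, nn.Linear)
)

# to() converts parameters and buffers in place and returns the module.
net.to(torch.float64)
dtypes = {p.dtype for p in net.parameters()}

# Moving to a GPU works the same way; fall back to the CPU if none is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.to(device)
print(biases_are_zero, dtypes)
```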

Neural Network Training with Modules

This is a simple training procedure that teaches the model to output values as close to 0 as possible:

# Create the network (from previous section) and optimizer
net = Net()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4, weight_decay=1e-2, momentum=0.9)

# Run a sample training loop that "teaches" the network
# to output the constant zero function
for _ in range(10000):
  input = torch.randn(4)
  output = net(input)
  loss = torch.abs(output)
  net.zero_grad()
  loss.backward()
  optimizer.step()

# After training, switch the module to eval mode to do inference, compute performance metrics, etc.
# (see discussion below for a description of training and evaluation modes)
...
net.eval()
...

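Summary of the procedure:

  • Define the network structure
  • Create an optimizer and associate the model's parameters with it
  • Training loop
    • Acquire an input
    • Run the forward pass
    • Compute the loss
    • Zero the gradients of the network's parameters
    • Call loss.backward() to compute the parameter gradients
    • Call optimizer.step() to apply the gradients to the parameters
      After the code above has run, the learned parameters can be inspected through the .weight attribute of a layer (for example, net.l1.weight).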
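The steps can be sketched in a shortened, self-contained form. The network here is an assumed stand-in built from nn.Linear, the loop runs only a few iterations, and optimizer.zero_grad() is used in place of net.zero_grad(), which has the same effect when the optimizer holds all of the network's parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Step 1 and 2: define the network and tie its parameters to an optimizer.
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2, momentum=0.9)

# Keep a copy of one parameter to confirm that training changes it.
bias_before = net[2].bias.detach().clone()

for _ in range(5):
    x = torch.randn(4)               # acquire an input
    loss = torch.abs(net(x)).sum()   # forward pass and loss
    optimizer.zero_grad()            # clear old gradients
    loss.backward()                  # compute new gradients
    optimizer.step()                 # update the parameters

bias_after = net[2].bias.detach().clone()
print(bias_before, bias_after)

# Modules start in training mode; eval() switches the flag for inference.
was_training = net.training
net.eval()
print(was_training, net.training)
```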

Module State

  1. Saving and loading a model can be done with the following code:
    # Save the module
    torch.save(net.state_dict(), 'net.pt')
    
    ...
    
    # Load the module later on
    new_net = Net()
    new_net.load_state_dict(torch.load('net.pt'))

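A module's state_dict contains the state that affects its computation. This includes, but is not limited to, the module's parameters. Some modules also carry state that is not learnable but still affects the computation. A module's state therefore consists of the following kinds:

  • Parameters: learnable aspects of computation. Contained in the state_dict.
  • Buffers: non-learnable aspects of computation:
    • Persistent buffers: contained in the state_dict
    • Non-persistent buffers: not contained in the state_dict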
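The distinction between the two kinds of buffers can be illustrated with a small running-mean module. This is a sketch: the class RunningMean and the buffer name scratch are illustrative and not part of the notes above, and register_buffer() with persistent=False is the call that marks a buffer as non-persistent.

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    def __init__(self, num_features, momentum=0.9):
        super().__init__()
        self.momentum = momentum
        # Persistent buffer: not learnable, but saved in the state_dict.
        self.register_buffer('mean', torch.zeros(num_features))
        # Non-persistent buffer: tracked by the module, left out of the state_dict.
        self.register_buffer('scratch', torch.zeros(num_features), persistent=False)

    def forward(self, x):
        # Assigning a tensor to a registered buffer name keeps it a buffer.
        self.mean = self.momentum * self.mean + (1.0 - self.momentum) * x
        return self.mean

m = RunningMean(4)
m(torch.ones(4))

state_keys = list(m.state_dict().keys())
buffer_names = [name for name, _ in m.named_buffers()]
print(state_keys, buffer_names)

# A fresh instance restores the persistent state from the state_dict.
restored = RunningMean(4)
restored.load_state_dict(m.state_dict())
print(restored.mean)
```

Buffers also follow the module through to(), so they end up on the same device as the parameters even though no optimizer ever updates them.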

For more information, check out:

Saving and loading: https://pytorch.org/tutorials/beginner/saving_loading_models.html

Serialization semantics: https://pytorch.org/docs/master/notes/serialization.html

What is a state dict? https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html

Module Initialization

By default, parameters and floating-point buffers for modules provided by torch.nn are initialized during module instantiation as 32-bit floating point values on the CPU using an initialization scheme determined to perform well historically for the module type. For certain use cases, it may be desired to initialize with a different dtype, device (e.g. GPU), or initialization technique.
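As a rough illustration of these options, the sketch below creates a layer with a non-default dtype, re-initializes it with a different scheme, and skips the default initialization altogether. It assumes a reasonably recent PyTorch release in which torch.nn modules accept device and dtype keyword arguments and torch.nn.utils.skip_init is available; the layer sizes are arbitrary.

```python
import torch
from torch import nn

# Construct the layer directly in double precision (device= works the same way).
m = nn.Linear(5, 3, dtype=torch.float64)

# Replace the default initialization with an orthogonal one.
nn.init.orthogonal_(m.weight)
gram = m.weight.detach() @ m.weight.detach().T  # rows should be orthonormal

# skip_init() builds the module without running its default initialization,
# which avoids wasted work when a custom scheme is applied right afterwards.
m2 = torch.nn.utils.skip_init(nn.Linear, 5, 3)
nn.init.zeros_(m2.weight)
nn.init.zeros_(m2.bias)
print(m.weight.dtype, m2.weight.shape)
```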

Module Hooks

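To give finer control over the forward and backward passes, PyTorch provides "hooks". A hook can run arbitrary computation during the forward or backward pass, and can even modify how the pass is done. Example uses include debugging, visualizing activations, and examining gradients in depth. Hooks can be added to modules that you did not write yourself, so the feature works with third-party modules and modules provided by PyTorch.
PyTorch provides two types of hooks for modules:

  • Forward hooks: called during the forward pass. They are installed on a given module with register_forward_pre_hook() and register_forward_hook(), and are called right before and right after the forward function, respectively. They can also be installed globally for all modules with register_module_forward_pre_hook() and register_module_forward_hook().
  • Backward hooks: called during the backward pass. They are installed with register_full_backward_hook(). The hook is called once the backward pass for the module has been computed, and gives the user access to the gradients for both inputs and outputs. The global equivalent is register_module_full_backward_hook().
    All hooks allow the user to return an updated value that will be used for the rest of the computation. Hooks can therefore be used either to run arbitrary code along the regular forward/backward pass or to modify inputs and outputs without changing the module's forward() function.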
torch.manual_seed(1)

def forward_pre_hook(m, inputs):
  # Allows for examination and modification of the input before the forward pass.
  # Note that inputs are always wrapped in a tuple.
  input = inputs[0]
  return input + 1.

def forward_hook(m, inputs, output):
  # Allows for examination of inputs / outputs and modification of the outputs
  # after the forward pass. Note that inputs are always wrapped in a tuple while outputs
  # are passed as-is.

  # Residual computation a la ResNet.
  return output + inputs[0]

def backward_hook(m, grad_inputs, grad_outputs):
  # Allows for examination of grad_inputs / grad_outputs and modification of
  # grad_inputs used in the rest of the backwards pass. Note that grad_inputs and
  # grad_outputs are always wrapped in tuples.
  new_grad_inputs = [torch.ones_like(gi) * 42. for gi in grad_inputs]
  return new_grad_inputs

# Create sample module & input.
m = nn.Linear(3, 3)
x = torch.randn(2, 3, requires_grad=True)

# ==== Demonstrate forward hooks. ====
# Run input through module before and after adding hooks.
print('output with no forward hooks: {}'.format(m(x)))
: output with no forward hooks: tensor([[-0.5059, -0.8158,  0.2390],
                                        [-0.0043,  0.4724, -0.1714]], grad_fn=)

# Note that the modified input results in a different output.
forward_pre_hook_handle = m.register_forward_pre_hook(forward_pre_hook)
print('output with forward pre hook: {}'.format(m(x)))
: output with forward pre hook: tensor([[-0.5752, -0.7421,  0.4942],
                                        [-0.0736,  0.5461,  0.0838]], grad_fn=)

# Note the modified output.
forward_hook_handle = m.register_forward_hook(forward_hook)
print('output with both forward hooks: {}'.format(m(x)))
: output with both forward hooks: tensor([[-1.0980,  0.6396,  0.4666],
                                          [ 0.3634,  0.6538,  1.0256]], grad_fn=)

# Remove hooks; note that the output here matches the output before adding hooks.
forward_pre_hook_handle.remove()
forward_hook_handle.remove()
print('output after removing forward hooks: {}'.format(m(x)))
: output after removing forward hooks: tensor([[-0.5059, -0.8158,  0.2390],
                                               [-0.0043,  0.4724, -0.1714]], grad_fn=)

# ==== Demonstrate backward hooks. ====
m(x).sum().backward()
print('x.grad with no backwards hook: {}'.format(x.grad))
: x.grad with no backwards hook: tensor([[ 0.4497, -0.5046,  0.3146],
                                         [ 0.4497, -0.5046,  0.3146]])

# Clear gradients before running backward pass again.
m.zero_grad()
x.grad.zero_()

m.register_full_backward_hook(backward_hook)
m(x).sum().backward()
print('x.grad with backwards hook: {}'.format(x.grad))
: x.grad with backwards hook: tensor([[42., 42., 42.],
                                      [42., 42., 42.]])
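Besides modifying values, a forward hook can simply observe them. The sketch below records the output of every layer of a small network, which is one way to do the debugging and visualization mentioned earlier. The network, the activations dictionary, and the make_recorder helper are all hypothetical names introduced for this example.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
activations = {}

def make_recorder(name):
    # Build a forward hook that stores the layer output under the given name.
    def hook(module, inputs, output):
        activations[name] = output.detach()
        # Returning None leaves the output unchanged.
    return hook

# Install one hook per child and keep the handles for later removal.
handles = [
    layer.register_forward_hook(make_recorder(name))
    for name, layer in net.named_children()
]

net(torch.randn(2, 4))
shapes = {name: tuple(t.shape) for name, t in activations.items()}
print(shapes)

# Once the handles are removed, later forward passes record nothing.
for handle in handles:
    handle.remove()
activations.clear()
net(torch.randn(2, 4))
print(len(activations))
```

Keeping the handle returned by each register call matters: it is the only way to uninstall a hook without rebuilding the module.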

My own hand-typed version of the example:

import torch
import torch.nn as nn
torch.manual_seed(42)

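def forward_pre_hook(m, inputs):
    # Allows inspecting and modifying the input before the forward pass.
    # Note that inputs is always wrapped in a tuple.
    input = inputs[0]
    return input + 1

def forward_hook(m, inputs, output):
    # Allows inspecting inputs/output and modifying the output after the
    # forward pass. inputs is always wrapped in a tuple; output is passed as-is.

    # Residual computation a la ResNet.
    return output + inputs[0]

def backward_hook(m, grad_inputs, grad_outputs):
    # Allows inspecting grad_inputs and grad_outputs, and modifying the
    # grad_inputs used in the rest of the backward pass.
    # Both grad_inputs and grad_outputs are wrapped in tuples.
    new_grad_inputs = [torch.ones_like(gi) * 42. for gi in grad_inputs]
    return new_grad_inputs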

# Create a sample module and input.
m = nn.Linear(3, 3)
x = torch.randn(2, 3, requires_grad=True)

# ==== Demonstrate forward hooks ====
print(f"output with no forward hooks: {m(x)}")

forward_pre_hook_handle = m.register_forward_pre_hook(forward_pre_hook)
print(f"output with forward pre hook: {m(x)}")
# Remove the hook.
forward_pre_hook_handle.remove()
print(f"output after removing forward hooks: {m(x)}")

# ==== Demonstrate backward hooks ====
m(x).sum().backward()
print(f"x.grad with no backwards hook: {x.grad}")

# Clear gradients before running the backward pass again.
m.zero_grad()
x.grad.zero_()
m.register_full_backward_hook(backward_hook)
m(x).sum().backward()
print(f"x.grad with backwards hook: {x.grad}")
