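A First Look at PyTorch (Part 3)

Neural networks:

In PyTorch, neural networks are built mainly with torch.nn. torch.nn relies on torch.autograd to define models and differentiate them. An nn.Module contains the layers of a network and a forward(input) method that returns the output.

A typical training procedure for a neural network is as follows: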

(1) Define the neural network that has some learnable parameters (or weights);

(2) Iterate over a dataset of inputs;

(3) Process input through the network;

(4) Compute the loss (how far is the output from being correct);

(5) Propagate gradients back into the network's parameters;

(6) Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient

Defining a neural network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        #Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]   # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s   # multiply by each dimension's size
        return num_features     # return only after the loop has finished

net = Net()
print(net)
Out:
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
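
The in_features=400 of fc1 follows from the spatial size left after the two convolution and pooling stages. The sketch below traces the shapes with stand-alone copies of the two convolution layers; the 32x32 single-channel input is an assumption that is consistent with 16*5*5, not something the listing above states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-alone copies of the two convolution layers used in Net above.
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

# Assumed input: a batch of one single-channel 32x32 image.
x = torch.randn(1, 1, 32, 32)

shapes = [tuple(x.shape)]
x = F.max_pool2d(F.relu(conv1(x)), 2)   # 32 -> 28 (5x5 conv) -> 14 (2x2 pool)
shapes.append(tuple(x.shape))
x = F.max_pool2d(F.relu(conv2(x)), 2)   # 14 -> 10 (5x5 conv) -> 5 (2x2 pool)
shapes.append(tuple(x.shape))

flat_features = x[0].numel()            # 16 * 5 * 5 features per sample
for s in shapes:
    print(s)
print(flat_features)
```

With a different input size the flattened feature count changes, and fc1 would have to be resized to match.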

You only need to define the forward function; the backward function (where gradients are computed) is defined for you automatically by torch.autograd. You can use any Tensor operation in the forward function.
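
As a small illustration of this point, the sketch below writes only a forward computation and lets autograd derive the gradient. The function y = sum(x^2 + 3x) is a made-up example, chosen because its derivative 2x + 3 is easy to check by hand.

```python
import torch

# A leaf tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Any differentiable tensor operations may appear in the forward computation.
y = (x ** 2 + 3 * x).sum()

# autograd derives the backward pass; no hand-written backward is needed.
y.backward()

# Analytically, dy/dx = 2x + 3.
expected = 2 * x.detach() + 3
print(x.grad)
print(torch.allclose(x.grad, expected))
```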

The learnable parameters of a model are returned by net.parameters().

params = list(net.parameters())
print(len(params))
print(params[0].size())  #conv1's .weight
Out:
10
torch.Size([6, 1, 5, 5])
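
The count of 10 comes from five layers with one weight and one bias tensor each. The sketch below lists them by name using named_parameters(); LayersOnly is a hypothetical stand-in that holds the same five layers as Net, so that the example runs on its own.

```python
import torch.nn as nn

class LayersOnly(nn.Module):
    """Hypothetical stand-in holding the same five layers as Net above."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

net = LayersOnly()

# Each layer contributes a weight tensor and a bias tensor.
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.numel())

# Total number of scalar learnable values in the model.
total = sum(p.numel() for p in net.parameters())
print(total)
```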

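Run a forward pass on a random input, then zero the gradient buffers of all parameters and backpropagate with random gradients:

input = torch.randn(1, 1, 32, 32)   # assumed input: one single-channel 32x32 image
out = net(input)
net.zero_grad()
out.backward(torch.randn(1, 10))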

Note:

torch.nn only supports mini-batches, not a single sample. If you have only a single sample, use input.unsqueeze(0) to add a fake batch dimension.
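
A minimal sketch of the unsqueeze(0) step, assuming a single 32x32 single-channel image stored without a batch dimension (the sizes are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

# A single sample without a batch dimension: (channels, height, width).
sample = torch.randn(1, 32, 32)

# unsqueeze(0) inserts a new leading dimension of size 1.
batch = sample.unsqueeze(0)

out = conv(batch)
print(sample.shape, batch.shape, out.shape)
```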

A recap of what we have covered so far:

(1) torch.Tensor - A multi-dimensional array with support for autograd operations like backward().
    Also holds the gradient w.r.t. the tensor.
(2) nn.Module - Neural network module. Convenient way of encapsulating parameters, 
    with helpers for moving them to GPU, exporting, loading, etc.
(3) nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
(4) autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation
    creates at least one Function node that connects to the functions that created a Tensor and encodes its history.
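
Point (3) can be checked directly. In the sketch below, Scale is a hypothetical toy module: the attribute wrapped in nn.Parameter shows up in named_parameters(), while the plain tensor attribute does not.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Hypothetical toy module used only for this illustration."""
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(3))   # registered as a parameter
        self.offset = torch.zeros(3)              # plain tensor, not registered

    def forward(self, x):
        return x * self.gain + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)
```

Only registered parameters are handed to an optimizer through parameters(), so a plain tensor attribute such as offset would not be trained.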

Loss function

The nn package provides many different loss functions. A simple one is nn.MSELoss, which computes the mean squared error between the input and the target.

For example:

output = net(input)
# a dummy target, for example; float dtype so that it matches the network output
target = torch.arange(1, 11, dtype=torch.float32)
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Out:
tensor(38.7042)
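
To make the definition concrete, the sketch below compares nn.MSELoss with its default mean reduction against the formula written out by hand. The three values are made up for illustration.

```python
import torch
import torch.nn as nn

# Made-up output and target values with matching shapes.
output = torch.tensor([[0.5, 1.5, 2.0]])
target = torch.tensor([[1.0, 1.0, 4.0]])

criterion = nn.MSELoss()
loss = criterion(output, target)

# The same quantity by hand: mean of the squared element-wise differences.
manual = ((output - target) ** 2).mean()
print(loss.item(), manual.item())
```

Here the squared differences are 0.25, 0.25 and 4.0, so both values equal 1.5.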

If you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
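
The graph can be walked programmatically through grad_fn and next_functions. The sketch below uses a small stand-in model (one linear layer plus ReLU, not the Net above) and follows only the first parent at each step; the exact node class names depend on the PyTorch version, so treat the printed names as indicative.

```python
import torch
import torch.nn as nn

# Small stand-in model: linear -> relu -> MSELoss.
lin = nn.Linear(4, 2)
x = torch.randn(1, 4)
target = torch.zeros(1, 2)
loss = nn.MSELoss()(torch.relu(lin(x)), target)

node = loss.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    # next_functions is a tuple of (function, input_index) pairs;
    # entries are None for inputs that do not require gradients.
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

print(" <- ".join(names))
```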

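Backprop

To backpropagate the error, call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.

Now call loss.backward() and look at conv1's bias gradients before and after the backward pass: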

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Out:
conv1.bias.grad before backward
tensor([ 0.,  0.,  0.,  0.,  0.,  0.])
conv1.bias.grad after backward
tensor(1.00000e-02 *
       [-4.6071, -2.4962,  1.5047, -4.6816, -3.6782, -5.3738])
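
The accumulation behaviour is easy to see on a tiny example. In this sketch (made-up values, a single tensor instead of a network), calling backward twice without clearing doubles the stored gradient, and zeroing the buffer restores the single-pass value.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

(w * x).sum().backward()
first = w.grad.clone()          # d/dw sum(w * x) = x

(w * x).sum().backward()        # no zeroing in between
accumulated = w.grad.clone()    # the second gradient was added to the first

w.grad.zero_()                  # clear the buffer by hand
(w * x).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```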

Update the weights

The simplest update rule used in practice is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

In plain Python this can be written as:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
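
The same update can also be written under torch.no_grad() instead of going through .data, which is the style more commonly seen in recent PyTorch code. The sketch below is one possible variant, not the author's code; it uses a small hypothetical nn.Linear layer and checks the result against the update rule.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small layer and loss, just to obtain some gradients.
layer = nn.Linear(3, 1)
x = torch.randn(4, 3)
loss = layer(x).pow(2).mean()
loss.backward()

learning_rate = 0.01
before = layer.weight.detach().clone()
grad = layer.weight.grad.clone()

# Update in place without recording the operation in the autograd graph.
with torch.no_grad():
    for p in layer.parameters():
        p -= learning_rate * p.grad

# weight = weight - learning_rate * gradient
expected = before - learning_rate * grad
print(torch.allclose(layer.weight, expected))
```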

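However, when working with neural networks you may want to use various update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. These are implemented in the torch.optim package, and using them takes only a few lines: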

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
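
Putting the pieces together, the sketch below runs the zero_grad / forward / loss / backward / step cycle repeatedly on a hypothetical toy problem (fitting y = 2x + 1 with a single nn.Linear layer). The data, model size, learning rate and step count are all assumptions chosen for illustration, and every step uses the whole dataset as one batch.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: y = 2x + 1 on 20 evenly spaced points.
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(x), y)    # forward pass and loss
    loss.backward()                  # compute gradients
    optimizer.step()                 # apply the SGD update
    losses.append(loss.item())

print(losses[0], losses[-1])
print(model.weight.item(), model.bias.item())
```

The recorded loss should shrink over the run, and the learned weight and bias should approach 2 and 1.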





