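In PyTorch, neural networks are built mainly with the torch.nn package. torch.nn relies on torch.autograd to define models and differentiate them. An nn.Module contains the layers of a network, and its forward(input) method returns the output.
A typical training procedure for a neural network is as follows: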
(1) Define the neural network that has some learnable parameters (or weights);
(2) Iterate over a dataset of inputs;
(3) Process input through the network;
(4) Compute the loss (how far is the output from being correct);
(5) Propagate gradients back into the network's parameters;
(6) Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
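The six steps above can be sketched on a toy problem before moving to a real network. The example below is only an illustration under stated assumptions: the linear model, the synthetic data, and the names true_w, inputs, and targets are hypothetical and not part of the original tutorial, and each pass uses the whole toy dataset as a single batch.

```python
import torch

torch.manual_seed(0)

# (1) A toy "network": prediction = x @ w + b, with learnable w and b.
w = torch.zeros(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Synthetic data generated from known weights, so the result can be checked.
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
inputs = torch.randn(64, 3)
targets = inputs @ true_w + 0.3

learning_rate = 0.1
losses = []
for epoch in range(200):                          # (2) iterate over the dataset
    prediction = inputs @ w + b                   # (3) process input through the network
    loss = ((prediction - targets) ** 2).mean()   # (4) compute the loss
    losses.append(loss.item())
    loss.backward()                               # (5) propagate gradients back
    with torch.no_grad():                         # (6) weight = weight - learning_rate * gradient
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
    # Clear the gradients so the next pass does not accumulate onto them.
    w.grad.zero_()
    b.grad.zero_()

print(losses[0], losses[-1])
print(w.flatten(), b)
```

On this toy problem the learned w and b should end up close to true_w and 0.3. The rest of the section replaces the hand-written model with nn.Module and the hand-written update with torch.optim.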
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s  # multiply by each dimension's size
        return num_features

net = Net()
print(net)
Out:
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
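The in_features=400 of fc1 comes from the shape of the feature map after the second pooling step. As a rough check, the sketch below traces the tensor shapes through two standalone convolution layers with the same settings as conv1 and conv2. It assumes a single-channel 32x32 input, which is the size for which 16 * 5 * 5 works out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone layers with the same settings as conv1 and conv2 in Net.
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)  # assumed input: one 1-channel 32x32 image

x = F.max_pool2d(F.relu(conv1(x)), 2)
shape_after_block1 = tuple(x.shape)  # 32 -> 28 (5x5 conv) -> 14 (2x2 pool)

x = F.max_pool2d(F.relu(conv2(x)), 2)
shape_after_block2 = tuple(x.shape)  # 14 -> 10 (5x5 conv) -> 5 (2x2 pool)

flat = x.view(x.size(0), -1)         # flatten all but the batch dimension

print(shape_after_block1)  # (1, 6, 14, 14)
print(shape_after_block2)  # (1, 16, 5, 5)
print(tuple(flat.shape))   # (1, 400)
```

A different input size would change the flattened length, so fc1 would need a matching in_features.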
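You only need to define the forward function; the backward function (where the gradients are computed) is defined for you automatically by torch.autograd. You can use any Tensor operation in the forward function.
The learnable parameters of a model are returned by net.parameters().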
params = list(net.parameters())
print(len(params))
print(params[0].size()) #conv1's .weight
Out:
10
torch.Size([6, 1, 5, 5])
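The count of 10 corresponds to one weight tensor and one bias tensor for each of the five layers. To make that visible, the sketch below lists every parameter by name and sums their sizes. So that the block runs on its own, it uses an nn.Sequential stand-in with the same layers as Net; this stand-in is an assumption for illustration, and its parameter names (0.weight, 3.weight, ...) differ from Net's (conv1.weight, conv2.weight, ...).

```python
import torch.nn as nn

# Stand-in with the same layers and sizes as Net, written as nn.Sequential.
model = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

# named_parameters() yields (name, tensor) pairs for every learnable parameter.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

num_tensors = len(list(model.parameters()))
total = sum(p.numel() for p in model.parameters())
print(num_tensors, total)  # 10 61706
```

With the Net class defined above, net.named_parameters() can be used the same way.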
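Zero the gradient buffers of all parameters and backpropagate with random gradients:
input = torch.randn(1, 1, 32, 32)  # the layer sizes above assume a 1-channel 32x32 input
out = net(input)
net.zero_grad()
out.backward(torch.randn(1, 10))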
Note:
torch.nn expects mini-batches as input, not a single sample. If you have only a single sample, use input.unsqueeze(0) to add a fake batch dimension.
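As a small illustration of the note above, the sketch below adds a batch dimension to a single image before passing it to a convolution layer. The names single_image and batch are hypothetical, and the 32x32 size is an assumption.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

single_image = torch.randn(1, 32, 32)  # (channels, height, width): no batch dimension
batch = single_image.unsqueeze(0)      # (1, channels, height, width): a batch of one

out = conv(batch)
print(tuple(batch.shape))  # (1, 1, 32, 32)
print(tuple(out.shape))    # (1, 6, 28, 28)
```

The output keeps the batch dimension; out.squeeze(0) removes it again if a single result is wanted.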
To recap what we have covered so far:
(1) torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
(2) nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
(3) nn.Parameter - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
(4) autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.
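The nn package provides many different loss functions. A loss function takes an (output, target) pair and computes a value that estimates how far the output is from the target. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
For example: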
output = net(input)
target = torch.arange(1, 11, dtype=torch.float32)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Out:
tensor(38.7042)
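To see what nn.MSELoss computes, the sketch below compares it with the mean of squared differences written out by hand. The random output tensor is a stand-in for net(input), used here only so the block runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

output = torch.randn(1, 10)  # stand-in for net(input)
target = torch.arange(1, 11, dtype=torch.float32).view(1, -1)

criterion = nn.MSELoss()
loss = criterion(output, target)

# The same quantity by hand: average of the squared element-wise differences.
manual = ((output - target) ** 2).mean()
print(loss.item(), manual.item())

# If the output were all zeros, the loss would be (1^2 + ... + 10^2) / 10 = 38.5.
zero_loss = criterion(torch.zeros(1, 10), target)
print(zero_loss.item())  # 38.5
```

The value 38.5 for an all-zero output suggests why a freshly initialised network, whose outputs tend to be small, gives a loss in that neighbourhood for this dummy target.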
If you follow loss in the backward direction through its .grad_fn attribute, you will see a graph of computations like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
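The graph can be walked one step at a time with grad_fn and next_functions. The sketch below does this for a hypothetical one-layer model rather than the full network, so the chain is short. The exact class names that get printed depend on the PyTorch version, so none are listed here.

```python
import torch
import torch.nn as nn

fc = nn.Linear(4, 2)  # hypothetical tiny layer standing in for the network
x = torch.randn(1, 4)
target = torch.zeros(1, 2)

loss = nn.MSELoss()(fc(x), target)

# The node that produced loss: the backward function of MSELoss.
loss_node = loss.grad_fn
print(loss_node)

# One step back: the node that produced the first input of MSELoss,
# which is the output of the linear layer.
previous_node = loss_node.next_functions[0][0]
print(previous_node)
```

Repeating the next_functions step keeps moving toward the inputs and parameters of the model.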
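To backpropagate the error, call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.
Now we call loss.backward() and look at conv1's bias gradients before and after the backward pass: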
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Out:
conv1.bias.grad before backward
tensor([ 0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor(1.00000e-02 *
[-4.6071, -2.4962, 1.5047, -4.6816, -3.6782, -5.3738])
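The accumulation behaviour is easy to reproduce in isolation. The sketch below uses a hypothetical one-layer model and calls backward() twice without clearing the gradients, then once more after zero_grad().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(3, 1)  # hypothetical tiny model
x = torch.randn(4, 3)    # a batch of 4 samples

# First backward pass: d(sum of outputs)/d(bias) is 1 per sample, so 4 in total.
layer(x).sum().backward()
first = layer.bias.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one.
layer(x).sum().backward()
accumulated = layer.bias.grad.clone()

# After clearing, a backward pass gives the single-pass gradient again.
layer.zero_grad()
layer(x).sum().backward()
fresh = layer.bias.grad.clone()

print(first, accumulated, fresh)  # tensor([4.]) tensor([8.]) tensor([4.])
```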
The simplest update rule used in practice is stochastic gradient descent (SGD):
weight = weight - learning_rate * gradient
It can be written in a few lines of Python:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
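However, when working with neural networks you will often want to use different update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. For this purpose PyTorch provides a package, torch.optim, that implements these methods; importing it is all that is needed to use them.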
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
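For plain SGD without momentum, optimizer.step() should perform the same update as the hand-written rule weight = weight - learning_rate * gradient. The sketch below checks this on two identical copies of a hypothetical small model; the model, data, and names model_a and model_b are assumptions for illustration only.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model_a = nn.Linear(5, 2)         # hypothetical tiny model, updated by hand
model_b = copy.deepcopy(model_a)  # identical copy, updated by optim.SGD
x = torch.randn(8, 5)
target = torch.randn(8, 2)
criterion = nn.MSELoss()
learning_rate = 0.01

# Manual update: weight = weight - learning_rate * gradient
criterion(model_a(x), target).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= learning_rate * p.grad

# The same single step through torch.optim
optimizer = optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(model_b(x), target).backward()
optimizer.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)  # True
```

Switching to another rule such as Adam then only requires constructing a different optimizer; the rest of the loop stays the same.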