PyTorch.ipynb
API | Flexibility | Convenience |
---|---|---|
Barebone | High | Low |
nn.Module | High | Medium |
nn.Sequential | Low | High |
PyTorch is a system for executing dynamic computation graphs over **tensor objects** that behave much like numpy ndarrays. It provides a powerful automatic differentiation engine that removes the need for manual back-propagation.
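As a quick illustration of that claim (this snippet is my own addition, not from the original notebook), tensors support numpy-style operations and convert to and from ndarrays:

```python
# A minimal sketch: PyTorch tensors behave much like numpy ndarrays.
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)      # shares memory with the numpy array
print(t * 2 + 1)             # elementwise ops and broadcasting, like numpy
print(t.sum(dim=1))          # reductions along an axis
print(t.numpy().shape)       # back to numpy: (2, 3)
```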
Justin Johnson has shared a tutorial on PyTorch;
you can also find more detailed coverage in the official API doc;
and if you run into a problem you can't solve, you can ask for help on the PyTorch forum.
First, we download the CIFAR-10 dataset and use PyTorch's modules to declare the datasets, preprocess the data, and generate mini-batches.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as dset
import torchvision.transforms as T
import numpy as np
NUM_TRAIN = 49000
# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
T.ToTensor(),
T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))
cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))
cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False,
download=True, transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)
Next, we set the global data type and the device on which data will live for this project. torch.cuda.is_available() tells us whether our PyTorch installation supports the GPU; dtype and device then hold the chosen data type and device.
USE_GPU = True
dtype = torch.float32 # we will be using float throughout this tutorial
if USE_GPU and torch.cuda.is_available():
device = torch.device('cuda')
else:
device = torch.device('cpu')
# Constant to control how frequently we print train loss
print_every = 100
print('using device:', device)
In this section we first write a simple network for classifying CIFAR-10; it consists only of fully-connected layers with ReLU activation and has a single hidden layer. (We will implement the forward pass with raw PyTorch Tensors, and rely on PyTorch's automatic differentiation, autograd, for the backward pass.)
When a tensor is created with requires_grad=True, it does not just hold values: PyTorch also builds a computational graph in the background, which lets us easily back-propagate through that graph to compute gradients of some tensors with respect to a downstream loss. Concretely, if x.requires_grad == True, then after back-propagation x.grad is another tensor that holds the gradient of x with respect to the final scalar loss. A PyTorch tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers, and like numpy, PyTorch provides many functions for operating on tensors efficiently.
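A tiny standalone example (my addition, with made-up values) of how requires_grad and .grad interact:

```python
# A minimal autograd sketch: tensors with requires_grad=True record a graph,
# and calling .backward() on a scalar loss fills in their .grad attributes.
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
loss = ((w * x).sum()) ** 2   # a scalar "loss" built from x and w
loss.backward()               # back-propagate through the recorded graph

print(x.grad)                 # d(loss)/dx, same shape as x
print(w.grad)                 # d(loss)/dw, same shape as w
```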
As a simple example, we provide a flatten function that reshapes image data for use in a fully-connected neural network.
Recall that image data is typically stored in a tensor of shape N × C × H × W, so we use the **flatten operation to collapse each C × H × W image into a single long vector**.
def flatten(x):
N = x.shape[0] # read in N, C, H, W
return x.view(N, -1) # "flatten" the C * H * W values into a single vector per image
def test_flatten():
x = torch.arange(12).view(2, 1, 3, 2)
print('Before flattening: ', x)
print('After flattening: ', flatten(x))
test_flatten()
Output:
Before flattening: tensor([[[[ 0, 1],
[ 2, 3],
[ 4, 5]]],
[[[ 6, 7],
[ 8, 9],
[10, 11]]]])
After flattening: tensor([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
You can see that the shape of x changes from 2 × 1 × 3 × 2 to 2 × 6.
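One caveat worth noting here (my addition, not part of the assignment): x.view only works when the tensor's memory layout is contiguous, while x.reshape copies the data when necessary.

```python
# A small sketch of the view vs. reshape caveat.
import torch

x = torch.arange(12).view(2, 1, 3, 2)
xt = x.permute(0, 3, 2, 1)        # permuting makes the tensor non-contiguous
print(xt.is_contiguous())         # False
print(xt.reshape(2, -1).shape)    # works: torch.Size([2, 6])
# xt.view(2, -1) would raise a RuntimeError here; call .contiguous() first
```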
Here we define a function two_layer_fc that performs the forward pass of a two-layer fully-connected ReLU network on a batch of image data. After defining the forward pass, we check that it doesn't crash and that it produces outputs of the right shape by feeding zeros through the network.
You don't have to write any code here, but it is important to read and understand the implementation.
import torch.nn.functional as F # useful stateless functions
def two_layer_fc(x, params):
"""
A fully-connected neural network; the architecture is:
fully-connected layer -> ReLU -> fully-connected layer.
Note that this function only defines the forward pass;
PyTorch will take care of the backward pass for us.
The input to the network will be a minibatch of data, of shape
(N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
and the output layer will produce scores for C classes.
Inputs:
- x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
input data.
- params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
w1 has shape (D, H) and w2 has shape (H, C).
Returns:
- scores: A PyTorch Tensor of shape (N, C) giving classification scores for
the input data x.
"""
# first we flatten the image
x = flatten(x) # shape: [batch_size, C x H x W]
w1, w2 = params
# Forward pass: compute predicted y using operations on Tensors. Since w1 and
# w2 have requires_grad=True, operations involving these Tensors will cause
# PyTorch to build a computational graph, allowing automatic computation of
# gradients. Since we are no longer implementing the backward pass by hand we
# don't need to keep references to intermediate values.
# you can also use `.clamp(min=0)`, equivalent to F.relu()
x = F.relu(x.mm(w1))
x = x.mm(w2)
return x
def two_layer_fc_test():
hidden_layer_size = 42
x = torch.zeros((64, 50), dtype=dtype) # minibatch size 64, feature dimension 50
w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
scores = two_layer_fc(x, [w1, w2])
print(scores.size()) # you should see [64, 10]
two_layer_fc_test()
torch.Size([64, 10])
Here you will complete the implementation of the function three_layer_convnet, which performs the forward pass of a three-layer convolutional network. As above, we can immediately test our implementation by passing zeros through the network. The network should have the following architecture:

1. A convolutional layer (with bias) with channel_1 filters, each of shape KW1 × KH1, and zero-padding of two
2. ReLU
3. A convolutional layer (with bias) with channel_2 filters, each of shape KW2 × KH2, and zero-padding of one
4. ReLU
5. A fully-connected layer (with bias) producing scores for C classes

For reference, see the torch.nn.functional.conv2d documentation:
torch.nn.Conv2d( in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor
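To see how the padding values above keep the spatial size unchanged, here is a quick shape check (an illustrative snippet I added, using arbitrary channel counts):

```python
# With a 5x5 kernel and padding=2, F.conv2d preserves the 32x32 spatial size:
# output size = (32 + 2*2 - 5) / 1 + 1 = 32.
import torch
import torch.nn.functional as F

x = torch.zeros(64, 3, 32, 32)           # (N, C_in, H, W)
w = torch.zeros(6, 3, 5, 5)              # (C_out, C_in, KH, KW)
b = torch.zeros(6)                       # one bias per output channel
out = F.conv2d(x, w, bias=b, padding=2)
print(out.shape)                         # torch.Size([64, 6, 32, 32])
```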
def three_layer_convnet(x, params):
"""
Performs the forward pass of a three-layer convolutional network with the
architecture defined above.
Inputs:
- x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
- params: A list of PyTorch Tensors giving the weights and biases for the
network; should contain the following:
- conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
for the first convolutional layer
- conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
convolutional layer
- conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
weights for the second convolutional layer
- conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
convolutional layer
- fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
figure out what the shape should be?
- fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
figure out what the shape should be?
Returns:
- scores: PyTorch Tensor of shape (N, C) giving classification scores for x
"""
conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
scores = None
# TODO: Implement the forward pass for the three-layer ConvNet.
#torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)#
x = F.conv2d(x, conv_w1, bias=conv_b1, padding=2)
x = F.relu(x)
x = F.conv2d(x, conv_w2, bias=conv_b2, padding=1)
x = F.relu(x)
x = flatten(x)
# fully-connected layer producing the class scores
scores = x.mm(fc_w) + fc_b
return scores
def three_layer_convnet_test():
x = torch.zeros((64, 3, 32, 32), dtype=dtype) # minibatch size 64, image size [3, 32, 32]
conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype) # [out_channel, in_channel, kernel_H, kernel_W]
conv_b1 = torch.zeros((6,)) # out_channel
conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype) # [out_channel, in_channel, kernel_H, kernel_W]
conv_b2 = torch.zeros((9,)) # out_channel
# you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
fc_w = torch.zeros((9 * 32 * 32, 10))
fc_b = torch.zeros(10)
scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
print(scores.size()) # you should see [64, 10]
three_layer_convnet_test()
Output: torch.Size([64, 10])
Let's write a couple of utility methods to initialize the weight matrices of our models:

- random_weight(shape) initializes a weight tensor with the Kaiming normalization method.
- zero_weight(shape) initializes a weight tensor with all zeros. Useful for instantiating bias parameters.

def random_weight(shape):
"""
Create random Tensors for weights; setting requires_grad=True means that we
want to compute gradients for these Tensors during the backward pass.
We use Kaiming normalization: sqrt(2 / fan_in)
"""
if len(shape) == 2: # FC weight
fan_in = shape[0]
else:
fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
# randn is standard normal distribution generator.
w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
w.requires_grad = True
return w
def zero_weight(shape):
return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)
# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))
Output:
tensor([[-0.3170, 1.1586, 0.2524, -0.0345, 0.0226],
[ 0.3086, 1.2709, 0.4495, -1.0421, -0.3212],
[ 0.8470, 1.1458, 0.4931, -0.3018, -0.4302]], device='cuda:0',
requires_grad=True)
When training the model, we will use the following function to check the accuracy of our model on the training or validation sets.
When checking accuracy we don't need to compute any gradients, so we don't need PyTorch to build a computational graph for us when we compute the scores. To prevent a graph from being built, we scope our computation under the torch.no_grad() context manager.
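A one-line illustration of the effect (added by me, not in the original): anything computed under torch.no_grad() does not require gradients, so no graph is built.

```python
import torch

w = torch.randn(3, 3, requires_grad=True)
with torch.no_grad():
    y = w * 2
print(y.requires_grad)   # False: no computational graph was recorded
```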
def check_accuracy_part2(loader, model_fn, params):
"""
Check the accuracy of a classification model.
Inputs:
- loader: A DataLoader for the data split we want to check
- model_fn: A function that performs the forward pass of the model,
with the signature scores = model_fn(x, params)
- params: List of PyTorch Tensors giving parameters of the model
Returns: Nothing, but prints the accuracy of the model
"""
split = 'val' if loader.dataset.train else 'test'
print('Checking accuracy on the %s set' % split)
num_correct, num_samples = 0, 0
with torch.no_grad():
for x, y in loader:
x = x.to(device=device, dtype=dtype) # move to device, e.g. GPU
y = y.to(device=device, dtype=torch.int64)
scores = model_fn(x, params)
_, preds = scores.max(1) # tensor.max(dim) returns a tuple of (max values, indices of the max values)
num_correct += (preds == y).sum()
num_samples += preds.size(0)
acc = float(num_correct) / num_samples
print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
def train_part2(model_fn, params, learning_rate):
"""
Train a model on CIFAR-10.
Inputs:
- model_fn: A Python function that performs the forward pass of the model.
It should have the signature scores = model_fn(x, params) where x is a
PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
model weights, and scores is a PyTorch Tensor of shape (N, C) giving
scores for the elements in x.
- params: List of PyTorch Tensors giving weights for the model
- learning_rate: Python scalar giving the learning rate to use for SGD
Returns: Nothing
"""
for t, (x, y) in enumerate(loader_train):
# Move the data to the proper device (GPU or CPU)
x = x.to(device=device, dtype=dtype)
y = y.to(device=device, dtype=torch.long)
# Forward pass: compute scores and loss
scores = model_fn(x, params)
loss = F.cross_entropy(scores, y)
# Backward pass: PyTorch figures out which Tensors in the computational
# graph have requires_grad=True and uses backpropagation to compute the
# gradient of the loss with respect to these Tensors, and stores the
# gradients in the .grad attribute of each Tensor.
loss.backward()
# Update parameters. We don't want to backpropagate through the
# parameter updates, so we scope the updates under a torch.no_grad()
# context manager to prevent a computational graph from being built.
with torch.no_grad():
for w in params:
w -= learning_rate * w.grad
# Manually zero the gradients after running the backward pass
w.grad.zero_()
if t % print_every == 0:
print('Iteration %d, loss = %.4f' % (t, loss.item()))
check_accuracy_part2(loader_val, model_fn, params)
hidden_layer_size = 4000
learning_rate = 1e-2
w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))
train_part2(two_layer_fc, [w1, w2], learning_rate)
# accuracy on the val set should be around 40%
Iteration 0, loss = 3.2187
Checking accuracy on the val set
Got 129 / 1000 correct (12.90%)
Iteration 100, loss = 2.0614
Checking accuracy on the val set
Got 325 / 1000 correct (32.50%)
Iteration 200, loss = 1.6769
Checking accuracy on the val set
Got 371 / 1000 correct (37.10%)
Iteration 300, loss = 1.9845
Checking accuracy on the val set
Got 394 / 1000 correct (39.40%)
Iteration 400, loss = 2.1362
Checking accuracy on the val set
Got 354 / 1000 correct (35.40%)
Iteration 500, loss = 1.7240
Checking accuracy on the val set
Got 442 / 1000 correct (44.20%)
Iteration 600, loss = 2.3258
Checking accuracy on the val set
Got 438 / 1000 correct (43.80%)
Iteration 700, loss = 1.9758
Checking accuracy on the val set
Got 430 / 1000 correct (43.00%)
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16
conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None
# TODO: Initialize the parameters of a three-layer ConvNet. #
conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight((channel_1,))
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight((channel_2,))
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight(10)
params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)
Iteration 0, loss = 4.2925
Checking accuracy on the val set
Got 141 / 1000 correct (14.10%)
Iteration 100, loss = 1.6818
Checking accuracy on the val set
Got 350 / 1000 correct (35.00%)
Iteration 200, loss = 1.9404
Checking accuracy on the val set
Got 450 / 1000 correct (45.00%)
Iteration 300, loss = 1.7681
Checking accuracy on the val set
Got 474 / 1000 correct (47.40%)
Iteration 400, loss = 1.8228
Checking accuracy on the val set
Got 460 / 1000 correct (46.00%)
Iteration 500, loss = 1.4132
Checking accuracy on the val set
Got 483 / 1000 correct (48.30%)
Iteration 600, loss = 1.5273
Checking accuracy on the val set
Got 496 / 1000 correct (49.60%)
Iteration 700, loss = 1.5708
Checking accuracy on the val set
Got 509 / 1000 correct (50.90%)
Barebones PyTorch requires us to keep track of all parameter tensors by hand. That is fine for small networks with a few tensors, but tracking tens or hundreds of tensors in larger networks would be extremely inconvenient and error-prone.
PyTorch provides the nn.Module API to help you define arbitrary network architectures while tracking every learnable parameter for you. In Part II we implemented SGD ourselves; PyTorch also provides the torch.optim package, which implements all the common optimizers such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods like L-BFGS. You can refer to the doc for the exact specification of each optimizer.
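For reference, here is a small sketch (my own, not from the notebook) of how a few of these optimizers are constructed; the linear layer is just a stand-in model:

```python
# Constructing common optimizers from torch.optim for the same parameters.
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)   # placeholder model; any nn.Module works

sgd     = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, nesterov=True)
rmsprop = optim.RMSprop(model.parameters(), lr=1e-3)
adagrad = optim.Adagrad(model.parameters(), lr=1e-2)
adam    = optim.Adam(model.parameters(), lr=1e-3)
lbfgs   = optim.LBFGS(model.parameters(), lr=1.0)   # approximate second-order method
```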
To use the Module API, follow these steps:

1. Subclass nn.Module. Give your network class an intuitive name such as TwoLayerFC.
2. In the constructor __init__(), define all the layers you need as class attributes. Layer objects such as nn.Linear and nn.Conv2d are themselves nn.Module subclasses and contain learnable parameters, so you don't have to instantiate the raw tensors yourself; nn.Module will track these internal parameters for you. Refer to the doc to learn more about the built-in layers.
3. In the forward() method, define the connectivity of your network. Use the attributes defined in __init__ as function calls that take tensors as input and output the "transformed" tensors. Do not create any new layers with learnable parameters in forward()! All of them must be declared in __init__.

After defining your Module subclass, you can instantiate it as an object and call it just like the NN forward functions in Part II.
class TwoLayerFC(nn.Module):
def __init__(self, input_size, hidden_size, num_classes):
super().__init__()
# assign layer objects to class attributes
self.fc1 = nn.Linear(input_size, hidden_size)
# nn.init package contains convenient initialization methods
# http://pytorch.org/docs/master/nn.html#torch-nn-init
nn.init.kaiming_normal_(self.fc1.weight)
self.fc2 = nn.Linear(hidden_size, num_classes)
nn.init.kaiming_normal_(self.fc2.weight)
def forward(self, x):
# forward always defines connectivity
x = flatten(x)
scores = self.fc2(F.relu(self.fc1(x)))
return scores
def test_TwoLayerFC():
input_size = 50
x = torch.zeros((64, input_size), dtype=dtype) # minibatch size 64, feature dimension 50
model = TwoLayerFC(input_size, 42, 10)
scores = model(x)
print(scores.size()) # you should see [64, 10]
test_TwoLayerFC()
It’s your turn to implement a 3-layer ConvNet followed by a fully connected layer. The network architecture should be the same as in Part II:
You should initialize the weight matrices of the model using the Kaiming normal initialization method.
torch.nn.Conv2d documentation:
torch.nn.Conv2d( in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
class ThreeLayerConvNet(nn.Module):
def __init__(self, in_channel, channel_1, channel_2, num_classes):
super().__init__()
# TODO: Set up the layers you need for a three-layer ConvNet with the #
# architecture defined above. #
self.conv1 = nn.Conv2d(in_channels=in_channel,out_channels=channel_1,kernel_size=5,padding=2, bias=True)
nn.init.kaiming_normal_(self.conv1.weight)
nn.init.constant_(self.conv1.bias,0)
self.relu = F.relu
self.conv2 = nn.Conv2d(in_channels=channel_1,out_channels=channel_2,kernel_size=3,padding=1,bias=True)
nn.init.kaiming_normal_(self.conv2.weight)
nn.init.constant_(self.conv2.bias,0)
self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)
def forward(self, x):
scores = None
########################################################################
# TODO: Implement the forward function for a 3-layer ConvNet. you #
# should use the layers you defined in __init__ and specify the #
# connectivity of those layers in forward() #
########################################################################
x = self.relu(self.conv1(x))
x = self.relu(self.conv2(x))
scores = self.fc(flatten(x))
return scores
def test_ThreeLayerConvNet():
x = torch.zeros((64, 3, 32, 32), dtype=dtype) # minibatch size 64, image size [3, 32, 32]
model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
scores = model(x)
print(scores.size()) # you should see [64, 10]
test_ThreeLayerConvNet()
Given the validation or test set, we can check the classification accuracy of a neural network.
This version differs slightly from the one in Part II: you no longer have to pass the parameters in manually.
def check_accuracy_part34(loader, model):
if loader.dataset.train:
print('Checking accuracy on validation set')
else:
print('Checking accuracy on test set')
num_correct = 0
num_samples = 0
model.eval() # set model to evaluation mode
with torch.no_grad():
for x, y in loader:
x = x.to(device=device, dtype=dtype) # move to device, e.g. GPU
y = y.to(device=device, dtype=torch.long)
scores = model(x)
_, preds = scores.max(1)
num_correct += (preds == y).sum()
num_samples += preds.size(0)
acc = float(num_correct) / num_samples
print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
Note:
- model.eval() switches to evaluation mode, so layers such as batchnorm stop updating their running statistics during testing;
- with torch.no_grad() prevents gradients from being recorded.

We also use a slightly different training loop. Instead of updating the weight values ourselves, we use an Optimizer object from the torch.optim package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.
def train_part34(model, optimizer, epochs=1):
"""
Train a model on CIFAR-10 using the PyTorch Module API.
Inputs:
- model: A PyTorch Module giving the model to train.
- optimizer: An Optimizer object we will use to train the model
- epochs: (Optional) A Python integer giving the number of epochs to train for
Returns: Nothing, but prints model accuracies during training.
"""
model = model.to(device=device) # move the model parameters to CPU/GPU
for e in range(epochs):
for t, (x, y) in enumerate(loader_train):
model.train() # put model to training mode
x = x.to(device=device, dtype=dtype) # move to device, e.g. GPU
y = y.to(device=device, dtype=torch.long)
scores = model(x)
loss = F.cross_entropy(scores, y)
# Zero out all of the gradients for the variables which the optimizer
# will update.
optimizer.zero_grad()
# This is the backwards pass: compute the gradient of the loss with
# respect to each parameter of the model.
loss.backward()
# Actually update the parameters of the model using the gradients
# computed by the backwards pass.
optimizer.step()
if t % print_every == 0:
print('Iteration %d, loss = %.4f' % (t, loss.item()))
check_accuracy_part34(loader_val, model)
print()
Now we are ready to run the training loop. Unlike in Part II, we no longer allocate parameter tensors explicitly:
simply pass the input size, hidden layer size, and number of classes (i.e. output size) to the constructor of TwoLayerFC.
You also need to define an optimizer that tracks all the learnable parameters inside TwoLayerFC.
You don't need to tune any hyperparameters, but you should see model accuracy above 40% after training for one epoch.
hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
train_part34(model, optimizer)
The final accuracy is around 43%-45%.
Now you should use the Module API to train a three-layer ConvNet on CIFAR-10. This should look very similar to training the two-layer network! You don't need to tune any hyperparameters, but you should achieve above 45% accuracy after one epoch of training. You should train the model using stochastic gradient descent without momentum.
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16
model = None
optimizer = None
# TODO: Instantiate your ThreeLayerConvNet model and a corresponding optimizer #
model = ThreeLayerConvNet(3, channel_1, channel_2, 10)
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)
train_part34(model, optimizer)
The final accuracy is around 48%.
Part III introduced the PyTorch Module API, which lets you define arbitrary learnable layers and their connectivity.
For simple models such as a stack of feed-forward layers, you still need to go through three steps: subclass nn.Module, declare the layers as class attributes in __init__, and call each layer one by one in the forward() method.
Is there a more convenient way?
Fortunately, PyTorch provides a container module called nn.Sequential that merges the above steps into one. It is not as flexible as nn.Module, because you cannot specify topologies more complex than a feed-forward stack, but it is good enough for many use cases.
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
def forward(self, x):
return flatten(x)
hidden_layer_size = 4000
learning_rate = 1e-2
model = nn.Sequential(
Flatten(),
nn.Linear(3 * 32 * 32, hidden_layer_size),
nn.ReLU(),
nn.Linear(hidden_layer_size, 10),
)
# you can use Nesterov momentum in optim.SGD
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
momentum=0.9, nesterov=True)
train_part34(model, optimizer)
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2
model = None
optimizer = None
# TODO: Rewrite the three-layer ConvNet with bias from Part III with the #
# Sequential API. #
model = nn.Sequential(
nn.Conv2d(3,channel_1,kernel_size=5,padding=2),
nn.ReLU(),
nn.Conv2d(channel_1,channel_2,kernel_size=3,padding=1),
nn.ReLU(),
Flatten(),
nn.Linear(channel_2*32*32,10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
momentum=0.9, nesterov=True)
train_part34(model, optimizer)
In this section, you can experiment with any ConvNet architecture you like on CIFAR-10.
Your job is to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves at least 70% accuracy on the CIFAR-10 validation set within 10 epochs. You can use the check_accuracy and train functions above, and you can use either the nn.Module or the nn.Sequential API.
Below is the official API documentation for each component. Note that what we call "spatial batch norm" is called BatchNorm2d in PyTorch.
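As a small illustration of spatial batch norm (a sketch I added, with arbitrary channel counts), a typical conv -> BatchNorm2d -> ReLU block looks like this:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(32),   # normalizes each channel over the (N, H, W) dimensions
    nn.ReLU(),
)
print(block(torch.zeros(8, 3, 32, 32)).shape)  # torch.Size([8, 32, 32, 32])
```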
For each network architecture you try, you should tune the learning rate and other hyperparameters; as you do so, there are a few important things to keep in mind.
If you feel adventurous, there are many other features you can implement to try to improve performance. You don't need to implement any of them, but don't miss the fun if you have time!
I tried a modified ResNet-18 network: from the 3rd epoch onward it reached 75.4% accuracy on the validation set, peaking at about 78%.
################################################################################
# TODO: #
# Experiment with any architectures, optimizers, and hyperparameters. #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs. #
# #
# Note that you can use the check_accuracy function to evaluate on either #
# the test set or the validation set, by passing either loader_test or #
# loader_val as the second argument to check_accuracy. You should not touch #
# the test set until you have finished your architecture and hyperparameter #
# tuning, and only run the test set once at the end to report a final value. #
################################################################################
from collections import OrderedDict
model = None
optimizer = None
def conv3x3(in_planes,out_planes,stride = 1):
# "3x3 convolution with padding"
return nn.Conv2d(
in_planes, out_planes,
kernel_size=3, stride=stride, padding=1, bias=False)
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
m = OrderedDict()
m['conv1'] = conv3x3(inplanes, planes, stride)
m['bn1'] = nn.BatchNorm2d(planes)
m['relu1'] = nn.ReLU(inplace=True)
m['conv2'] = conv3x3(planes, planes)
m['bn2'] = nn.BatchNorm2d(planes)
self.group1 = nn.Sequential(m)
self.relu= nn.Sequential(nn.ReLU(inplace=True))
self.downsample = downsample
def forward(self, x):
if self.downsample is not None:
residual = self.downsample(x)
else:
residual = x
out = self.group1(x) + residual
out = self.relu(out)
return out
class ResNet(nn.Module):
def __init__(self, block, layers, num_classes=10):
self.inplanes = 64
super(ResNet, self).__init__()
m = OrderedDict()
m['conv1'] = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
m['bn1'] = nn.BatchNorm2d(64)
m['relu1'] = nn.ReLU(inplace=True)
m['maxpool'] = nn.MaxPool2d(kernel_size=2, stride=2)
self.group1= nn.Sequential(m)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.Sequential(nn.AvgPool2d(2))
self.attention = nn.Conv1d(1,1,kernel_size=3,padding=1)
self.group2 = nn.Sequential(
OrderedDict([
('fc', nn.Linear(512 * block.expansion, num_classes))
])
)
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, np.sqrt(2. / n))
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def _make_layer(self, block, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion),
)
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))
return nn.Sequential(*layers)
def forward(self, x):
x = self.group1(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x) #(64,512,2,2)
x = self.avgpool(x) #(64,512,1,1)
x = x.view(x.size(0), -1) #(64,512)
# apply a simple channel attention layer
x = x.unsqueeze(1)
a = self.attention(x)
x = x.mul(a)
x = x.squeeze(1)
x = self.group2(x)
return x
def resnet18(pretrained=False, model_root=None, **kwargs):
model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
if pretrained:
raise NotImplementedError('pretrained weights are not used in this notebook')
return model
def test_MyCNN():
model = resnet18()
x = torch.zeros((64, 3, 32, 32), dtype=dtype) # minibatch size 64, image size [3, 32, 32]
scores = model(x)
print(scores.size())
# You should get at least 70% accuracy
model = resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
momentum=0.9, nesterov=True)
print_every = 1000
train_part34(model, optimizer,epochs=10)
best_model = model
check_accuracy_part34(loader_test, best_model)
Accuracy on the test set:
Checking accuracy on test set
Got 7664 / 10000 correct (76.64)
When using PyTorch for our deep learning work, we roughly follow these steps (a compact sketch tying them together appears after this list):

1. Data: use the dataset classes already defined in torchvision.datasets, or define your own dataset object; use torchvision.transforms to transform the data, and DataLoader from torch.utils.data to load it in mini-batches.
2. Model: use the stateless functions in torch.nn.functional together with the layers in nn that carry learnable parameters to define your network structure and connectivity via nn.Module, or use nn.Sequential to define a feed-forward network.
3. Evaluation: switch to evaluation mode with model.eval(), declare with torch.no_grad() that the computational graph should not accumulate gradients, then feed in the test data and compute the test metrics (loss, accuracy, etc.).
4. Optimizer: torch.optim defines a variety of optimizers; we specify their hyperparameters and bind them to the model's learnable parameters.
5. Training loop: forward pass to compute the loss (nn.functional defines many loss functions) -> zero the old gradients with optimizer.zero_grad() -> back-propagate the loss with loss.backward() -> update the parameters with optimizer.step().