[CS231n Assignment 2 #05] Deep Learning Frameworks: PyTorch

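  • Assignment homepage: Assignment 2
  • Official starter code: Assignment 2 code
  • Assignment source file: PyTorch.ipynb
  • Assignment content:
    This assignment has 5 parts. You will learn PyTorch at different levels of abstraction, which will help you understand it better and prepare you for the final project.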
    1. Preparation: we will use CIFAR-10 dataset.
    2. Barebones PyTorch: we will work directly with the lowest-level PyTorch Tensors.
    3. PyTorch Module API: we will use nn.Module to define arbitrary neural network architecture.
    4. PyTorch Sequential API: we will use nn.Sequential to define a linear feed-forward network very conveniently.
    5. CIFAR-10 open-ended challenge: please implement your own network to get as high accuracy as possible on CIFAR-10. You can experiment with any layer, optimizer, hyperparameters or other advanced features.
API    Flexibility    Convenience
Barebone
nn.Module
nn.Sequential

Contents

        • 0. Pytorch
          • 0.1 Introduction
          • 0.2 Getting started
        • 1. Preparation
        • 2. Barebones PyTorch
          • 2.1 PyTorch Tensors: Flatten Function
          • 2.2 Barebones PyTorch: Two-Layer Network
          • 2.3 Barebones PyTorch: Three-Layer ConvNet
          • 2.4 Barebones PyTorch: Initialization
          • 2.5 Barebones PyTorch: Check Accuracy
          • 2.6 BareBones PyTorch: Training Loop
          • 2.7 BareBones PyTorch: Train a Two-Layer Network
          • 2.8 BareBones PyTorch: Training a ConvNet
        • 3.PyTorch Module API
          • 3.1 Module API: Two-Layer Network
          • 3.2 Module API: Three-Layer ConvNet
          • 3.3 Module API: Check Accuracy
          • 3.4 Module API: Training Loop
          • 3.5 Module API: Train a Two-Layer Network
          • 3.6 Module API: Train a Three-Layer ConvNet
        • 4.PyTorch Sequential API
          • 4.1 Sequential API: Two-Layer Network
          • 4.2 Sequential API: Three-Layer ConvNet
        • 5. CIFAR-10 open-ended challenge
          • 5.1 Things you can try
          • 5.2 Tips for training
          • 5.3 Going further
          • 5.4 Start experimenting
          • 5.5 Test set: evaluate only once
        • Summary

0. Pytorch

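0.1 Introduction

PyTorch is a system for executing dynamic computational graphs over tensor objects that behave similarly to numpy ndarrays. It comes with a powerful automatic differentiation engine that removes the need for manual back-propagation.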

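0.2 Getting started

Justin Johnson has written a tutorial for PyTorch;
more detailed material can be found in the official API doc;
for problems you cannot solve on your own, you can ask for help on the official PyTorch forum.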

1. Preparation

First, we download the CIFAR-10 dataset and use PyTorch's utilities to declare the datasets, preprocess the data, and generate mini-batches.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))
                        
cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, 	
                             download=True, transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Next, we set the global data type and the device on which data is stored. torch.cuda.is_available() reports whether our PyTorch installation can use a GPU; dtype and device are then set accordingly.

USE_GPU = True

dtype = torch.float32 # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

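2. Barebones PyTorch

In this section we first write a simple network for classifying the CIFAR dataset. It contains only fully connected layers with ReLU activation, and it has only two layers. (We implement the forward pass on PyTorch Tensors and rely on PyTorch's automatic differentiation, autograd, for the backward pass.)

  • When we create a PyTorch Tensor with requires_grad=True, operations on that tensor do not just compute values; they also build a computational graph in the background, so that we can easily backpropagate through the graph to compute gradients of a downstream loss with respect to some tensors.
  • Concretely, if x is a tensor with x.requires_grad == True, then after backpropagation x.grad is another tensor holding the gradient of the final scalar loss with respect to x.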
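
As a small illustration of the two points above, here is a minimal sketch (not part of the assignment notebook) that builds a scalar loss from a tensor with requires_grad=True and reads the gradient back from .grad. The tensor values and the toy loss sum(x^2) are made up for the example.

```python
import torch

# A leaf tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A toy scalar "loss" built from x: loss = sum(x_i ** 2).
loss = (x ** 2).sum()

# Backpropagate through the recorded graph; this fills x.grad
# with d(loss)/dx, which for this loss is 2 * x.
loss.backward()

print(x.grad)  # tensor([2., 4., 6.])
```
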
2.1 PyTorch Tensors: Flatten Function

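A PyTorch Tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers. Like numpy, PyTorch provides many functions for operating on Tensors efficiently.
As a simple example, we provide a flatten function that reshapes image data for use in a fully connected neural network.
Recall that image data is typically stored in a Tensor of shape $N \times C \times H \times W$, so we use a "flatten" operation to collapse the $C \times H \times W$ values of each image into a single long vector.

  • The flatten function below first reads the value of N from a given batch of data, then returns a view of that data.
  • view is analogous to numpy's reshape method: it reshapes x to $N \times ??$, where ?? may be anything. Here it is $C \times H \times W$, but we do not need to specify it explicitly; passing -1 lets PyTorch infer it.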
def flatten(x):
    N = x.shape[0] # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

def test_flatten():
    x = torch.arange(12).view(2, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

test_flatten()

Output:

Before flattening:  tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]]],


        [[[ 6,  7],
          [ 8,  9],
          [10, 11]]]])
After flattening:  tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])

As you can see, the shape of x changes from $2 \times 1 \times 3 \times 2$ to $2 \times 6$.

2.2 Barebones PyTorch: Two-Layer Network

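Here we define a function two_layer_fc that performs the forward pass of a two-layer fully connected ReLU network on a batch of image data. After defining the forward pass, we check that it does not crash and that it produces outputs of the right shape by feeding zeros through the network.

You do not have to write any code here, but it is important that you read and understand the implementation.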

import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural networks; the architecture is:
    NN is fully connected -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass; 
    PyTorch will take care of the backward pass for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.
    
    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]
    
    w1, w2 = params
    
    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.  
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x
    

def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()
torch.Size([64, 10])
2.3 Barebones PyTorch: Three-Layer ConvNet

Here you will complete the implementation of the function three_layer_convnet, which performs the forward pass of a three-layer convolutional network. As above, we can test the implementation immediately by passing zeros through the network. The network should have the following architecture:

  • a convolutional layer (with bias) with channel_1 filters, each of shape $KW_1 \times KH_1$, and zero-padding of two;
  • a ReLU nonlinearity;
  • a convolutional layer (with bias) with channel_2 filters, each of shape $KW_2 \times KH_2$, and zero-padding of one;
  • a ReLU nonlinearity;
  • a fully connected layer with bias, producing scores for C classes.

Relevant documentation:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor

def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?
    
    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    
    # TODO: Implement the forward pass for the three-layer ConvNet.                
    #torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)#
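    x = F.conv2d(x, conv_w1, bias=conv_b1, padding=2)
    x = F.relu(x)
    x = F.conv2d(x, conv_w2, bias=conv_b2, padding=1)
    x = F.relu(x)  # second ReLU, as required by the architecture above
    x = flatten(x)
    # fully connected layer producing the class scores
    scores = x.mm(fc_w) + fc_b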
    
    return scores
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b1 = torch.zeros((6,))  # out_channel
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b2 = torch.zeros((9,))  # out_channel

    # you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
    fc_w = torch.zeros((9 * 32 * 32, 10))
    fc_b = torch.zeros(10)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]
three_layer_convnet_test()

Output: torch.Size([64, 10])
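
The test above hard-codes 9 * 32 * 32 as the input size of the fully connected layer. The sketch below shows where that number comes from, using the standard convolution output-size formula (size + 2 * padding - kernel) // stride + 1. The helper name conv_output_size is made up for this example and is not part of PyTorch.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# First layer: 5x5 kernel with padding 2 on a 32x32 input.
h1 = conv_output_size(32, kernel=5, padding=2)
# Second layer: 3x3 kernel with padding 1.
h2 = conv_output_size(h1, kernel=3, padding=1)

channel_2 = 9  # number of filters in the second conv layer of the test above
fc_in_features = channel_2 * h2 * h2

print(h1, h2, fc_in_features)  # 32 32 9216
```

Both layers preserve the 32x32 spatial size, so the flattened feature vector has channel_2 * 32 * 32 entries.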

2.4 Barebones PyTorch: Initialization

Let's write a couple of utility methods to initialize the weight matrices for our models.

  • random_weight(shape) initializes a weight tensor with the Kaiming normalization method.
  • zero_weight(shape) initializes a weight tensor with all zeros. Useful for instantiating bias parameters.
def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator. 
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)

# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU. 
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

Output:

tensor([[-0.3170,  1.1586,  0.2524, -0.0345,  0.0226],
        [ 0.3086,  1.2709,  0.4495, -1.0421, -0.3212],
        [ 0.8470,  1.1458,  0.4931, -0.3018, -0.4302]], device='cuda:0',
       requires_grad=True)
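
To make the fan_in arithmetic in random_weight concrete, here is a plain-Python sketch that computes the scale sqrt(2 / fan_in) for the two kinds of weight shapes used in this post. The helper name kaiming_std is made up for illustration. It follows this post's convention that a fully connected weight has shape (in_features, out_features); note that nn.Linear stores its weight as (out_features, in_features) instead.

```python
import math

def kaiming_std(shape):
    """Standard deviation sqrt(2 / fan_in), mirroring random_weight above."""
    if len(shape) == 2:      # FC weight: (in_features, out_features)
        fan_in = shape[0]
    else:                    # conv weight: (out_channel, in_channel, kH, kW)
        fan_in = math.prod(shape[1:])
    return math.sqrt(2.0 / fan_in)

fc_std = kaiming_std((3 * 32 * 32, 4000))  # fan_in = 3072
conv_std = kaiming_std((32, 3, 5, 5))      # fan_in = 3 * 5 * 5 = 75

print(fc_std, conv_std)
```
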
2.5 Barebones PyTorch: Check Accuracy

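While training the model, we will use the following function to check its accuracy on the training or validation set.

When checking accuracy we do not need to compute any gradients, so we do not need PyTorch to build a computational graph when computing scores. To prevent a graph from being built, we scope the computation under a torch.no_grad() context manager.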

def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.
    
    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model
    
    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)  # tensor.max(dim) returns a tuple (max values, indices of the max values)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
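
The effect of torch.no_grad() used in the function above can be checked in isolation. This is a minimal sketch with a made-up tensor w: the same arithmetic is run once normally and once inside the context manager, and only the first result stays attached to a computational graph.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Normal mode: the result is attached to a computational graph.
y_tracked = (w * 2).sum()

# Inside no_grad: same arithmetic, but no graph is recorded.
with torch.no_grad():
    y_untracked = (w * 2).sum()

print(y_tracked.requires_grad)    # True
print(y_untracked.requires_grad)  # False
```
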
2.6 BareBones PyTorch: Training Loop
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD
    
    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph has requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)     
2.7 BareBones PyTorch: Train a Two-Layer Network
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)
# accuracy on the val set should be around 40%
Iteration 0, loss = 3.2187
Checking accuracy on the val set
Got 129 / 1000 correct (12.90%)

Iteration 100, loss = 2.0614
Checking accuracy on the val set
Got 325 / 1000 correct (32.50%)

Iteration 200, loss = 1.6769
Checking accuracy on the val set
Got 371 / 1000 correct (37.10%)

Iteration 300, loss = 1.9845
Checking accuracy on the val set
Got 394 / 1000 correct (39.40%)

Iteration 400, loss = 2.1362
Checking accuracy on the val set
Got 354 / 1000 correct (35.40%)

Iteration 500, loss = 1.7240
Checking accuracy on the val set
Got 442 / 1000 correct (44.20%)

Iteration 600, loss = 2.3258
Checking accuracy on the val set
Got 438 / 1000 correct (43.80%)

Iteration 700, loss = 1.9758
Checking accuracy on the val set
Got 430 / 1000 correct (43.00%)
2.8 BareBones PyTorch: Training a ConvNet
learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 =  None
conv_b1 =  None
conv_w2 =  None
conv_b2 =  None
fc_w =  None
fc_b = None


# TODO: Initialize the parameters of a three-layer ConvNet.                    #
conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight((channel_1,))
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight((channel_2,))
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight((10,))


params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)
Iteration 0, loss = 4.2925
Checking accuracy on the val set
Got 141 / 1000 correct (14.10%)

Iteration 100, loss = 1.6818
Checking accuracy on the val set
Got 350 / 1000 correct (35.00%)

Iteration 200, loss = 1.9404
Checking accuracy on the val set
Got 450 / 1000 correct (45.00%)

Iteration 300, loss = 1.7681
Checking accuracy on the val set
Got 474 / 1000 correct (47.40%)

Iteration 400, loss = 1.8228
Checking accuracy on the val set
Got 460 / 1000 correct (46.00%)

Iteration 500, loss = 1.4132
Checking accuracy on the val set
Got 483 / 1000 correct (48.30%)

Iteration 600, loss = 1.5273
Checking accuracy on the val set
Got 496 / 1000 correct (49.60%)

Iteration 700, loss = 1.5708
Checking accuracy on the val set
Got 509 / 1000 correct (50.90%)

3.PyTorch Module API

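Barebones PyTorch requires us to track all parameter tensors by hand. This is fine for small networks with a few tensors, but tracking tens or hundreds of tensors in larger networks is very inconvenient and error-prone.
PyTorch provides nn.Module to help you define arbitrary network architectures while tracking every learnable parameter for you. In Part 2 we implemented SGD ourselves. PyTorch also provides the torch.optim package, which implements all the common optimizers, such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods such as L-BFGS. Refer to the doc for the exact specification of each optimizer.

To use the Module API, follow these steps:

  1. Subclass nn.Module. Give your network class an intuitive name such as TwoLayerFC.
  2. In the constructor __init__(), define all the layers you need as class attributes.
    Layer objects such as nn.Linear and nn.Conv2d are themselves nn.Module subclasses and contain learnable parameters, so you do not have to instantiate the raw tensors yourself.
    nn.Module will track these internal parameters for you. Refer to the doc to learn more about the built-in layers.
    Warning: do not forget to call super().__init__() first!
  3. In the forward() method, define the connectivity of your network. Use the attributes defined in __init__() as function calls that take a tensor as input and output the "transformed" tensor. Do not create any new layers with learnable parameters in forward()! All of them must be declared up front in __init__().

After you define your Module subclass, you can instantiate it as an object and call it just like the NN forward function in Part 2.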

3.1 Module API: Two-Layer Network
class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init 
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)
    
    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_TwoLayerFC()
3.2 Module API: Three-Layer ConvNet

It’s your turn to implement a 3-layer ConvNet followed by a fully connected layer. The network architecture should be the same as in Part II:

  1. Convolutional layer with channel_1 5x5 filters with zero-padding of 2
  2. ReLU
  3. Convolutional layer with channel_2 3x3 filters with zero-padding of 1
  4. ReLU
  5. Fully-connected layer to num_classes classes

You should initialize the weight matrices of the model using the Kaiming normal initialization method.

Relevant documentation:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
       
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        self.conv1 = nn.Conv2d(in_channels=in_channel,out_channels=channel_1,kernel_size=5,padding=2, bias=True)
        nn.init.kaiming_normal_(self.conv1.weight)
        nn.init.constant_(self.conv1.bias,0)
        
        self.relu = F.relu
        
        self.conv2 = nn.Conv2d(in_channels=channel_1,out_channels=channel_2,kernel_size=3,padding=1,bias=True)
        nn.init.kaiming_normal_(self.conv2.weight)
        nn.init.constant_(self.conv2.bias,0)
        
        self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)
     

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        scores = self.fc(flatten(x))
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_ThreeLayerConvNet()
3.3 Module API: Check Accuracy

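Given a validation or test set, we can check the classification accuracy of a neural network.
This version is slightly different from the one in Part 2: you no longer need to pass the parameters manually.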

def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

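Notes:

  • model.eval() puts the model in evaluation mode; in this mode, layers such as batchnorm use their stored running statistics instead of updating them from the current batch;
  • computation under with torch.no_grad() does not build a computational graph, so no gradients are tracked;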
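
The first note can be verified on a single layer. The sketch below uses a standalone nn.BatchNorm1d with made-up input data (it is not one of the models from this post): the running mean changes after a forward pass in training mode and stays fixed after a forward pass in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) + 5.0   # a batch whose mean is far from zero

bn.train()                    # training mode: running statistics are updated
bn(x)
mean_after_train = bn.running_mean.clone()

bn.eval()                     # evaluation mode: running statistics are frozen
bn(x + 100.0)
mean_after_eval = bn.running_mean.clone()

print(bn.training)                                    # False
print(torch.equal(mean_after_train, mean_after_eval)) # True
```
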
3.4 Module API: Training Loop

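We also use a slightly different training loop. Rather than updating the values of the weights ourselves, we use an Optimizer object from the torch.optim package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.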

def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
3.5 Module API: Train a Two-Layer Network

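Now we are ready to run the training loop. In contrast to Part 2, we no longer allocate parameter tensors explicitly.
Simply pass the input size, hidden layer size, and number of classes (i.e. output size) to the constructor of TwoLayerFC.
You also need to define an optimizer that tracks all the learnable parameters inside TwoLayerFC.
You do not need to tune any hyperparameters, but you should see model accuracy above 40% after training for one epoch.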

hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

train_part34(model, optimizer)

The final accuracy is around 43% to 45%.

3.6 Module API: Train a Three-Layer ConvNet

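You should now use the Module API to train a three-layer ConvNet on CIFAR. This should look very similar to training the two-layer network! You do not need to tune any hyperparameters, but after some training you should reach an accuracy above 45%. You should train the model using stochastic gradient descent without momentum.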

learning_rate = 3e-3
channel_1 = 32
channel_2 = 16

model = None
optimizer = None
# TODO: Instantiate your ThreeLayerConvNet model and a corresponding optimizer #
model = ThreeLayerConvNet(3, channel_1, channel_2, 10)
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

train_part34(model, optimizer)

The final accuracy is around 48%.

4.PyTorch Sequential API

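Part 3 introduced the PyTorch Module API, which allows you to define arbitrary learnable layers and their connectivity.

For simple models such as a stack of feed-forward layers, you still need to go through 3 steps: subclass nn.Module, assign the layers to class attributes in __init__(), and call each layer one by one in forward().

Is there a more convenient way?

Fortunately, PyTorch provides a container Module called nn.Sequential, which merges the above steps into one. It is not as flexible as nn.Module, because you cannot specify a more complex topology than a feed-forward stack, but it is good enough for many use cases.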

4.1 Sequential API: Two-Layer Network
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)
4.2 Sequential API: Three-Layer ConvNet
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None


# TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
model = nn.Sequential(
    nn.Conv2d(3,channel_1,kernel_size=5,padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1,channel_2,kernel_size=3,padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2*32*32,10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

5. CIFAR-10 open-ended challenge

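In this section, you can experiment with whatever ConvNet architecture you like on CIFAR-10.

Your job now is to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves at least 70% accuracy on the CIFAR-10 validation set within 10 epochs. You can use the check_accuracy and train functions from above, and either the nn.Module or the nn.Sequential API.

Below is the official API documentation for each component. Note: what we call "spatial batch norm" is called "BatchNorm2D" in PyTorch.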

  • Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
  • Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
  • Loss functions:http://pytorch.org/docs/stable/nn.html#loss-functions
  • Optimizers: http://pytorch.org/docs/stable/optim.html
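5.1 Things you can try
  • Filter size: above we used 5x5; would smaller filters be more efficient?
  • Number of filters: above we used 32 filters. Do more or fewer do better?
  • Pooling vs. strided convolution: do you use max pooling or just strided convolutions?
  • Batch normalization: try adding spatial batch normalization after convolutional layers and vanilla batch normalization after affine layers. Do your networks train faster?
  • Network architecture: the network above has two layers of trainable parameters. Can you do better with a deeper network? Good architectures to try include:
    • [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    • [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
  • Global average pooling: instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get to a 1x1 image (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in Google's Inception Network (see Table 1 for their architecture).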
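
The global average pooling idea from the last bullet can be sketched as a small classification head. This is only an illustration under assumed sizes: the 128 channels and the 7x7 feature map are made up, and nn.AdaptiveAvgPool2d(1) is one of several ways to average each channel down to a single value.

```python
import torch
import torch.nn as nn

num_classes = 10
# Hypothetical output of a conv stack: (N, C, H, W) = (64, 128, 7, 7).
features = torch.zeros(64, 128, 7, 7)

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),      # (N, 128, 7, 7) -> (N, 128, 1, 1)
    nn.Flatten(),                 # (N, 128, 1, 1) -> (N, 128)
    nn.Linear(128, num_classes),  # (N, 128)       -> (N, 10)
)

scores = head(features)
print(scores.shape)  # torch.Size([64, 10])
```

Compared with flattening a 128 x 7 x 7 feature map, the linear layer here has 128 inputs instead of 6272, which keeps the number of parameters in the head small.
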
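5.2 Tips for training

For each network architecture you try, you should tune the learning rate and other hyperparameters. When doing this, there are a few important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations;
  • Remember the coarse-to-fine approach to hyperparameter tuning: start by testing a wide range of hyperparameters for just a few training iterations to find the combinations that work at all.
  • Once you have found some sets of parameters that seem to work, search more finely around them. You may need to train for more epochs.
  • Use the validation set for hyperparameter search, and save the test set for evaluating your architecture with the best parameters as selected on the validation set.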
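
One common way to realize the coarse-to-fine search described above is to sample learning rates uniformly in log space, first over a wide range and then over a narrow range around the best candidate. The sketch below only generates the candidates; the ranges, the helper name sample_log_uniform, and the value of best_lr are assumptions for illustration, not results from this post.

```python
import math
import random

def sample_log_uniform(low, high, n, rng):
    """Draw n values whose log10 is uniform between log10(low) and log10(high)."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** rng.uniform(lo, hi) for _ in range(n)]

rng = random.Random(0)

# Coarse stage: a wide range, each candidate trained for only a few iterations.
coarse_lrs = sample_log_uniform(1e-5, 1e-1, n=8, rng=rng)

# Suppose (hypothetically) the coarse stage favoured a learning rate near 3e-3.
best_lr = 3e-3

# Fine stage: a narrow range around the best coarse value, trained for longer.
fine_lrs = sample_log_uniform(best_lr / 3, best_lr * 3, n=8, rng=rng)

print(sorted(coarse_lrs))
print(sorted(fine_lrs))
```
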
5.3 Going further

If you are feeling adventurous, there are many other features you can implement to try to improve performance. You are not required to implement any of them, but if you have time, do not miss out on the fun!

  • Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
  • Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
  • Model ensembles
  • Data augmentation
  • New architectures:
    • ResNets where the input from the previous layer is added to the output.
    • DenseNets where inputs into previous layers are concatenated together.
    • This blog has an in-depth overview
5.4 Start experimenting

I tried a modified ResNet18. From the third epoch onward it reached 75.4% accuracy, peaking at 78% (on the validation set).

################################################################################
# TODO:                                                                        #         
# Experiment with any architectures, optimizers, and hyperparameters.          #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      #
#                                                                              #
# Note that you can use the check_accuracy function to evaluate on either      #
# the test set or the validation set, by passing either loader_test or         #
# loader_val as the second argument to check_accuracy. You should not touch    #
# the test set until you have finished your architecture and  hyperparameter   #
# tuning, and only run the test set once at the end to report a final value.   #
################################################################################
from collections import OrderedDict
model = None
optimizer = None

def conv3x3(in_planes,out_planes,stride = 1):
    # "3x3 convolution with padding"
    return nn.Conv2d(
        in_planes, out_planes, 
        kernel_size=3, stride=stride, padding=1, bias=False)
class BasicBlock(nn.Module):
    expansion = 1
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        m = OrderedDict()
        m['conv1'] = conv3x3(inplanes, planes, stride)
        m['bn1'] = nn.BatchNorm2d(planes)
        m['relu1'] = nn.ReLU(inplace=True)
        m['conv2'] = conv3x3(planes, planes)
        m['bn2'] = nn.BatchNorm2d(planes)
        self.group1 = nn.Sequential(m)

        self.relu= nn.Sequential(nn.ReLU(inplace=True))
        self.downsample = downsample

    def forward(self, x):
        if self.downsample is not None:
            residual = self.downsample(x)
        else:
            residual = x

        out = self.group1(x) + residual

        out = self.relu(out)

        return out

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        self.inplanes = 64
        super(ResNet, self).__init__()

        m = OrderedDict()
        m['conv1'] = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        m['bn1'] = nn.BatchNorm2d(64)
        m['relu1'] = nn.ReLU(inplace=True)
        m['maxpool'] = nn.MaxPool2d(kernel_size=2, stride=2)
        self.group1= nn.Sequential(m)

        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        self.avgpool = nn.Sequential(nn.AvgPool2d(2))
        self.attention = nn.Conv1d(1,1,kernel_size=3,padding=1)
        self.group2 = nn.Sequential(
            OrderedDict([
                ('fc', nn.Linear(512 * block.expansion, num_classes))
            ])
        )
        
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, np.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.group1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)  #(64,512,2,2)
        x = self.avgpool(x) #(64,512,1,1)
        x = x.view(x.size(0), -1) #(64,512)
        # add a channel attention layer
        x = x.unsqueeze(1)
        a = self.attention(x)
        x = x.mul(a)
        x = x.squeeze(1)
        x = self.group2(x)

        return x


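def resnet18(pretrained=False, model_root=None, **kwargs):
    # The helpers for loading pretrained weights (`misc`, `model_urls`) are
    # not defined in this notebook, so fail loudly instead of with a NameError.
    if pretrained:
        raise NotImplementedError('pretrained weights are not available here')
    return ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)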
def test_MyCNN():
    model = resnet18()
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    scores = model(x)
    print(scores.size())
# You should get at least 70% accuracy
model = resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                     momentum=0.9, nesterov=True)
print_every = 1000
train_part34(model, optimizer,epochs=10)
5.5 Test set: evaluate only once
best_model = model
check_accuracy_part34(loader_test, best_model)

Accuracy on the test set:

Checking accuracy on test set
Got 7664 / 10000 correct (76.64)

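Summary

When using PyTorch for deep learning work, we roughly follow these steps:

  • Define your dataset
    You can use the ready-made dataset classes in torchvision.datasets or define your own dataset object. Use torchvision.transforms to transform the data.
  • Define your loader
    DataLoader from torch.utils.data lets you load the data in batches.
  • Define your model
    Use the functions in torch.nn.functional (imported as F) and the various layers with learnable parameters in nn to define the structure and connectivity of your network as an nn.Module subclass; or use nn.Sequential to define a feed-forward network.
  • Define your evaluation method
    Call model.eval() to switch to evaluation mode, wrap the computation in with torch.no_grad() so that no computational graph is built, then feed in the test data and compute the evaluation metrics (loss, accuracy, etc.).
  • Define the optimizer
    torch.optim defines a variety of optimizers; we specify their hyperparameters and pass in the learnable parameters of the concrete model.
  • Define the training function
    Load the model -> feed in the training data (forward pass, model(x)) -> compute the loss (nn.functional defines many loss functions) -> zero the old gradients (optimizer.zero_grad() clears previously accumulated gradients) -> backpropagate the loss (loss.backward()) -> update the parameters (optimizer.step())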
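
The training steps listed above can be condensed into one short, self-contained sketch. It uses random synthetic data instead of CIFAR-10 so that it runs in a moment on a CPU; the tensor sizes, the small model, the learning rate, and the number of steps are all made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in for a dataset: 256 samples, 20 features, 3 classes.
x = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = optim.SGD(model.parameters(), lr=0.1)

model.train()
losses = []
for step in range(50):
    scores = model(x)                   # forward pass
    loss = F.cross_entropy(scores, y)   # compute the loss
    optimizer.zero_grad()               # clear previously accumulated gradients
    loss.backward()                     # backward pass
    optimizer.step()                    # parameter update
    losses.append(loss.item())

# Evaluation: switch mode and disable graph construction.
model.eval()
with torch.no_grad():
    acc = (model(x).argmax(dim=1) == y).float().mean().item()

print('first loss %.4f, last loss %.4f, train accuracy %.2f'
      % (losses[0], losses[-1], acc))
```
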
