PyTorch Tutorial

1. TENSORS

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.

Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors are also optimized for automatic differentiation. If you are familiar with ndarrays, you'll be right at home with the Tensor API.

import torch
import numpy as np

Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

From a NumPy array

Tensors can be created from NumPy arrays.

np_array = np.array(data)
x_np = torch.from_numpy(np_array)

From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Out:

Ones Tensor:
 tensor([[1, 1],
        [1, 1]])

Random Tensor:
 tensor([[0.9802, 0.0761],
        [0.8980, 0.4541]])

With random or constant values:

shape is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Out:

Random Tensor:
 tensor([[0.1383, 0.0385, 0.0745],
        [0.1842, 0.2020, 0.1991]])

Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Out:

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu

Operations on Tensors

More than 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more, are comprehensively described here.

Each of these operations can be run on the GPU (typically at higher speeds than on a CPU). If you are using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

# We move our tensor to the GPU if available
if torch.cuda.is_available():
  tensor = tensor.to('cuda')
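
As an alternative to moving a tensor after creation, a tensor can also be created directly on the target device, which avoids a separate host-to-device copy. A minimal sketch (the device string and shape are illustrative):

device = 'cuda' if torch.cuda.is_available() else 'cpu'
gpu_tensor = torch.ones(3, 4, device=device)  # allocated on the GPU when one is available
print(gpu_tensor.device)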

Try out some of the operations from the list. If you’re familiar with the NumPy API, you’ll find the Tensor API a breeze to use.

Standard numpy-like indexing and slicing:

tensor = torch.ones(4, 4)
print('First row: ',tensor[0])
print('First column: ', tensor[:, 0])
print('Last column:', tensor[..., -1])
tensor[:,1] = 0
print(tensor)

Out:

First row:  tensor([1., 1., 1., 1.])
First column:  tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Joining tensors You can use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining op that is subtly different from torch.cat.

# Concatenate three copies of the 4x4 tensor along dim=1, giving a 4x12 tensor
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

Out:

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
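
To illustrate the difference mentioned above, a brief sketch (not part of the original output) comparing the two joining ops: torch.cat joins tensors along an existing dimension, while torch.stack creates a new dimension.

t_cat = torch.cat([tensor, tensor], dim=0)      # shape (8, 4): rows are appended
t_stack = torch.stack([tensor, tensor], dim=0)  # shape (2, 4, 4): a new leading dimension
print(t_cat.shape, t_stack.shape)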

Arithmetic operations

# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)


# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

Single-element tensors If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Python numerical value using item():

agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

Out:

12.0 <class 'float'>

In-place operations Operations that store the result into the operand are called in-place. They are denoted by a _ suffix. For example: x.copy_(y), x.t_(), will change x.

print(tensor, "\n")
tensor.add_(5)
print(tensor)

Out:

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

NOTE

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.
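
A minimal sketch of the problem described above (illustrative, not from the original tutorial): exp() saves its output for the backward pass, so modifying that output in place breaks the gradient computation.

a = torch.ones(3, requires_grad=True)
b = a.exp()          # autograd saves b to compute the gradient of exp
b.add_(1)            # the in-place change invalidates the saved history
# b.sum().backward() # would raise a RuntimeError about an in-place modification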

Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

Tensor to NumPy array

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

Out:

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]

A change in the tensor reflects in the NumPy array.

t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

Out:

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

NumPy array to Tensor

n = np.ones(5)
t = torch.from_numpy(n)

Changes in the NumPy array reflects in the tensor.

np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

Out:

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]

2. DATASETS & DATALOADERS

Code for processing data samples can get messy and hard to maintain; ideally, we want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives, torch.utils.data.DataLoader and torch.utils.data.Dataset, which allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model. You can find them here: Image Datasets, Text Datasets, and Audio Datasets.

Loading a Dataset

Here is an example of how to load the Fashion-MNIST dataset from TorchVision. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

We load the FashionMNIST Dataset with the following parameters:

  • root is the path where the train/test data is stored,
  • train specifies training or test dataset,
  • download=True downloads the data from the internet if it's not available at root,
  • transform and target_transform specify the feature and label transformations.

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Iterating and Visualizing the Dataset

We can index Datasets manually like a list: training_data[index]. We use matplotlib to visualize some samples in our training data.

labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

Creating a Custom Dataset for your files

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__. Take a look at this implementation; the FashionMNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file annotations_file.

In the next sections, we’ll break down what’s happening in each of these functions.

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

__init__

The __init__ function is run once when instantiating the Dataset object. We initialize the directory containing the images, the annotations file, and both transforms (covered in more detail in the next section).

The labels.csv file looks like:

tshirt1.jpg, 0
tshirt2.jpg, 0
......
ankleboot999.jpg, 9

def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file)
    self.img_dir = img_dir
    self.transform = transform
    self.target_transform = target_transform

__len__

The __len__ function returns the number of samples in our dataset.

Example:

def __len__(self):
    return len(self.img_labels)

__getitem__

The __getitem__ function loads and returns a sample from the dataset at the given index idx. Based on the index, it identifies the image's location on disk, converts that to a tensor using read_image, retrieves the corresponding label from the CSV data in self.img_labels, calls the transform functions on them (if applicable), and returns the tensor image and corresponding label in a tuple.

def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    image = read_image(img_path)
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label

Preparing your data for training with DataLoaders

The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.

DataLoader is an iterable that abstracts this complexity for us in an easy API.

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
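
If data loading becomes a bottleneck, the multiprocessing mentioned above is enabled through the num_workers argument of DataLoader. A sketch (the worker count of 2 is an arbitrary choice; tune it for your machine):

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True, num_workers=2)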

Iterate through the DataLoader

We have loaded that dataset into the DataLoader and can iterate through the dataset as needed. Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively). Because we specified shuffle=True, after we iterate over all batches the data is shuffled (for finer-grained control over the data loading order, take a look at Samplers).

# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")

Out:

Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Label: 0

3. TRANSFORMS

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to perform some manipulation of the data and make it suitable for training.

All TorchVision datasets have two parameters - transform to modify the features and target_transform to modify the labels - that accept callables containing the transformation logic. The torchvision.transforms module offers several commonly-used transforms out of the box.

The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

ToTensor()

ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image's pixel intensity values into the range [0., 1.].
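
A quick way to confirm the conversion and scaling described above, assuming the ds dataset defined earlier in this section:

img, label = ds[0]
print(img.dtype)             # torch.float32
print(img.min(), img.max())  # pixel values lie within [0., 1.]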

Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we define a function to turn the integer into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and calls scatter_ which assigns a value=1 on the index as given by the label y.

target_transform = Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
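
For comparison only, the same one-hot encoding can be written with torch.nn.functional.one_hot; this equivalent sketch may help clarify what the scatter_ call does:

import torch.nn.functional as F

target_transform = Lambda(lambda y: F.one_hot(torch.tensor(y), num_classes=10).float())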

4. BUILD THE NEURAL NETWORK

Neural networks comprise of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses the nn.Module. A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

In the following sections, we’ll build a neural network to classify images in the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training

We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let’s check to see if torch.cuda is available, else we continue to use the CPU.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))

Out:

Using cuda device

Define the Class

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        # Flatten each 28x28 image into a 784-dimensional vector
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, and move it to the device, and print its structure.

model = NeuralNetwork().to(device)
print(model)

Out:

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. Do not call model.forward() directly!

Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Out:

Predicted class: tensor([2], device='cuda:0')

Model Layers

Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

input_image = torch.rand(3,28,28)
print(input_image.size())

Out:

torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension (at dim=0) is maintained).

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

Out:

torch.Size([3, 784])

nn.Linear

The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

Out:

torch.Size([3, 20])

nn.ReLU

Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between our linear layers, but there are other activations to introduce non-linearity in your model.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

nn.Sequential

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

nn.Softmax

The last linear layer of the neural network returns logits - raw values in (-∞, ∞) - which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1] representing the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
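
Because dim=1 tells Softmax to normalize across the class dimension, each row of pred_probab sums to 1; a quick check:

print(pred_probab.sum(dim=1))  # each of the 3 samples sums to 1 (up to floating-point error)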

Model Parameters

Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.

In this example, we iterate over each parameter, and print its size and a preview of its values.

print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Out:

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0153, -0.0177,  0.0286,  ..., -0.0073, -0.0272,  0.0314],
        [ 0.0067, -0.0347,  0.0343,  ...,  0.0347, -0.0196, -0.0094]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0118, -0.0279], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0319, -0.0182, -0.0130,  ..., -0.0155, -0.0372, -0.0199],
        [-0.0051,  0.0356,  0.0397,  ...,  0.0419, -0.0151,  0.0283]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0041, -0.0237], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0028,  0.0333,  0.0018,  ..., -0.0088, -0.0022, -0.0389],
        [-0.0091,  0.0066, -0.0125,  ..., -0.0255,  0.0282,  0.0056]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([0.0201, 0.0236], device='cuda:0', grad_fn=<SliceBackward0>)

5. AUTOMATIC DIFFERENTIATION

When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradient for any computational graph.

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:

import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

A function that we apply to tensors to construct the computational graph is in fact an object of class Function. This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the grad_fn property of a tensor. You can find more information about Function in the documentation.

print('Gradient function for z =',z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Out:

Gradient function for z = <AddBackward0 object at 0x...>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x...>

Computing Gradients

To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to parameters, namely, we need ∂loss/∂w and ∂loss/∂b under some fixed values of x and y. To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad:

loss.backward()
print(w.grad)
print(b.grad)

Out:

tensor([[0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158],
        [0.0540, 0.3248, 0.3158]])
tensor([0.0540, 0.3248, 0.3158])

Disabling Gradient Tracking

By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation. However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with torch.no_grad() block:

z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

Out:

True
False

Another way to achieve the same result is to use the detach() method on the tensor:

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

Out:

False

There are reasons you might want to disable gradient tracking:

  • To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (see the sketch after this list).
  • To speed up computations when you are only doing the forward pass, because computations on tensors that do not track gradients are more efficient.
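
A minimal sketch of the first scenario, freezing a pretrained network so that only a newly added classification head is trained (the vgg16 model and the 10-class head are illustrative assumptions):

import torch
from torchvision import models

model = models.vgg16(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                  # freeze the pretrained backbone

model.classifier[6] = torch.nn.Linear(4096, 10)  # new head; its parameters require gradients by default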

6. OPTIMIZING MODEL PARAMETERS

Now that we have a model and data it’s time to train, validate and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an epoch) the model makes a guess about the output, calculates the error in its guess (loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent. For a more detailed walkthrough of this process, check out this video on backpropagation from 3Blue1Brown.

Prerequisite Code

We load the code from the previous sections on Datasets & DataLoaders and Build Model.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates (read more about hyperparameter tuning)

We define the following hyperparameters for training:

  • Number of Epochs - the number of times to iterate over the dataset

  • Batch Size - the number of data samples propagated through the network before the parameters are updated

  • Learning Rate - how much to update the model's parameters at each batch/epoch. Smaller values yield a slower learning speed, while large values may result in unpredictable behavior during training.

learning_rate = 1e-3
batch_size = 64
epochs = 5

Optimization Loop

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:

  • The Train Loop - iterate over the training dataset and try to converge to optimal parameters.

  • The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.

Let’s briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to see the Full Implementation of the optimization loop.

Loss Function

When presented with some training data, our untrained network is likely not to give the correct answer. The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training. To calculate the loss we make a prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks, and nn.NLLLoss (Negative Log Likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.

We pass our model’s output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.

# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
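
To make the combination mentioned above concrete, a small sketch with made-up logits and labels showing that nn.CrossEntropyLoss on raw logits matches nn.NLLLoss applied to log-softmax outputs:

logits = torch.randn(3, 10)        # raw outputs for 3 samples and 10 classes
targets = torch.tensor([1, 0, 4])  # made-up class labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce, nll)                     # the two values agree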

Optimizer

Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; additionally, there are many different optimizers available in PyTorch such as ADAM and RMSProp, that work better for different kinds of models and data.

We initialize the optimizer by registering the model’s parameters that need to be trained, and passing in the learning rate hyperparameter.

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Inside the training loop, optimization happens in three steps:

  • Call optimizer.zero_grad() to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
  • Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss w.r.t. each parameter.
  • Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.

Full Implementation

We define train_loop that loops over our optimization code, and test_loop that evaluates the model’s performance against our test data.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        # Call optimizer.zero_grad() to reset the gradients of the model parameters
        optimizer.zero_grad()
        # Call loss.backward() to backpropagate the prediction loss
        loss.backward()
        # Once we have the gradients, call optimizer.step() to adjust the parameters
        # by the gradients collected in the backward pass
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

We initialize the loss function and optimizer, and pass it to train_loop and test_loop. Feel free to increase the number of epochs to track the model’s improving performance.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

7. SAVE AND LOAD THE MODEL

In this section we will look at how to persist model state with saving, loading and running model predictions.

import torch
import torch.onnx as onnx
import torchvision.models as models

Saving and Loading Model Weights

PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:

model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')

To load model weights, you need to create an instance of the same model first, and then load the parameters using load_state_dict() method.

model = models.vgg16() # we do not specify pretrained=True, i.e. do not load default weights
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()

Be sure to call the model.eval() method before inferencing to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.

Saving and Loading Models with Shapes

When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network. We might want to save the structure of this class together with the model, in which case we can pass model (and not model.state_dict()) to the saving function:

torch.save(model, 'model.pth')

We can then load the model like this:

model = torch.load('model.pth')

This approach uses Python pickle module when serializing the model, thus it relies on the actual class definition to be available when loading the model.

Exporting Model to ONNX

PyTorch also has native ONNX export support. Given the dynamic nature of the PyTorch execution graph, however, the export process must traverse the execution graph to produce a persisted ONNX model. For this reason, a test variable of the appropriate size should be passed in to the export routine (in our case, we will create a dummy zero tensor of the correct size):

input_image = torch.zeros((1,3,224,224))
onnx.export(model, input_image, 'model.onnx')

There are a lot of things you can do with an ONNX model, including running inference on different platforms and in different programming languages. For more details, we recommend visiting the ONNX tutorial.
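
As one example of running the exported model outside PyTorch, a sketch using the onnxruntime package (assumed to be installed separately; the input name is resolved from the session rather than hard-coded):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('model.onnx')
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)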

Congratulations! You have completed the PyTorch beginner tutorial! Try revisiting the first page to see the tutorial in its entirety again. We hope this tutorial has helped you get started with deep learning on PyTorch. Good luck!