PyTorch provides two data primitives: Dataset and DataLoader. The Dataset stores the samples and their corresponding labels (for example, reading an individual image and its label), while the DataLoader wraps an iterable around the Dataset and serves the data batch by batch.
PyTorch also offers domain-specific libraries (TorchText, TorchVision, and TorchAudio), each of which ships its own datasets.
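For example (and as an assumption about where the training_data / test_data variables used later in these notes come from), FashionMNIST can be loaded from torchvision.datasets:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Assumed source of the training_data / test_data used later in these notes:
# download the FashionMNIST training and test splits and convert the PIL images to tensors.
training_data = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=ToTensor()
)
test_data = datasets.FashionMNIST(
    root="data", train=False, download=True, transform=ToTensor()
)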
To define a model, subclass nn.Module: initialize and define the layers in __init__, then implement the forward computation in forward. For speed, the model can be moved to the GPU.
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
To train a model, we need a loss function and an optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust its parameters.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
We can check the model's performance against the test dataset to confirm that it is actually learning.
def test(dataloader, model):
    size = len(dataloader.dataset)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
Training runs for several epochs; in each epoch we can print the loss and the model's accuracy.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model)
print("Done!")
A common way to save a model is to serialize its internal state dictionary, which contains the model's parameters.
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
To load the model again, load the state dictionary into a fresh model instance:
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
The model can then be used to make predictions:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')
Tensors are very similar to NumPy arrays; there are several ways to create them:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
# The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
Tensors are created on the CPU by default and must be moved to the GPU explicitly with tensor.to("cuda").
Keep in mind that copying large tensors across devices can be expensive in both time and memory.
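A minimal sketch (the tensor defined here also feeds the sum example just below; its second column is zeroed as in the official tensors tutorial, which is why the sum comes out to 12.0):
tensor = torch.ones(4, 4)
tensor[:, 1] = 0                      # zero the second column, as in the official tutorial example
if torch.cuda.is_available():
    tensor = tensor.to("cuda")        # move the tensor to the GPU only if one is available
print(tensor.device)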
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item)) # 12.0
Ideally we want the data loading code decoupled from the training loop: the Dataset stores the samples and their corresponding labels, and the DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
A custom dataset must implement three functions: __init__, __len__, and __getitem__. For example:
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        sample = {"image": image, "label": label}
        return sample
Each call to __getitem__ returns the image as a tensor together with its label, packed in a dict.
A Dataset retrieves one sample at a time. When we want to batch samples, shuffle the data, or use multiprocess loading to speed things up, we use a DataLoader.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
train_features, train_labels = next(iter(train_dataloader))
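We can sanity-check one batch; the shapes in the comments assume the FashionMNIST data and batch_size=64 used above:
print(f"Feature batch shape: {train_features.size()}")   # torch.Size([64, 1, 28, 28]) for FashionMNIST
print(f"Labels batch shape: {train_labels.size()}")       # torch.Size([64])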
input_image = torch.rand(3,28,28)
print(input_image.size())
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
Many layers inside a neural network are parameterized: they have weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside the model object, and every parameter is accessible via the model's parameters() or named_parameters() methods.
We can iterate over all parameters:
print("Model structure: ", model, "\n\n")
for name, param in model.named_parameters():
print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
When training neural networks, the most frequently used algorithm is back propagation: model parameters are adjusted according to the gradient of the loss function with respect to each parameter.
To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd, which supports automatic computation of gradients for any computational graph.
For example, the following code builds such a computational graph:
import torch
x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
Here w and b are the parameters we want to optimize; to compute gradients with respect to them, their requires_grad attribute must be set to True.
This can be done when the tensor is created, or later with x.requires_grad_(True).
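A minimal sketch of enabling it after creation:
t = torch.rand(3)         # requires_grad defaults to False
t.requires_grad_(True)    # switch on gradient tracking in place
print(t.requires_grad)    # True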
The functions we apply to tensors to build the computational graph know both how to compute the function in the forward direction and how to compute its derivative during the backward pass. A reference to the backward function is stored in the grad_fn attribute of the resulting tensor.
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)
# Gradient function for z = <AddBackward0 object at 0x...>
# Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x...>
To compute the derivatives, we call loss.backward() and then read the values from w.grad and b.grad.
loss.backward()
print(w.grad)
print(b.grad)
Note
By default, all tensors with requires_grad=True track their computational history and support gradient computation. During inference, however, we only need the forward computation and do not need that tracking; it can be stopped by wrapping the code in a torch.no_grad() block.
z = torch.matmul(x, w)+b
print(z.requires_grad) # True
with torch.no_grad():
z = torch.matmul(x, w)+b
print(z.requires_grad) #False
Another way to achieve the same result is to use the tensor's detach() method:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad) #False
There are two main reasons to disable gradient tracking: to mark some parameters in the network as frozen (a very common scenario when fine-tuning a pretrained network), and to speed up computations when only the forward pass is needed, since computations on tensors that do not track gradients are more efficient.
Conceptually, autograd keeps a record of the data (tensors) and of all executed operations (together with the resulting new tensors) in a directed acyclic graph (DAG) made up of Function objects. In this DAG, the leaves are the input tensors and the roots are the output tensors; by tracing the graph from the roots to the leaves, gradients are computed automatically using the chain rule.
Note
DAGs are dynamic in PyTorch. An important point is that the graph is recreated from scratch: after each .backward() call, autograd starts populating a new graph. This is exactly what allows control flow statements in the model, and the shape, size, and operations can change on every iteration if needed.
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
# out
First call
tensor([[4., 2., 2., 2., 2.],
[2., 4., 2., 2., 2.],
[2., 2., 4., 2., 2.],
[2., 2., 2., 4., 2.],
[2., 2., 2., 2., 4.]])
Second call
tensor([[8., 4., 4., 4., 4.],
[4., 8., 4., 4., 4.],
[4., 4., 8., 4., 4.],
[4., 4., 4., 8., 4.],
[4., 4., 4., 4., 8.]])
Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
[2., 4., 2., 2., 2.],
[2., 2., 4., 2., 2.],
[2., 2., 2., 4., 2.],
[2., 2., 2., 2., 4.]])
As shown above, the gradients from the two calls accumulate: backward adds gradients into the .grad attribute, so to get correct gradients the attribute must be zeroed first (in real training the optimizer does this for us).
Model weights can be saved with torch.save on the state_dict and loaded back the same way:
model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')
model = models.vgg16() # we do not specify pretrained=True, i.e. do not load default weights
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
Note
Be sure to call **model.eval()** before inference to set the dropout and batch normalization layers to evaluation mode; otherwise the results may be inconsistent.
When loading model weights we need to instantiate the model class first, because the class defines the structure of the network. If we want to save the structure of the class together with the weights, we can pass model itself (rather than model.state_dict()) to the saving function.
torch.save(model, 'model.pth')
model = torch.load('model.pth')
Note:
This approach uses Python's pickle module to serialize the model, so it relies on the actual class definition being available when the model is loaded.
PyTorch also has native ONNX export support. Given the dynamic nature of the PyTorch execution graph, however, the export process must traverse the graph to produce a persisted ONNX model, so a test tensor of the appropriate size has to be passed to the export call.
input_image = torch.zeros((1,3,224,224))
torch.onnx.export(model, input_image, 'model.onnx')
An exported ONNX model can be used in many ways, such as running inference on different platforms and from different programming languages.
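As a rough sketch (assuming the onnxruntime package is installed; this is not part of the PyTorch tutorial code above), inference with the exported model could look like this:
import numpy as np
import onnxruntime as ort  # assumed third-party dependency

# Load the exported model and run a dummy input of the export shape through it
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)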
Training a neural network happens in two steps: a forward pass, in which the network makes predictions for the input, and a backward pass, in which the prediction error is propagated back and the parameters are adjusted.
The whole flow in PyTorch looks like this:
import torch, torchvision
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
# Run the forward pass
prediction = model(data) # forward pass

# Compute the loss, then backpropagate by calling .backward() on the error tensor.
# Autograd computes the gradient for every model parameter and stores it in the parameter's .grad attribute.
loss = (prediction - labels).sum()
loss.backward() # backward pass

# Load an optimizer; SGD will update the parameters.
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Finally, call .step() to launch gradient descent; the optimizer adjusts each parameter
# using the gradient stored in its .grad attribute.
optim.step() # gradient descent
You can verify by hand that the derivatives you compute analytically match the values stored in .grad.
For a vector-valued function y = f(x), the gradient of the vector y with respect to the vector x is the Jacobian matrix J.
Loosely speaking, torch.autograd is an engine for computing vector-Jacobian products: given a vector v, it computes J^T · v. If v happens to be the gradient of a scalar function l = g(y), then by the chain rule this vector-Jacobian product is exactly the gradient of l with respect to x.
This property is what backward() relies on when computing derivatives of non-scalar outputs.
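A small sketch of this vector-Jacobian product: for y = 2x the Jacobian is 2I, so passing a vector v to backward() leaves 2v in the input's .grad.
xv = torch.randn(3, requires_grad=True)
yv = 2 * xv                             # Jacobian of yv with respect to xv is 2 * I
v = torch.tensor([1.0, 0.1, 0.01])
yv.backward(v)                          # computes J^T @ v
print(xv.grad)                          # equals 2 * v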
In PyTorch we can easily define our own autograd operators by subclassing torch.autograd.Function and implementing the forward and backward functions.
# -*- coding: utf-8 -*-
import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For this example, we need
# 4 weights: y = a + b * P3(c + d * x), these weights need to be initialized
# not too far from the correct result to ensure convergence.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6
for t in range(2000):
    # To apply our Function, we use Function.apply method. We alias this as 'P3'.
    P3 = LegendrePolynomial3.apply

    # Forward pass: compute predicted y using operations; we compute
    # P3 using our custom autograd operation.
    y_pred = a + b * P3(c + d * x)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
Computational graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically, but for large neural networks raw autograd is too low-level. The nn package defines a set of Modules that are roughly equivalent to neural network layers and are much more convenient to use.
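As a sketch of what that looks like (following the pattern of the official polynomial-fitting example; the layer sizes and learning rate here are illustrative assumptions), the sine-fitting problem above can be written with nn modules instead of hand-managed weight tensors:
# Build the input as powers of x: columns are x, x^2, x^3 (reuses x and y from the example above)
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)                 # shape (2000, 3)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),                  # learns the three coefficients plus a bias
    torch.nn.Flatten(0, 1)                  # flatten the (2000, 1) output to (2000,)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    y_pred = model(xx)                      # forward pass through the module stack
    loss = loss_fn(y_pred, y)
    model.zero_grad()                       # clear accumulated gradients
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad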