Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on our data. Training a model is an iterative process; in each iteration (called an epoch) the model makes a guess about the output, calculates the error of its guess (the loss), collects the derivatives of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent.
We load the code from the previous sections on Datasets & DataLoaders and Build Model.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
Output:
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
26422272/? [00:02<00:00, 17874373.04it/s]
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
29696/? [00:00<00:00, 99991.92it/s]
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
4422656/? [00:01<00:00, 5223475.20it/s]
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
6144/? [00:00<00:00, 85236.21it/s]
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates.
We define the following hyperparameters for training:
Number of Epochs - the number of times to iterate over the dataset.
Batch Size - the number of data samples propagated through the network before the parameters are updated.
Learning Rate - how much to update the model parameters at each batch/epoch.
learning_rate = 1e-3
batch_size = 64
epochs = 5
Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.
Each epoch consists of two main parts:
The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
The Validation/Test Loop - iterate over the test dataset to check if model performance is improving.
Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to see the full implementation of the optimization loop.
When presented with some training data, our untrained network is likely not to give the correct answer. The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training. To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true data label value.
Common loss functions include nn.MSELoss (Mean Square Error) for regression tasks, and nn.NLLLoss (Negative Log Likelihood) for classification. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss.
We pass our model's output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
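As a quick sanity check (not part of the original code), the following sketch illustrates the claim above that nn.CrossEntropyLoss applied to raw logits matches nn.LogSoftmax followed by nn.NLLLoss. The dummy_logits and dummy_labels tensors are hypothetical values made up purely for demonstration.

# Sketch: CrossEntropyLoss == LogSoftmax + NLLLoss on dummy data
dummy_logits = torch.randn(3, 10)       # 3 samples, 10 classes (hypothetical)
dummy_labels = torch.tensor([1, 0, 4])  # hypothetical ground-truth class indices

ce = nn.CrossEntropyLoss()(dummy_logits, dummy_labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(dummy_logits), dummy_labels)
print(torch.allclose(ce, nll))          # expected: True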
Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; beyond that, PyTorch offers many other optimizers such as Adam and RMSProp that work better for different kinds of models and data.
We initialize the optimizer by registering the model's parameters that need to be trained and passing in the learning rate hyperparameter.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
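Swapping in a different optimizer only changes this one line. As a hedged illustration (the tutorial itself continues with SGD, and the learning rate value here is just reused, not tuned for Adam), the Adam variant would look like this:

# Illustrative alternative: Adam instead of SGD, same registration pattern
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)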
Inside the training loop, optimization happens in three steps:
Call optimizer.zero_grad() to reset the gradients of the model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
Backpropagate the prediction loss with a call to loss.backward(). PyTorch deposits the gradients of the loss with respect to each parameter.
Once we have our gradients, we call optimizer.step() to adjust the parameters by the gradients collected in the backward pass.
We define train_loop, which loops over our optimization code, and test_loop, which evaluates the model's performance against our test data.
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
We initialize the loss function and optimizer, and pass them to train_loop and test_loop. Feel free to increase the number of epochs to track the model's improving performance.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
Output:
Epoch 1
-------------------------------
loss: 2.313743 [ 0/60000]
loss: 2.297442 [ 6400/60000]
loss: 2.285412 [12800/60000]
loss: 2.277439 [19200/60000]
loss: 2.247789 [25600/60000]
loss: 2.234551 [32000/60000]
loss: 2.240703 [38400/60000]
loss: 2.210060 [44800/60000]
loss: 2.218151 [51200/60000]
loss: 2.187313 [57600/60000]
Test Error:
Accuracy: 42.7%, Avg loss: 2.176843
Epoch 2
-------------------------------
loss: 2.189964 [ 0/60000]
loss: 2.177657 [ 6400/60000]
loss: 2.133112 [12800/60000]
loss: 2.150209 [19200/60000]
loss: 2.085274 [25600/60000]
loss: 2.051816 [32000/60000]
loss: 2.069125 [38400/60000]
loss: 2.000842 [44800/60000]
loss: 2.016865 [51200/60000]
loss: 1.947469 [57600/60000]
Test Error:
Accuracy: 55.7%, Avg loss: 1.939849
Epoch 3
-------------------------------
loss: 1.971378 [ 0/60000]
loss: 1.940551 [ 6400/60000]
loss: 1.840011 [12800/60000]
loss: 1.874466 [19200/60000]
loss: 1.756598 [25600/60000]
loss: 1.721903 [32000/60000]
loss: 1.728700 [38400/60000]
loss: 1.633625 [44800/60000]
loss: 1.654384 [51200/60000]
loss: 1.550143 [57600/60000]
Test Error:
Accuracy: 60.4%, Avg loss: 1.564256
Epoch 4
-------------------------------
loss: 1.626813 [ 0/60000]
loss: 1.590268 [ 6400/60000]
loss: 1.445933 [12800/60000]
loss: 1.507461 [19200/60000]
loss: 1.386321 [25600/60000]
loss: 1.387192 [32000/60000]
loss: 1.390291 [38400/60000]
loss: 1.312439 [44800/60000]
loss: 1.342614 [51200/60000]
loss: 1.250942 [57600/60000]
Test Error:
Accuracy: 63.1%, Avg loss: 1.270950
Epoch 5
-------------------------------
loss: 1.345123 [ 0/60000]
loss: 1.327268 [ 6400/60000]
loss: 1.163304 [12800/60000]
loss: 1.261321 [19200/60000]
loss: 1.140278 [25600/60000]
loss: 1.168023 [32000/60000]
loss: 1.183694 [38400/60000]
loss: 1.114995 [44800/60000]
loss: 1.152414 [51200/60000]
loss: 1.080323 [57600/60000]
Test Error:
Accuracy: 64.3%, Avg loss: 1.093396
Epoch 6
-------------------------------
loss: 1.161143 [ 0/60000]
loss: 1.165154 [ 6400/60000]
loss: 0.984008 [12800/60000]
loss: 1.113623 [19200/60000]
loss: 0.993426 [25600/60000]
loss: 1.026297 [32000/60000]
loss: 1.058669 [38400/60000]
loss: 0.993498 [44800/60000]
loss: 1.032495 [51200/60000]
loss: 0.976228 [57600/60000]
Test Error:
Accuracy: 65.8%, Avg loss: 0.982270
Epoch 7
-------------------------------
loss: 1.036877 [ 0/60000]
loss: 1.063356 [ 6400/60000]
loss: 0.865277 [12800/60000]
loss: 1.018959 [19200/60000]
loss: 0.903607 [25600/60000]
loss: 0.929657 [32000/60000]
loss: 0.978245 [38400/60000]
loss: 0.916611 [44800/60000]
loss: 0.951660 [51200/60000]
loss: 0.907701 [57600/60000]
Test Error:
Accuracy: 67.2%, Avg loss: 0.908061
Epoch 8
-------------------------------
loss: 0.947494 [ 0/60000]
loss: 0.994011 [ 6400/60000]
loss: 0.781766 [12800/60000]
loss: 0.953691 [19200/60000]
loss: 0.844838 [25600/60000]
loss: 0.860156 [32000/60000]
loss: 0.922013 [38400/60000]
loss: 0.865991 [44800/60000]
loss: 0.894223 [51200/60000]
loss: 0.858752 [57600/60000]
Test Error:
Accuracy: 68.5%, Avg loss: 0.855267
Epoch 9
-------------------------------
loss: 0.879686 [ 0/60000]
loss: 0.942385 [ 6400/60000]
loss: 0.719843 [12800/60000]
loss: 0.905830 [19200/60000]
loss: 0.803321 [25600/60000]
loss: 0.808449 [32000/60000]
loss: 0.879683 [38400/60000]
loss: 0.830828 [44800/60000]
loss: 0.851620 [51200/60000]
loss: 0.821371 [57600/60000]
Test Error:
Accuracy: 69.7%, Avg loss: 0.815541
Epoch 10
-------------------------------
loss: 0.826200 [ 0/60000]
loss: 0.900994 [ 6400/60000]
loss: 0.671791 [12800/60000]
loss: 0.869145 [19200/60000]
loss: 0.771820 [25600/60000]
loss: 0.768692 [32000/60000]
loss: 0.845700 [38400/60000]
loss: 0.805013 [44800/60000]
loss: 0.818642 [51200/60000]
loss: 0.791360 [57600/60000]
Test Error:
Accuracy: 71.1%, Avg loss: 0.784093
Done!