To get a clear picture of the complete workflow for training and evaluating a model in PyTorch, this post walks through a FashionMNIST image-classification example and turns it into a template for future reference (definitely not an excuse to be lazy).
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
# Configure the GPU
## Option 1: use os.environ to restrict which GPUs are visible to the process
# os.environ['CUDA_VISIBLE_DEVICES'] = '0'
## Option 2: define a "device" object, then move whatever needs the GPU with .to(device)
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')  # pick the index of the GPU you want; 'cuda:0' is simply the first visible CUDA device
print('Device: ', device)
# Other hyperparameters
batch_size = 256
num_workers = 0   # set to 0 on Windows, otherwise multiprocessing errors can occur
lr = 1e-3         # learning rate
epochs = 20
Device: cuda:1
Data can be read in two main ways: through torchvision's built-in datasets, or through a Dataset you build yourself.
The first way is quick and convenient and is commonly used for fast experiments, but real-world data usually differs from what PyTorch ships with, so in practice the second way, building your own Dataset, is often needed.
In either case the data also needs some transformations: images must be resized to a uniform size before they can be fed into the network, the data must be converted to Tensors, and so on.
These transformations are easy to apply with the torchvision package, PyTorch's official image-processing toolkit, which the built-in-dataset approach above relies on as well.
# First, set up the data transformations
from torchvision import transforms

img_size = 28
data_transform = transforms.Compose([
    transforms.ToPILImage(),   # skip this step when using the torchvision built-in datasets
    transforms.Resize(img_size),
    transforms.ToTensor()
])
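As a quick sanity check (an optional sketch, not part of the original pipeline), you can push a fake uint8 image through the transform and confirm it comes out as a (1, 28, 28) float tensor scaled to [0, 1]:
# Optional check: run a fake uint8 HWC image through the transform
dummy = np.random.randint(0, 256, (28, 28, 1), dtype=np.uint8)
out = data_transform(dummy)
print(out.shape, out.dtype, out.min().item(), out.max().item())
# expected: torch.Size([1, 28, 28]) torch.float32, values within [0, 1]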
# Option 1: use the torchvision built-in dataset; the download may take a while
from torchvision import datasets
train_data = datasets.FashionMNIST(root='./', train=True, download=True, transform=data_transform)
test_data = datasets.FashionMNIST(root='./', train=False, download=True, transform=data_transform)
print('Data is downloaded.')
## Option 2: read csv data and build the Dataset class yourself
# csv download link: https://www.kaggle.com/zalando-research/fashionmnist
class FMDataset(Dataset):
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform
        self.images = df.iloc[:, 1:].values.astype(np.uint8)  # pixel columns
        self.labels = df.iloc[:, 0].values                    # first column is the label

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx].reshape(28, 28, 1)           # HWC uint8, as ToPILImage expects
        label = int(self.labels[idx])
        if self.transform is not None:
            image = self.transform(image)
        else:
            image = torch.tensor(image / 255., dtype=torch.float)
        label = torch.tensor(label, dtype=torch.long)
        return image, label
train_df = pd.read_csv("./FashionMNIST/fashion-mnist_train.csv")
test_df = pd.read_csv("./FashionMNIST/fashion-mnist_test.csv")
train_data = FMDataset(train_df, data_transform)
test_data = FMDataset(test_df, data_transform)
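To verify the custom Dataset works (again just a sketch), index a single sample and check what comes back:
# Optional check: pull one (image, label) pair straight from the Dataset
img, lbl = train_data[0]
print(img.shape, lbl)   # expected: torch.Size([1, 28, 28]) and a label tensor in 0..9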
With the training and test datasets built, we define DataLoaders so the data can be loaded in batches during training and testing.
Parameter notes: batch_size is the number of samples per batch; shuffle controls whether samples are reshuffled every epoch (it should be True for training); num_workers is the number of worker subprocesses used for loading; drop_last=True discards the final incomplete batch.
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=num_workers)
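One way to see drop_last in action (a sketch; the numbers assume the standard 60000-sample training split): the number of batches is the floor of the dataset size divided by batch_size, since the final incomplete batch is discarded.
# With drop_last=True, the last incomplete batch is dropped
print(len(train_data), len(train_loader))   # 60000, 234  (60000 // 256 == 234)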
With the DataLoaders defined, we can visualize the data to check that it was read in correctly.
A batch can be pulled from a PyTorch DataLoader using iter and next.
import matplotlib.pyplot as plt
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
plt.imshow(images[0][0])
torch.Size([256, 1, 28, 28]) torch.Size([256])
set(train_df['label'])  # inspect the distinct labels; their count gives output_dim
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
The task is fairly simple, so we build a small CNN by hand and train it on the GPU.
class NET(nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5),        # Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
            nn.ReLU(),                  # activation
            nn.MaxPool2d(2, stride=2),  # max pooling
            nn.Dropout(0.3),            # randomly drop units to reduce overfitting
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Dropout(0.3)
        )
        self.fc = nn.Sequential(        # fully connected layers
            nn.Linear(64 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(-1, 64 * 4 * 4)      # flatten the conv features
        x = self.fc(x)
        return x
model = NET()
model = model.to(device)   # consistent with the device configured above
# model = nn.DataParallel(model).cuda()  # variant for multi-GPU training
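If the 64*4*4 input size of the first Linear layer looks mysterious, tracing a dummy batch through the conv stack makes it concrete (a throwaway sketch): each 5x5 convolution shrinks the spatial size by 4 and each pooling halves it, so 28 -> 24 -> 12 -> 8 -> 4.
# Shape check: trace a fake batch through the convolutional part only
with torch.no_grad():
    dummy = torch.zeros(1, 1, 28, 28).to(device)
    print(model.conv(dummy).shape)   # torch.Size([1, 64, 4, 4])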
Since this is a classification problem (with 10 labels, 0 through 9), we use the cross-entropy loss (CrossEntropy).
nn.CrossEntropyLoss takes integer class indices directly and applies log-softmax internally, so no manual one-hot encoding is needed.
This means the labels must start from 0 and the model must output raw logits without a final softmax layer, which illustrates that the parts of a PyTorch training pipeline are not independent and have to be considered together.
criterion = nn.CrossEntropyLoss()
# criterion = nn.CrossEntropyLoss(weight=torch.tensor([1., 1., 1., 1., 3., 1., 1., 1., 1., 1.]))  # optional per-class weights; must be a tensor, not a plain list
optimizer = optim.Adam(model.parameters(), lr=lr)
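A small standalone check (a sketch) that makes the "logits, no softmax" point concrete: CrossEntropyLoss applied to raw logits matches LogSoftmax followed by NLLLoss.
# CrossEntropyLoss on raw logits == LogSoftmax + NLLLoss on the same logits
logits = torch.randn(4, 10)            # fake raw model outputs for a batch of 4
labels = torch.randint(0, 10, (4,))    # integer class indices starting from 0
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))         # True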
We wrap training and validation each in its own function so they are easy to call later.
The main difference between them: training updates the parameters, while validation and testing do not (no optimizer steps and no gradient tracking).
def train(epoch):
    model.train()
    train_loss = 0
    for data, label in train_loader:
        data, label = data.to(device), label.to(device)
        optimizer.zero_grad()            # reset the gradients to zero
        out = model(data)
        loss = criterion(out, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * data.size(0)
    train_loss = train_loss / len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))
def val(epoch):
    model.eval()
    val_loss = 0
    true_labels = []
    pred_labels = []
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device)
            out = model(data)
            pred = torch.argmax(out, 1)
            true_labels.append(label.cpu().numpy())
            pred_labels.append(pred.cpu().numpy())
            loss = criterion(out, label)
            val_loss += loss.item() * data.size(0)
    val_loss = val_loss / len(test_loader.dataset)
    true_labels, pred_labels = np.concatenate(true_labels), np.concatenate(pred_labels)
    acc = np.sum(true_labels == pred_labels) / len(pred_labels)
    print('Epoch: {} \tValidation Loss: {:.6f}, accuracy: {:.6f}'.format(epoch, val_loss, acc))
for epoch in range(1, epochs+1):
train(epoch)
val(epoch)
Epoch: 1 Training Loss: 0.176090
Epoch: 1 Validation Loss: 0.209182, accuracy: 0.924800
Epoch: 2 Training Loss: 0.171277
Epoch: 2 Validation Loss: 0.208145, accuracy: 0.924300
Epoch: 3 Training Loss: 0.166603
Epoch: 3 Validation Loss: 0.212093, accuracy: 0.923100
Epoch: 4 Training Loss: 0.164097
Epoch: 4 Validation Loss: 0.208337, accuracy: 0.924600
Epoch: 5 Training Loss: 0.160057
Epoch: 5 Validation Loss: 0.214827, accuracy: 0.922900
Epoch: 6 Training Loss: 0.156039
Epoch: 6 Validation Loss: 0.210293, accuracy: 0.923700
Epoch: 7 Training Loss: 0.151632
Epoch: 7 Validation Loss: 0.213692, accuracy: 0.923700
Epoch: 8 Training Loss: 0.144629
Epoch: 8 Validation Loss: 0.214765, accuracy: 0.923700
Epoch: 9 Training Loss: 0.144288
Epoch: 9 Validation Loss: 0.216733, accuracy: 0.920900
Epoch: 10 Training Loss: 0.140210
Epoch: 10 Validation Loss: 0.226147, accuracy: 0.920300
Epoch: 11 Training Loss: 0.133892
Epoch: 11 Validation Loss: 0.222879, accuracy: 0.923200
Epoch: 12 Training Loss: 0.132475
Epoch: 12 Validation Loss: 0.213902, accuracy: 0.925100
Epoch: 13 Training Loss: 0.127834
Epoch: 13 Validation Loss: 0.221766, accuracy: 0.924900
Epoch: 14 Training Loss: 0.126341
Epoch: 14 Validation Loss: 0.220434, accuracy: 0.924700
Epoch: 15 Training Loss: 0.124709
Epoch: 15 Validation Loss: 0.222152, accuracy: 0.923700
Epoch: 16 Training Loss: 0.121907
Epoch: 16 Validation Loss: 0.218433, accuracy: 0.926600
Epoch: 17 Training Loss: 0.119473
Epoch: 17 Validation Loss: 0.221797, accuracy: 0.925400
Epoch: 18 Training Loss: 0.121235
Epoch: 18 Validation Loss: 0.228694, accuracy: 0.923800
Epoch: 19 Training Loss: 0.118886
Epoch: 19 Validation Loss: 0.221554, accuracy: 0.925800
Epoch: 20 Training Loss: 0.112594
Epoch: 20 Validation Loss: 0.231573, accuracy: 0.923400
After training, the model parameters, or the entire model, can be saved with torch.save.
save_path = './FashionModel.pkl'
torch.save(model, save_path)
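Saving the whole model pickles the class definition path along with it; saving only the state_dict is generally the more portable choice. A minimal sketch (the file name here is made up):
# Alternative: save only the parameters (state_dict), then rebuild and load
torch.save(model.state_dict(), './FashionModel_state_dict.pkl')
# To load: rebuild the architecture first, then restore the weights
# new_model = NET().to(device)
# new_model.load_state_dict(torch.load('./FashionModel_state_dict.pkl'))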
This completes a simple PyTorch template for data loading, model training, and validation.
An ipynb file is Jupyter Notebook's native format, mixing markdown cells with code cells: markdown is convenient for notes and formulas, code cells for code, and the combination greatly improves the reading experience. Being able to convert an ipynb file to markdown is therefore very handy for blogging. The conversion works as follows:
First activate the corresponding conda environment and install the nbconvert package with pip install nbconvert, then cd into the relevant folder and run:
jupyter nbconvert --to markdown test.ipynb
To convert to HTML, run the same command with --to html:
jupyter nbconvert --to html test.ipynb
For PDF there are two options. Method 1: the same nbconvert approach (--to pdf) requires installing xelatex, whose installation is large, so it is not recommended. Method 2: convert to markdown first, then install the markdown pdf extension in VS Code and use it to export the markdown file to PDF.