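This article follows the image classification case in Section 5.3.2 of 《深度学习工程师认证初级教程》 (an introductory tutorial for deep learning engineer certification) and uses VGGNet and ResNet to classify images from the Cifar10 dataset.
Cifar10 is a subset of a larger dataset and contains labeled images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image is a 32*32-pixel color image. There are 6,000 images per class and 60,000 images in total, of which 50,000 are training data and 10,000 are test data. The classes are mutually exclusive: an airplane and a horse never appear in the same image, so each image belongs to exactly one class. Sample images are shown in the figure below: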
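Paddle ships with a built-in interface for downloading the Cifar10 dataset:
# Download the data
import paddle
from paddle.vision.transforms import ToTensor

train_dataset = paddle.vision.datasets.Cifar10(mode='train', transform=ToTensor())
test_dataset = paddle.vision.datasets.Cifar10(mode='test', transform=ToTensor())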
The data can be loaded in batches as follows:
# Load the data
BATCH_SIZE = 32
train_loader = paddle.io.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = paddle.io.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=True)
for batch_id, data in enumerate(train_loader()):
    x_data = data[0]
    y_data = data[1]
    print(x_data.shape)  # [32, 3, 32, 32]: a batch of 32 three-channel RGB images of 32*32 pixels
    print(y_data.shape)  # [32]: one class label per image
    break
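To make the batch arithmetic concrete, here is a minimal sketch of how many iterations one pass over each loader takes. The helper `batches_per_epoch` is a hypothetical name introduced for illustration, and the sketch assumes the loader keeps the final partial batch unless `drop_last` is set.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches produced by one pass over a dataset."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_samples / batch_size)

print(batches_per_epoch(50000, 32))  # training set
print(batches_per_epoch(10000, 32))  # test set
```

With a batch size of 32, neither 50,000 nor 10,000 divides evenly, so the last batch of each pass is smaller than the others.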
Previewing the data requires converting a tensor into an image. A tensor can be thought of as a stack of matrices. The conversion works as follows:
# Preview the data
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
# Map label indices to class names
classes_name = {0:'airplane', 1:'automobile', 2:'bird', 3:'cat', 4:'deer', 5:'dog', 6:'frog', 7:'horse', 8:'ship', 9:'truck'}
for data in train_loader():
    print(data[0][0])
    data_array = data[0][0].numpy() * 255  # the printed values show the pixels are normalized, so scale them back to 0-255
    print(data_array.shape)  # (3, 32, 32)
    data_array = np.stack([data_array[0], data_array[1], data_array[2]], axis=2)  # move the channel axis to the end
    print(data_array.shape)  # now (32, 32, 3), as needed for a 32*32 RGB image
    img = Image.fromarray(np.uint8(data_array), 'RGB')  # numpy array -> image
    plt.figure()  # plot
    data_label = data[1][0].numpy().item()  # works whether the label is 0-d or has shape [1]
    plt.title("Image Label:%s" % (classes_name[data_label]))
    plt.imshow(img)
    plt.show()
    break
The output is:
Tensor(shape=[3, 32, 32], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
[[[0.50980395, 0.50588238, 0.50980395, ..., 0.51372552, 0.50980395, 0.50588238],
[0.50588238, 0.50196081, 0.50588238, ..., 0.50980395, 0.50588238, 0.50196081],
[0.50980395, 0.50196081, 0.50980395, ..., 0.50980395, 0.50588238, 0.50196081],
...,
[0.29803923, 0.30196080, 0.30196080, ..., 0.34117648, 0.33725491, 0.32156864],
[0.28235295, 0.28235295, 0.28627452, ..., 0.42745101, 0.41568631, 0.31764707],
[0.29803923, 0.29803923, 0.30196080, ..., 0.33725491, 0.31764707, 0.28235295]],
[[0.68235296, 0.68235296, 0.68235296, ..., 0.68627453, 0.68627453, 0.68235296],
[0.67843139, 0.67450982, 0.67843139, ..., 0.68627453, 0.68235296, 0.67843139],
[0.68235296, 0.67450982, 0.68235296, ..., 0.68627453, 0.68235296, 0.67843139],
...,
[0.51764709, 0.52156866, 0.52156866, ..., 0.52549022, 0.52549022, 0.52156866],
[0.50196081, 0.50196081, 0.50588238, ..., 0.60000002, 0.59607846, 0.51764709],
[0.50980395, 0.50980395, 0.51372552, ..., 0.52549022, 0.50980395, 0.48627454]],
[[0.82352948, 0.81960791, 0.81960791, ..., 0.82745105, 0.81960791, 0.81176478],
[0.81568635, 0.81176478, 0.81568635, ..., 0.81568635, 0.81176478, 0.80784321],
[0.81960791, 0.81176478, 0.81960791, ..., 0.80784321, 0.80784321, 0.80784321],
...,
[0.58431375, 0.58431375, 0.58823532, ..., 0.58039218, 0.58431375, 0.59215689],
[0.56862748, 0.56862748, 0.57647061, ..., 0.59215689, 0.61568630, 0.59215689],
[0.58039218, 0.57647061, 0.58431375, ..., 0.55686277, 0.55686277, 0.55686277]]])
(3, 32, 32)
(32, 32, 3)
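The channel rearrangement from (3, 32, 32) to (32, 32, 3) can be checked in isolation with NumPy alone. The sketch below uses a random array as a hypothetical stand-in for one normalized image tensor, and shows that stacking the three channel planes along the last axis gives the same result as a single `transpose`.

```python
import numpy as np

# Hypothetical stand-in for one normalized image in (channels, height, width) layout.
rng = np.random.default_rng(0)
chw = rng.random((3, 32, 32), dtype=np.float32)

# Approach used in the preview code: stack the channel planes along a new last axis.
stacked = np.stack([chw[0], chw[1], chw[2]], axis=2)

# Equivalent approach: permute the axes from (C, H, W) to (H, W, C).
transposed = chw.transpose(1, 2, 0)

# Scale from [0, 1) to the 0-255 range and convert to 8-bit pixels.
hwc_uint8 = np.uint8(transposed * 255)

print(stacked.shape, transposed.shape)
print(np.array_equal(stacked, transposed))
```

Either form produces the height-width-channel layout that `Image.fromarray` expects for an RGB image.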
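The VGG and ResNet models that ship with Paddle are used:
# Build the VGG model
from paddle.vision.models import VGG
from paddle.vision.models.vgg import make_layers
# Shortened configuration: numbers are conv output channels, 'M' is a max-pooling layer.
# Despite its name, it has fewer conv layers than the standard VGG11 configuration.
vgg11_cfg = [64, 'M', 128, 'M', 256, 'M', 512, 'M']
features = make_layers(vgg11_cfg)
model_VGG = VGG(features, num_classes=10)  # use Paddle's built-in VGG model
paddle.summary(model_VGG, (1, 3, 32, 32))
# Build the ResNet model
# The pretrained weights come from a 1000-class model, so the final fc layer does not match the 10-class head
model_ResNet = paddle.vision.models.resnet18(pretrained=True, num_classes=10)
paddle.summary(model_ResNet, (1, 3, 32, 32))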
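As a rough check on what the feature configuration does to a 32*32 input, the following sketch traces the channel count and spatial size through the list. It assumes that every numeric entry is a 3*3 convolution with padding 1 (which keeps the spatial size) and that every 'M' is a 2*2 max-pool with stride 2 (which halves it); `trace_feature_shape` is a hypothetical helper, not part of Paddle.

```python
def trace_feature_shape(cfg, in_channels=3, size=32):
    """Return (channels, height, width) after applying a VGG-style config."""
    channels = in_channels
    for item in cfg:
        if item == 'M':
            # Assumed 2x2 max-pool with stride 2: halves height and width.
            size //= 2
        else:
            # Assumed 3x3 conv with padding 1: changes channels, keeps size.
            channels = item
    return channels, size, size

vgg11_cfg = [64, 'M', 128, 'M', 256, 'M', 512, 'M']
print(trace_feature_shape(vgg11_cfg))
```

Under these assumptions, four pooling layers reduce the 32*32 input to a 2*2 feature map with 512 channels. The exact shapes reported by `paddle.summary` remain the authoritative reference.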
The network weights are updated with backpropagation, and cross-entropy is used to measure the classification error. The training procedure is as follows:
# Training function
def train(model, epochs, train_loader, eval_loader, optim, metric_func, loss_func, model_name):
    train_losses = []
    eval_losses = []
    eval_acces = []
    for epoch in range(epochs):
        # train
        model.train()
        train_loss = 0.0
        cnt = 0
        for inputs, label in train_loader:  # one batch per iteration
            out = model(inputs)
            loss = loss_func(out, label)
            train_loss += float(loss)  # accumulate a Python float rather than the tensor
            loss.backward()
            optim.step()
            optim.clear_grad()
            cnt += 1
        train_loss /= cnt  # mean loss over the epoch, used for visualization
        train_losses.append(train_loss)
        # evaluation
        model.eval()
        eval_loss = 0.0
        cnt = 0
        acc = 0
        with paddle.no_grad():
            metric_func.reset()
            for eval_x, eval_y in eval_loader:  # one batch per iteration, including the last partial batch
                outs = model(eval_x)
                loss = loss_func(outs, eval_y)
                eval_loss += float(loss)
                correct = metric_func.compute(outs, eval_y)
                metric_func.update(correct)
                acc = metric_func.accumulate()  # running accuracy over the batches seen so far
                cnt += 1
        eval_loss /= cnt
        eval_losses.append(eval_loss)
        eval_acces.append(acc)
        metric_func.reset()
        print('---------epoch: %d, train_loss: %.3f, eval_loss: %.3f, eval_acc: %.3f-------'
              % (epoch, train_loss, eval_loss, acc))
        # save the weights whenever the accuracy matches or beats the best so far
        if acc >= max(eval_acces):
            paddle.save(model.state_dict(), "model_" + model_name + ".pdparams")
    return model, train_losses, eval_losses, eval_acces
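The two quantities tracked by the training function, cross-entropy loss and accuracy, can be illustrated without any framework. The sketch below is a plain-Python version of softmax cross-entropy and top-1 accuracy on made-up logits; the function names and the sample values are illustrative assumptions, not the Paddle implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(
        1 for logits, y in zip(batch_logits, labels)
        if logits.index(max(logits)) == y
    )
    return correct / len(labels)

# Made-up batch of three samples with three classes each.
batch_logits = [[2.0, 1.0, 0.1], [0.2, 0.3, 3.0], [1.5, 0.1, 0.2]]
labels = [0, 2, 1]

mean_loss = sum(cross_entropy(l, y) for l, y in zip(batch_logits, labels)) / len(labels)
print(mean_loss, accuracy(batch_logits, labels))
```

The loss shrinks as the model puts more probability on the true class, while accuracy only counts whether the top prediction is right, which is why the two curves can move somewhat independently during training.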
After setting the hyperparameters, start training the models:
# Train the VGG model
# Training hyperparameters
epochs_VGG = 20
lr_VGG = 0.001
# Optimizer, loss function, and metric
optim_VGG = paddle.optimizer.Momentum(learning_rate=lr_VGG, parameters=model_VGG.parameters(), momentum=0.9)
loss_func_VGG = paddle.nn.loss.CrossEntropyLoss()
metric_func_VGG = paddle.metric.Accuracy()
# Start training
model_VGG, train_losses_VGG, eval_losses_VGG, eval_acces_VGG = train(
    model_VGG, epochs_VGG, train_loader, test_loader,
    optim_VGG, metric_func_VGG, loss_func_VGG, 'VGG')
# Train the ResNet model
# Training hyperparameters
epochs_ResNet = 5
lr_ResNet = 0.001
# Optimizer, loss function, and metric
# (named optim_ResNet so that it matches the argument passed to train below)
optim_ResNet = paddle.optimizer.Momentum(learning_rate=lr_ResNet, parameters=model_ResNet.parameters(), momentum=0.9)
loss_func_ResNet = paddle.nn.loss.CrossEntropyLoss()
metric_func_ResNet = paddle.metric.Accuracy()
# Start training
model_ResNet, train_losses_ResNet, eval_losses_ResNet, eval_acces_ResNet = train(
    model_ResNet, epochs_ResNet, train_loader, test_loader,
    optim_ResNet, metric_func_ResNet, loss_func_ResNet, 'ResNet')
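Both models are trained with a momentum optimizer. As a minimal sketch of what one update does, the code below applies the common momentum rule (velocity = momentum * velocity + gradient, then parameter = parameter - learning_rate * velocity) to a single scalar. It assumes no Nesterov correction and no weight decay, and `momentum_step` is a hypothetical helper rather than a Paddle function.

```python
def momentum_step(param, grad, velocity, lr=0.001, mu=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = mu * velocity + grad   # accumulate a decaying sum of past gradients
    param = param - lr * velocity     # step against the accumulated direction
    return param, velocity

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for step in range(3):
    w, v = momentum_step(w, 2 * w, v, lr=0.1, mu=0.9)
    print(step, w, v)
```

Because the velocity carries over between steps, consecutive gradients that point the same way produce progressively larger updates, which is the main reason momentum tends to converge faster than plain gradient descent at the same learning rate.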
The two trained models are then used for inference on the test set to predict the class of an image:
# Model inference
# Load the VGG weights (file name matches the one written by train)
infer_model_VGG = paddle.vision.models.VGG(make_layers(vgg11_cfg), num_classes=10)
state_dict_load_VGG = paddle.load('model_VGG.pdparams')
infer_model_VGG.set_state_dict(state_dict_load_VGG)
infer_model_VGG.eval()
# Load the ResNet weights; the architecture must be the same resnet18 that was trained
infer_model_ResNet = paddle.vision.models.resnet18(num_classes=10)
state_dict_load_ResNet = paddle.load('model_ResNet.pdparams')
infer_model_ResNet.set_state_dict(state_dict_load_ResNet)
infer_model_ResNet.eval()
# Map label indices to class names
classes_name = {0:'airplane', 1:'automobile', 2:'bird', 3:'cat', 4:'deer', 5:'dog', 6:'frog', 7:'horse', 8:'ship', 9:'truck'}
for data in test_loader():
    data_array = data[0][0].numpy() * 255  # pixels are normalized, so scale them back to 0-255
    data_array = np.stack([data_array[0], data_array[1], data_array[2]], axis=2)  # move the channel axis to the end
    img = Image.fromarray(np.uint8(data_array), 'RGB')  # numpy array -> image
    out_VGG = infer_model_VGG(data[0][0].reshape((1, 3, 32, 32)))
    out_VGG = paddle.argmax(out_VGG, axis=1).numpy()  # index of the highest-scoring class
    out_ResNet = infer_model_ResNet(data[0][0].reshape((1, 3, 32, 32)))
    out_ResNet = paddle.argmax(out_ResNet, axis=1).numpy()
    plt.figure()  # plot
    plt.title("VGG predict:%s\nResNet predict:%s" % (classes_name[int(out_VGG[0])], classes_name[int(out_ResNet[0])]))
    plt.imshow(img)
    plt.show()
    break
The output is:
The results of running this project can be previewed in the image classification project NoteBook on Baidu AI Studio.