[Tech Share] Image Classification on Cifar10 with VGG and ResNet Neural Networks

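This article follows the image classification case study in Section 5.3.2 of the book 《深度学习工程师认证初级教程》 (Deep Learning Engineer Certification: Beginner Tutorial) and uses VGGNet and ResNet to classify the images of the Cifar10 dataset.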

I. Dataset

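Cifar10 is a labeled subset of a much larger image collection (the 80 Million Tiny Images dataset). It contains images from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. Each image is a 32×32-pixel color image; there are 6,000 images per class and 60,000 in total, of which 50,000 are training images and 10,000 are test images. The classes are mutually exclusive: an image never shows, say, both an airplane and a horse, so each image belongs to exactly one class. Sample images are shown below: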

[Figure 1: sample images from the Cifar10 dataset]
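As a quick check of the numbers above, the plain-Python sketch below recomputes the split. The constant names are illustrative only, and the per-class split assumes balanced classes, which holds for Cifar10 (5,000 training and 1,000 test images per class):

```python
# Illustrative constants for the Cifar10 split described above
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

total_images = NUM_CLASSES * IMAGES_PER_CLASS   # 60000 images overall
train_per_class = TRAIN_IMAGES // NUM_CLASSES   # assumes balanced classes (true for Cifar10)
test_per_class = TEST_IMAGES // NUM_CLASSES
values_per_image = HEIGHT * WIDTH * CHANNELS    # numbers stored per RGB image

print(total_images, train_per_class, test_per_class, values_per_image)
```

So each image is just 3,072 numbers, which is why Cifar10 trains quickly compared with full-size image datasets.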

II. Configuration

1. Input and Output Configuration

Paddle has a built-in interface for downloading the Cifar10 dataset:

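# Download the data (Paddle fetches Cifar10 on first use)
import paddle
from paddle.vision.transforms import ToTensor

train_dataset = paddle.vision.datasets.Cifar10(mode='train', transform=ToTensor())
test_dataset = paddle.vision.datasets.Cifar10(mode='test', transform=ToTensor())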
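With its default settings, ToTensor turns an H×W×C uint8 image into a C×H×W float32 tensor whose values are divided by 255, which is why the preview code later multiplies by 255. The sketch below imitates that conversion on nested lists in plain Python; to_chw_unit is a hypothetical helper for illustration, not Paddle's implementation:

```python
def to_chw_unit(image_hwc):
    """Mimic ToTensor on a nested list: HWC values in 0-255 -> CHW floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    # Channel axis moves to the front; every value is scaled by 1/255
    return [
        [[image_hwc[h][w][c] / 255.0 for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]

# A tiny 1x2 RGB "image": one red pixel and one white pixel
pixels = [[[255, 0, 0], [255, 255, 255]]]
chw = to_chw_unit(pixels)
print(chw)  # [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
```

For example, a stored value of 130 becomes 130/255 ≈ 0.5098, the kind of value that appears in the tensor printout of the preview step.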

The data is loaded in batches as follows:

# Load the data in mini-batches
BATCH_SIZE = 32

train_loader = paddle.io.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = paddle.io.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=True)
for batch_id, data in enumerate(train_loader()):
    x_data = data[0]
    y_data = data[1]
    print(x_data.shape) # [32, 3, 32, 32]: a batch of 32 RGB images, 3 channels of 32*32 pixels each
    print(y_data.shape) # [32]: one class label per image
    break
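Because drop_last defaults to False in paddle.io.DataLoader, the last batch of an epoch is usually smaller than BATCH_SIZE. The plain-Python sketch below (batch_stats is a hypothetical helper) counts the batches in one pass over each split with BATCH_SIZE = 32:

```python
BATCH_SIZE = 32
NUM_TRAIN, NUM_TEST = 50000, 10000

def batch_stats(num_samples, batch_size, drop_last=False):
    """Return (number of batches, size of the last batch) for one pass over the data."""
    if drop_last:
        # The incomplete tail batch is discarded
        return num_samples // batch_size, batch_size
    num_batches = (num_samples + batch_size - 1) // batch_size  # integer ceil division
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last

print(batch_stats(NUM_TRAIN, BATCH_SIZE))  # (1563, 16)
print(batch_stats(NUM_TEST, BATCH_SIZE))   # (313, 16)
```

So one training epoch runs 1,563 optimizer steps, and the final batch of both splits holds 16 images.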

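Previewing the data requires converting a tensor into an image. A tensor can be thought of as a stack of matrices (see here for more on the concept). The tensor is converted to an image as follows: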

# Preview the data
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Map each label index to its class name
classes_name = {0:'airplane', 1:'automobile', 2:'bird', 3:'cat', 4:'deer', 5:'dog', 6:'frog', 7:'horse', 8:'ship', 9:'truck'}

for data in train_loader():
    print(data[0][0])
    # The printed values show the pixels were normalized to [0, 1], so scale back to 0-255
    data_array = data[0][0].numpy() * 255
    print(data_array.shape)  # (3, 32, 32)
    # Move the channel axis last (same as data_array.transpose(1, 2, 0))
    data_array = np.stack([data_array[0], data_array[1], data_array[2]], axis=2)
    print(data_array.shape)  # (32, 32, 3): the HWC layout PIL expects for a 32*32 RGB image
    # Round before casting so float error cannot truncate e.g. 129.99998 to 129
    img = Image.fromarray(np.uint8(np.round(data_array)))  # numpy array -> RGB image
    plt.figure()  # plot
    data_label = data[1][0].item()  # works whether the label is a 0-D or a 1-element tensor
    plt.title("Image Label:%s" % (classes_name[data_label]))
    plt.imshow(img)
    plt.show()
    break

Output:

Tensor(shape=[3, 32, 32], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [[[0.50980395, 0.50588238, 0.50980395, ..., 0.51372552, 0.50980395, 0.50588238],
         [0.50588238, 0.50196081, 0.50588238, ..., 0.50980395, 0.50588238, 0.50196081],
         [0.50980395, 0.50196081, 0.50980395, ..., 0.50980395, 0.50588238, 0.50196081],
         ...,
         [0.29803923, 0.30196080, 0.30196080, ..., 0.34117648, 0.33725491, 0.32156864],
         [0.28235295, 0.28235295, 0.28627452, ..., 0.42745101, 0.41568631, 0.31764707],
         [0.29803923, 0.29803923, 0.30196080, ..., 0.33725491, 0.31764707, 0.28235295]],

        [[0.68235296, 0.68235296, 0.68235296, ..., 0.68627453, 0.68627453, 0.68235296],
         [0.67843139, 0.67450982, 0.67843139, ..., 0.68627453, 0.68235296, 0.67843139],
         [0.68235296, 0.67450982, 0.68235296, ..., 0.68627453, 0.68235296, 0.67843139],
         ...,
         [0.51764709, 0.52156866, 0.52156866, ..., 0.52549022, 0.52549022, 0.52156866],
         [0.50196081, 0.50196081, 0.50588238, ..., 0.60000002, 0.59607846, 0.51764709],
         [0.50980395, 0.50980395, 0.51372552, ..., 0.52549022, 0.50980395, 0.48627454]],

        [[0.82352948, 0.81960791, 0.81960791, ..., 0.82745105, 0.81960791, 0.81176478],
         [0.81568635, 0.81176478, 0.81568635, ..., 0.81568635, 0.81176478, 0.80784321],
         [0.81960791, 0.81176478, 0.81960791, ..., 0.80784321, 0.80784321, 0.80784321],
         ...,
         [0.58431375, 0.58431375, 0.58823532, ..., 0.58039218, 0.58431375, 0.59215689],
         [0.56862748, 0.56862748, 0.57647061, ..., 0.59215689, 0.61568630, 0.59215689],
         [0.58039218, 0.57647061, 0.58431375, ..., 0.55686277, 0.55686277, 0.55686277]]])
(3, 32, 32)
(32, 32, 3)

[Figure 2: preview of a training image, titled with its class label]
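The np.stack(..., axis=2) step only moves the channel axis from first to last, (3, 32, 32) to (32, 32, 3), and is equivalent to data_array.transpose(1, 2, 0). The plain-Python sketch below spells out the same index mapping, out[h][w][c] = in[c][h][w], together with the scaling back to 0-255. chw_to_hwc_uint8 is a hypothetical helper; it rounds instead of truncating like a bare uint8 cast, so float error cannot turn 129.99998 into 129:

```python
def chw_to_hwc_uint8(tensor_chw):
    """Reorder a CHW nested list of floats in [0, 1] to HWC integers in [0, 255]."""
    channels = len(tensor_chw)
    height = len(tensor_chw[0])
    width = len(tensor_chw[0][0])
    return [
        [
            # out[h][w][c] = in[c][h][w], scaled, rounded and clamped to the uint8 range
            [min(255, max(0, round(tensor_chw[c][h][w] * 255))) for c in range(channels)]
            for w in range(width)
        ]
        for h in range(height)
    ]

# A 3x1x2 "tensor" (3 channels, 1 row, 2 columns)
chw = [[[0.50980395, 1.0]], [[0.68235296, 0.0]], [[0.82352948, 0.2]]]
print(chw_to_hwc_uint8(chw))  # [[[130, 174, 210], [255, 0, 51]]]
```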

2. Network Configuration

We use the VGG and ResNet models that ship with Paddle:

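# Build the VGG model
import paddle
from paddle.vision.models import VGG
from paddle.vision.models.vgg import make_layers

# A shortened VGG-style configuration: ints are conv output channels, 'M' is a max pool
# (the standard VGG11 has eight conv layers and five pooling stages)
vgg11_cfg = [64, 'M', 128, 'M', 256, 'M', 512, 'M']
features = make_layers(vgg11_cfg)
model_VGG = VGG(features, num_classes=10)  # Paddle's built-in VGG class
paddle.summary(model_VGG, (1,3,32,32))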
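Assuming make_layers turns each integer entry into a 3×3 convolution with padding 1 (plus ReLU) and each 'M' into a 2×2 max pool with stride 2, as Paddle's VGG implementation does, the plain-Python sketch below traces a 32×32 input through vgg11_cfg and counts the convolution parameters. trace_features is a hypothetical helper; its counts should agree with the convolution rows of paddle.summary:

```python
# Same layer list as vgg11_cfg: ints are 3x3 conv output channels, 'M' is a 2x2 max pool
cfg = [64, 'M', 128, 'M', 256, 'M', 512, 'M']

def trace_features(cfg, in_channels=3, size=32):
    """Track channels, spatial size and conv parameter count through the layer list."""
    params = 0
    for v in cfg:
        if v == 'M':
            size //= 2                                # 2x2 pool with stride 2 halves H and W
        else:
            params += (in_channels * 3 * 3 + 1) * v   # 3x3 weights per input channel + 1 bias, per filter
            in_channels = v                           # padding=1 keeps H and W unchanged
    return in_channels, size, params

channels, size, conv_params = trace_features(cfg)
print(channels, size, conv_params)  # 512 2 1550976
```

The features therefore end at 512×2×2. Paddle's VGG class (with the default with_pool=True) then applies an adaptive average pool to 7×7, so its first fully connected layer still receives 512×7×7 = 25,088 inputs even for these small images.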


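# Build the ResNet model
# pretrained=True loads ImageNet weights; the 10-class fc layer cannot reuse the 1000-class fc weights
model_ResNet = paddle.vision.models.resnet18(pretrained=True, num_classes=10)
paddle.summary(model_ResNet, (1,3,32,32))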
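A similar trace for resnet18 shows how quickly a 32×32 input shrinks. The sketch assumes the standard ImageNet-style layout that Paddle's resnet18 follows (a 7×7 stride-2 stem convolution, a 3×3 stride-2 max pool, then four stages of which stages 2 to 4 downsample by 2); conv_out is a hypothetical helper applying the usual output-size formula floor((n + 2p - k) / s) + 1:

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# (layer, kernel, stride, padding) for the layers of an ImageNet-style ResNet-18
# that set the spatial size; the other 3x3 convs in each stage keep it unchanged.
stages = [
    ("conv1 7x7/2", 7, 2, 3),
    ("maxpool 3x3/2", 3, 2, 1),
    ("layer1 (stride 1)", 3, 1, 1),
    ("layer2 (stride 2)", 3, 2, 1),
    ("layer3 (stride 2)", 3, 2, 1),
    ("layer4 (stride 2)", 3, 2, 1),
]

size = 32
sizes = []
for name, k, s, p in stages:
    size = conv_out(size, k, s, p)
    sizes.append(size)
    print(f"{name:<20} -> {size}x{size}")
```

By the last stage the feature map is 1×1, so the final global average pool has nothing left to average. This is one reason CIFAR-oriented ResNet variants often use a 3×3 stride-1 stem without the max pool; the article keeps the standard model.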

III. Training the Model

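Cross-entropy measures the classification error; its gradients are computed by backpropagation and the network weights are then updated with momentum SGD. The training procedure is: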

# Training function

import paddle

def train(model, epochs, train_loader, eval_loader, optim, metric_func, loss_func, model_name):
    train_losses = []
    eval_losses = []
    eval_acces = []

    for epoch in range(epochs):
        # ---- training ----
        model.train()
        train_loss = 0.0
        cnt = 0
        for images, labels in train_loader:  # ceil(n / B) batches per epoch
            out = model(images)
            loss = loss_func(out, labels)

            loss.backward()
            optim.step()
            optim.clear_grad()

            train_loss += float(loss)  # accumulate a Python float so no autograd graph is kept alive
            cnt += 1
        train_loss /= float(cnt)  # average loss of this epoch, used for plotting
        train_losses.append(train_loss)

        # ---- evaluation ----
        model.eval()
        eval_loss = 0.0
        cnt = 0
        metric_func.reset()
        with paddle.no_grad():
            for eval_x, eval_y in eval_loader:  # ceil(n / B) batches
                outs = model(eval_x)
                loss = loss_func(outs, eval_y)
                eval_loss += float(loss)

                correct = metric_func.compute(outs, eval_y)
                metric_func.update(correct)
                cnt += 1
        acc = metric_func.accumulate()  # top-1 accuracy over the whole evaluation set
        eval_loss /= float(cnt)
        eval_losses.append(eval_loss)
        eval_acces.append(acc)

        print('---------epoch: %d, train_loss: %.3f, eval_loss: %.3f, eval_acc: %.3f-------'
              % (epoch, train_loss, eval_loss, acc))

        # Save the weights whenever the accuracy matches or beats the best so far
        if acc >= max(eval_acces):
            paddle.save(model.state_dict(), "model_" + model_name + ".pdparams")

    return model, train_losses, eval_losses, eval_acces
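By default paddle.nn.loss.CrossEntropyLoss applies a softmax to the model's raw outputs (logits) and averages the per-sample loss -log p_y over the batch. The plain-Python sketch below reproduces that computation; cross_entropy and mean_cross_entropy are illustrative names, not Paddle APIs:

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mean_cross_entropy(batch_logits, batch_labels):
    """Average the per-sample loss over a batch (like reduction='mean')."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

# Uniform logits over 10 classes give log(10) whatever the label is
print(round(cross_entropy([0.0] * 10, 3), 3))  # 2.303
```

Since uniform outputs give ln 10 ≈ 2.303, an untrained 10-class model usually starts with a loss near that value, which makes a handy sanity check for the first epoch's train_loss.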

Set the hyperparameters, then start training:

# Train the VGG model

# Training hyperparameters
epochs_VGG = 20
lr_VGG = 0.001

# Optimizer, loss function and metric
optim_VGG = paddle.optimizer.Momentum(learning_rate=lr_VGG, parameters=model_VGG.parameters(), momentum=0.9)
loss_func_VGG = paddle.nn.loss.CrossEntropyLoss()
metric_func_VGG = paddle.metric.Accuracy()

# Start training
model_VGG, train_losses_VGG, eval_losses_VGG, eval_acces_VGG = train(model_VGG, epochs_VGG, train_loader, test_loader,
                optim_VGG, metric_func_VGG, loss_func_VGG, 'VGG')

# Train the ResNet model

# Training hyperparameters
epochs_ResNet = 5
lr_ResNet = 0.001

# Optimizer, loss function and metric
optim_ResNet = paddle.optimizer.Momentum(learning_rate=lr_ResNet, parameters=model_ResNet.parameters(), momentum=0.9)
loss_func_ResNet = paddle.nn.loss.CrossEntropyLoss()
metric_func_ResNet = paddle.metric.Accuracy()

# Start training
model_ResNet, train_losses_ResNet, eval_losses_ResNet, eval_acces_ResNet = train(model_ResNet, epochs_ResNet, train_loader, test_loader,
                optim_ResNet, metric_func_ResNet, loss_func_ResNet, 'ResNet')
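Both models use paddle.optimizer.Momentum, whose documented (non-Nesterov) update is v ← μ·v + g followed by θ ← θ − lr·v. The plain-Python sketch below applies that rule to a single scalar parameter with the article's settings lr = 0.001 and μ = 0.9; momentum_step is a hypothetical helper, not Paddle's implementation:

```python
def momentum_step(param, velocity, grad, lr=0.001, mu=0.9):
    """One momentum SGD step: v <- mu * v + g, then p <- p - lr * v."""
    velocity = mu * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Feed a constant gradient of 1.0 for three steps
p, v = 1.0, 0.0
for _ in range(3):
    p, v = momentum_step(p, v, grad=1.0)
print(round(v, 3), round(p, 6))  # 2.71 0.99439
```

With a steady gradient the velocity approaches g / (1 − μ) = 10·g, so momentum 0.9 can make the effective step up to ten times larger than lr alone, which is why a small learning rate such as 0.001 is still workable here.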

IV. Model Inference

Use the two trained models to run inference on the test set and compare their predicted classes:

# Model inference
import numpy as np
import paddle
from PIL import Image
import matplotlib.pyplot as plt
from paddle.vision.models.vgg import make_layers

# Load the VGG weights (build fresh layers so the trained model_VGG is not shared or overwritten)
infer_model_VGG = paddle.vision.models.VGG(make_layers(vgg11_cfg), num_classes=10)
state_dict_load_VGG = paddle.load('model_VGG.pdparams')  # file name written by train()
infer_model_VGG.set_state_dict(state_dict_load_VGG)
infer_model_VGG.eval()

# Load the ResNet weights (same architecture as in training: resnet18)
infer_model_ResNet = paddle.vision.models.resnet18(num_classes=10)
state_dict_load_ResNet = paddle.load('model_ResNet.pdparams')
infer_model_ResNet.set_state_dict(state_dict_load_ResNet)
infer_model_ResNet.eval()

# Map each label index to its class name
classes_name = {0:'airplane', 1:'automobile', 2:'bird', 3:'cat', 4:'deer', 5:'dog', 6:'frog', 7:'horse', 8:'ship', 9:'truck'}

with paddle.no_grad():
    for data in test_loader():
        # Scale the normalized pixels back to 0-255 and move channels last for display
        data_array = data[0][0].numpy() * 255
        data_array = np.stack([data_array[0], data_array[1], data_array[2]], axis=2)
        img = Image.fromarray(np.uint8(np.round(data_array)))  # numpy array -> RGB image

        image = data[0][0].reshape((1, 3, 32, 32))  # add a batch dimension
        out_VGG = paddle.argmax(infer_model_VGG(image), axis=1).numpy()
        out_ResNet = paddle.argmax(infer_model_ResNet(image), axis=1).numpy()

        plt.figure()  # plot
        plt.title("VGG predict:%s\nResNet predict:%s" % (classes_name[int(out_VGG[0])], classes_name[int(out_ResNet[0])]))
        plt.imshow(img)
        plt.show()
        break

Result:

[Figure 3: a test image titled with the VGG and ResNet predictions]
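The demo above judges a single picture. Over a whole test set, models are usually compared by top-1 accuracy, which is what paddle.metric.Accuracy() (default topk=(1,)) accumulates inside train. The plain-Python sketch below shows the argmax-then-compare logic on made-up scores; CLASSES, argmax and top1_accuracy are illustrative names:

```python
# Same index-to-name table as classes_name above
CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def top1_accuracy(batch_scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(argmax(s) == y for s, y in zip(batch_scores, labels))
    return correct / len(labels)

scores = [
    [0.1, 0.0, 0.0, 2.5, 0.3, 1.9, 0.0, 0.0, 0.0, 0.0],  # highest score at index 3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.2],  # highest score at index 8
]
labels = [5, 8]  # first sample is really a dog, second a ship
print([CLASSES[argmax(s)] for s in scores], top1_accuracy(scores, labels))  # ['cat', 'ship'] 0.5
```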

The full run of this project can be viewed in the image classification project notebook on Baidu AI Studio.
