Image classification is a fundamental task in computer vision: it assigns images to different categories according to their semantic content. Many other tasks can be cast as image classification. For example, face detection, which decides whether a region contains a face, can be viewed as a binary image classification task.
Here we use a classic computer vision dataset, CIFAR-10, with a ResNet18 model, cross-entropy loss, the Adam optimizer, and accuracy as the evaluation metric.
The CIFAR-10 dataset contains 60,000 images in 10 categories, with 6,000 images per category; every image is 32×32 pixels. Sample images from CIFAR-10 are shown in the figure.
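For reference, the 10 categories in label order can be kept in a small lookup list; this ordering is the standard one shipped in the dataset's batches.meta file:

```python
# The 10 CIFAR-10 categories, indexed by their integer label (0-9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_name(label):
    """Map an integer CIFAR-10 label to its class name."""
    return CIFAR10_CLASSES[label]

print(label_to_name(9))  # -> truck
```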
Unpack the dataset archive:
# Unpack the dataset
# Uncomment on the first run to extract the files
# Skip this cell if the archive has already been extracted; otherwise
# extraction fails because the files already exist
!tar -xvf /home/aistudio/data/data9154/cifar-10-python.tar.gz -C /home/aistudio/datasets/
In this experiment, the original training set is split into two parts, train_set and dev_set, containing 40,000 and 10,000 samples respectively: data_batch_1 through data_batch_4 form the training set, data_batch_5 the validation set, and test_batch the test set.
The resulting dataset composition is:
The code for reading one batch of data is shown below:
import os
import pickle
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.utils.data as data
import torchvision.transforms
from torchvision.transforms import Compose, Resize, Normalize, ToTensor
from torchvision.models import resnet18
import torch.nn.functional as F
import torch.optim as opt
from nndl import RunnerV3, Accuracy
from nndl import plot
def load_cifar10_batch(folder_path, batch_id=1, mode='train'):
    if mode == 'test':
        file_path = os.path.join(folder_path, 'test_batch')
    else:
        file_path = os.path.join(folder_path, 'data_batch_' + str(batch_id))
    # Load the dataset file
    with open(file_path, 'rb') as batch_file:
        batch = pickle.load(batch_file, encoding='latin1')
    imgs = batch['data'].reshape((len(batch['data']), 3, 32, 32)) / 255.
    labels = batch['labels']
    return np.array(imgs, dtype='float32'), np.array(labels)

imgs_batch, labels_batch = load_cifar10_batch(folder_path='cifar-10-batches-py',
                                              batch_id=1, mode='train')
Check the data dimensions:
# Print the shapes of X and y in one batch
print("batch of imgs shape: ", imgs_batch.shape, "batch of labels shape: ", labels_batch.shape)
Output:
batch of imgs shape: (10000, 3, 32, 32) batch of labels shape: (10000,)
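The `(10000, 3, 32, 32)` shape comes from the reshape inside `load_cifar10_batch`: each CIFAR-10 row stores 3072 = 3 × 32 × 32 values, red plane first, then green, then blue, and dividing by 255 scales the pixels to [0, 1]. A quick sketch of the same transform on dummy data (no dataset files needed):

```python
import numpy as np

# A fake "batch" of 2 flattened CIFAR-10 rows: 3072 = 3 channels * 32 * 32.
raw = np.arange(2 * 3072, dtype=np.uint8).reshape(2, 3072)

# Same transform as load_cifar10_batch: split into channels, scale to [0, 1].
imgs = raw.reshape(len(raw), 3, 32, 32) / 255.

print(imgs.shape)  # (2, 3, 32, 32)
```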
Visualize one sample image and its label; the code is shown below:
image, label = imgs_batch[1], labels_batch[1]
print("The label in the picture is {}".format(label))
plt.figure(figsize=(2, 2))
plt.imshow(image.transpose(1,2,0))
plt.savefig('cnn-car.pdf')
plt.show()
Output:
The label in the picture is 9
Next, build a CIFAR10Dataset class that inherits from torch.utils.data.Dataset, so that samples can be processed one at a time. The implementation is as follows:
import torch
from torch.utils.data import Dataset,DataLoader
from torchvision.transforms import transforms
class CIFAR10Dataset(data.Dataset):
    def __init__(self, folder_path='cifar-10-batches-py', mode='train'):
        if mode == 'train':
            # Load batches 1-4 as the training set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, batch_id=1, mode='train')
            for i in range(2, 5):
                imgs_batch, labels_batch = load_cifar10_batch(folder_path=folder_path, batch_id=i, mode='train')
                self.imgs, self.labels = np.concatenate([self.imgs, imgs_batch]), np.concatenate([self.labels, labels_batch])
        elif mode == 'dev':
            # Load batch 5 as the validation set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, batch_id=5, mode='dev')
        elif mode == 'test':
            # Load the test set
            self.imgs, self.labels = load_cifar10_batch(folder_path=folder_path, mode='test')
        self.transforms = Compose([ToTensor(),
                                   Normalize(mean=[0.4914, 0.4822, 0.4465],
                                             std=[0.2023, 0.1994, 0.2010])])

    def __getitem__(self, idx):
        img, label = self.imgs[idx], self.labels[idx]
        img = img.transpose(1, 2, 0)
        img = self.transforms(img).cuda()
        label = torch.tensor(label).cuda()
        return img, label

    def __len__(self):
        return len(self.imgs)
torch.manual_seed(100)
train_dataset = CIFAR10Dataset(folder_path='cifar-10-batches-py', mode='train')
dev_dataset = CIFAR10Dataset(folder_path='cifar-10-batches-py', mode='dev')
test_dataset = CIFAR10Dataset(folder_path='cifar-10-batches-py', mode='test')
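The Normalize transform in the dataset class subtracts a per-channel mean and divides by a per-channel std. The same computation, sketched in plain NumPy on a dummy image (the statistics match the transform above; `normalize_chw` is a helper name introduced here, not part of the chapter's code):

```python
import numpy as np

# Channel statistics commonly used for CIFAR-10 (same values as the
# Normalize transform in CIFAR10Dataset), shaped for broadcasting.
MEAN = np.array([0.4914, 0.4822, 0.4465]).reshape(3, 1, 1)
STD = np.array([0.2023, 0.1994, 0.2010]).reshape(3, 1, 1)

def normalize_chw(img):
    """Normalize a (3, H, W) image in [0, 1] channel by channel."""
    return (img - MEAN) / STD

# A dummy image whose pixels all equal the channel means normalizes to zeros.
img = np.broadcast_to(MEAN, (3, 32, 32))
out = normalize_chw(img)
print(np.abs(out).max())  # -> 0.0
```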
We use ResNet18 for the image classification experiment.
from torchvision.models import resnet18
resnet18_model = resnet18()
**Pretrained models:** A model is first trained on a source task, then reused on the target task and fine-tuned to that task's characteristics in order to improve performance on it. This is essentially a form of transfer learning: applying a model trained by someone else to your own target task. Text comes with naturally occurring supervision, because the next word can be predicted from the preceding words; and since text corpora contain enormous numbers of words, they form very large pretraining datasets. This enables self-supervised pretraining (self-supervised rather than unsupervised, because each word is learned with the preceding words as context and the word itself as the target).
Transfer learning: Transfer learning is a machine learning method that takes a model developed for task A as the starting point for developing a model for task B. It improves learning on a new task by transferring knowledge from related tasks that have already been learned. Although most machine learning algorithms are designed to solve a single task, developing algorithms that facilitate transfer is a topic of ongoing interest in the machine learning community. Transfer learning starts from the similarity between problems: a model learned in an old domain is applied to a new one.
# ResNet18 with ImageNet pretrained weights
resnet = resnet18(pretrained=True)
# ResNet18 with randomly initialized weights
resnet = resnet18(pretrained=False)
We reuse the RunnerV3 class: instantiate it and pass in the training configuration.
Train the model on the training and validation sets for 30 epochs in total.
During training, the model with the highest accuracy is saved as the best model. The implementation is as follows:
# Select the device to run on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
# Learning rate
lr = 0.001
# Batch size
batch_size = 64
# Load the data
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network
model = resnet18_model
model = model.to(device)
# Define the optimizer: Adam with an L2 regularization (weight decay) strategy;
# see Sections 7.3.3.2 and 7.6.2 for details
optimizer = opt.Adam(lr=lr, params=model.parameters(), weight_decay=0.005)
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
import time
start_time = time.time()
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 1000
eval_steps = 1000
runner.train(train_loader, dev_loader, num_epochs=30, log_steps=log_steps,
             eval_steps=eval_steps, save_path="best_model.pdparams")
Output:
[Train] epoch: 0/30, step: 0/18750, loss: 12.96725
[Train] epoch: 1/30, step: 1000/18750, loss: 1.03638
[Evaluate] dev score: 0.69020, dev loss: 0.94529
[Evaluate] best accuracy performance has been updated: 0.00000 --> 0.69020
[Train] epoch: 3/30, step: 2000/18750, loss: 0.63199
[Evaluate] dev score: 0.67160, dev loss: 0.97103
[Train] epoch: 4/30, step: 3000/18750, loss: 0.99153
[Evaluate] dev score: 0.71030, dev loss: 0.84957
[Evaluate] best accuracy performance has been updated: 0.69020 --> 0.71030
[Train] epoch: 6/30, step: 4000/18750, loss: 0.66442
[Evaluate] dev score: 0.71620, dev loss: 0.85197
[Evaluate] best accuracy performance has been updated: 0.71030 --> 0.71620
[Train] epoch: 8/30, step: 5000/18750, loss: 0.60312
[Evaluate] dev score: 0.72310, dev loss: 0.82674
[Evaluate] best accuracy performance has been updated: 0.71620 --> 0.72310
[Train] epoch: 9/30, step: 6000/18750, loss: 0.87040
[Evaluate] dev score: 0.73230, dev loss: 0.79165
[Evaluate] best accuracy performance has been updated: 0.72310 --> 0.73230
[Train] epoch: 11/30, step: 7000/18750, loss: 0.39707
[Evaluate] dev score: 0.73730, dev loss: 0.79308
[Evaluate] best accuracy performance has been updated: 0.73230 --> 0.73730
[Train] epoch: 12/30, step: 8000/18750, loss: 0.94639
[Evaluate] dev score: 0.72790, dev loss: 0.79591
[Train] epoch: 14/30, step: 9000/18750, loss: 0.69579
[Evaluate] dev score: 0.72820, dev loss: 0.82343
[Train] epoch: 16/30, step: 10000/18750, loss: 0.51241
[Evaluate] dev score: 0.73740, dev loss: 0.79054
[Evaluate] best accuracy performance has been updated: 0.73730 --> 0.73740
[Train] epoch: 17/30, step: 11000/18750, loss: 0.52791
[Evaluate] dev score: 0.73660, dev loss: 0.79049
[Train] epoch: 19/30, step: 12000/18750, loss: 0.63937
[Evaluate] dev score: 0.74610, dev loss: 0.76617
[Evaluate] best accuracy performance has been updated: 0.73740 --> 0.74610
[Train] epoch: 20/30, step: 13000/18750, loss: 0.52228
[Evaluate] dev score: 0.72630, dev loss: 0.83306
[Train] epoch: 22/30, step: 14000/18750, loss: 0.46715
[Evaluate] dev score: 0.73620, dev loss: 0.80941
[Train] epoch: 24/30, step: 15000/18750, loss: 0.45405
[Evaluate] dev score: 0.73640, dev loss: 0.80397
[Train] epoch: 25/30, step: 16000/18750, loss: 0.70669
[Evaluate] dev score: 0.71390, dev loss: 0.87375
[Train] epoch: 27/30, step: 17000/18750, loss: 0.41202
[Evaluate] dev score: 0.74100, dev loss: 0.79830
[Train] epoch: 28/30, step: 18000/18750, loss: 0.60628
[Evaluate] dev score: 0.73470, dev loss: 0.80633
[Evaluate] dev score: 0.73510, dev loss: 0.81483
[Train] Training done!
Visualize the accuracy and loss curves on the training and validation sets:
plot(runner, fig_name='cnn-loss4.pdf')
Output:
In this experiment we used the Adam optimizer introduced in Chapter 7 to optimize the network. Using the SGD optimizer instead leads to overfitting and fails to converge well on the validation set.
The RunnerV3 class:
class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # used only for computing the evaluation metric

        # Record how the evaluation metric changes during training
        self.dev_scores = []
        # Record how the loss changes during training
        self.train_epoch_losses = []  # one loss value per epoch
        self.train_step_losses = []   # one loss value per step
        self.dev_losses = []
        # Track the best metric seen so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency
        eval_steps = kwargs.get("eval_steps", 0)
        # Model save path; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")
        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0

        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulate the training loss
            total_loss = 0
            for step, data in enumerate(train_loader):
                X, y = data
                # Get the model predictions
                logits = self.model(X)
                loss = self.loss_fn(logits, y.long())  # mean reduction by default
                total_loss += loss

                # Save the loss of every step during training
                self.train_step_losses.append((global_step, loss.item()))
                if log_steps and global_step % log_steps == 0:
                    print(
                        f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                # Backpropagate to compute the gradient of every parameter
                loss.backward()
                if custom_print_log:
                    custom_print_log(self)

                # Update the parameters with mini-batch gradient descent
                self.optimizer.step()
                # Reset the gradients
                self.optimizer.zero_grad()

                # Check whether evaluation is due
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):
                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate] dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # If the current metric is the best so far, save the model
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(
                            f"[Evaluate] best accuracy performance has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Accumulated training loss of the current epoch
            trn_loss = (total_loss / len(train_loader)).item()
            # Save the epoch-level training loss
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")

    # Evaluation: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None
        # Switch the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)
        # Accumulate the validation loss
        total_loss = 0
        # Reset the metric
        self.metric.reset()

        # Iterate over every batch of the validation set
        for batch_id, data in enumerate(dev_loader):
            X, y = data
            # Compute the model output
            logits = self.model(X)
            # Compute the loss
            loss = self.loss_fn(logits, y.long()).item()
            # Accumulate the loss
            total_loss += loss
            # Accumulate the metric
            self.metric.update(logits, y)

        dev_loss = (total_loss / len(dev_loader))
        dev_score = self.metric.accumulate()

        # Record the validation loss
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)
        return dev_score, dev_loss

    # Prediction: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Switch the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get the predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)
The Accuracy class:
class Accuracy():
    def __init__(self, is_logist=True):
        """
        Args:
            - is_logist: whether outputs are logits or already-activated values
        """
        # Count of correctly predicted samples
        self.num_correct = 0
        # Count of all samples
        self.num_count = 0
        self.is_logist = is_logist

    def update(self, outputs, labels):
        """
        Args:
            - outputs: predictions, shape=[N, class_num]
            - labels: ground-truth labels, shape=[N, 1]
        """
        # shape[1] == 1 means binary classification; shape[1] > 1 means multi-class
        if outputs.shape[1] == 1:  # binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # For logits, check whether the value is >= 0
                preds = (outputs >= 0).float()
            else:
                # Otherwise treat the value as a probability:
                # class 1 if >= 0.5, class 0 otherwise
                preds = (outputs >= 0.5).float()
        else:
            # Multi-class: take the index of the largest element as the class
            preds = torch.argmax(outputs, dim=1)

        # Count the correctly predicted samples in this batch
        labels = torch.squeeze(labels, dim=-1)
        batch_correct = (preds == labels).float().sum().cpu().numpy()
        batch_count = len(labels)

        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated counts
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the correct count and the total count
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"
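For the multi-class branch, the core of `Accuracy.update` is just an argmax over the class dimension followed by a comparison with the labels. A standalone NumPy sketch with hand-made logits (`batch_accuracy` is a helper name introduced here, not part of the chapter's code):

```python
import numpy as np

def batch_accuracy(logits, labels):
    """Fraction of rows whose argmax matches the label."""
    preds = np.argmax(logits, axis=1)
    return float((preds == labels).sum()) / len(labels)

logits = np.array([[2.0, 0.1, 0.3],   # predicted class 0
                   [0.2, 1.5, 0.1],   # predicted class 1
                   [0.3, 0.2, 0.9],   # predicted class 2
                   [1.1, 0.4, 0.2]])  # predicted class 0
labels = np.array([0, 1, 2, 1])       # the last prediction is wrong
print(batch_accuracy(logits, labels))  # -> 0.75
```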
The plot function:
# Visualization
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::30]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color='#8E004D', label="Train loss")
    if runner.dev_losses[0][0] != -1:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color='#E20079', linestyle='--', label="Dev loss")
    # Draw the axis labels and the legend
    plt.ylabel("loss", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    # Plot the validation accuracy curve
    if runner.dev_losses[0][0] != -1:
        plt.plot(dev_steps, runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    else:
        plt.plot(list(range(len(runner.dev_scores))), runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    # Draw the axis labels and the legend
    plt.ylabel("score", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='lower right', fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()
Evaluate the best model saved during training on the test set, and observe its accuracy and loss. The implementation is as follows:
# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))
Output:
[Test] accuracy/loss: 0.8730/0.5837
Similarly, the saved model can be used to predict on test-set samples and inspect the results. The implementation is as follows:
# Take one batch of data from the test set
X, label = next(iter(test_loader))
logits = runner.predict(X)
# Multi-class: use softmax to compute the predicted probabilities
pred = F.softmax(logits, dim=1)
# Take the class with the highest probability
pred_class = torch.argmax(pred[2]).cpu().numpy()
label = label[2].cpu().numpy()
# Print the true and the predicted class
print("The true category is {} and the predicted category is {}".format(label, pred_class))
# Visualize the image
plt.figure(figsize=(2, 2))
imgs, labels = load_cifar10_batch(folder_path='cifar-10-batches-py', mode='test')
plt.imshow(imgs[2].transpose(1, 2, 0))
plt.savefig('cnn-test-vis.pdf')
Output:
The true category is 8 and the predicted category is 8
The ResNet structure
The two structures shown are used in ResNet-34 (left) and ResNet-50/101/152 (right). Each whole structure is called a "building block", and the one on the right is also known as the "bottleneck design"; its purpose is obvious at a glance: to reduce the number of parameters.
The plain building block is used in networks of 34 layers or fewer, while the bottleneck design is used in deeper networks such as ResNet-101, to reduce computation and parameter count (a practical consideration).
ResNet distinguishes two kinds of mapping:
Identity mapping is, as the name suggests, the input itself, the x in the formula y = F(x) + x; residual mapping is the "difference" y − x, so the residual is the F(x) part.
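The identity shortcut can be checked numerically: if the residual branch F outputs zero, the block reduces exactly to the identity mapping, so each block only has to learn the "difference" F(x) = y − x. A toy sketch (the linear `F` here is a stand-in for the real convolutional branch):

```python
import numpy as np

def residual_block(x, weights):
    """y = F(x) + x, where F is a toy linear residual branch."""
    fx = weights @ x  # stand-in for the conv layers computing F(x)
    return fx + x     # identity shortcut

x = np.array([1.0, -2.0, 3.0])

# With an all-zero residual branch the block is exactly the identity,
# which is what makes very deep stacks easy to optimize.
W = np.zeros((3, 3))
print(residual_block(x, W))  # -> [ 1. -2.  3.]
```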
When many layers are simply stacked in a conventional network (a "plain" network), checking the image-recognition results gives the training-set and test-set errors shown in the figure below:
The two plots show that when the network is very deep (56 layers versus 20), the model actually gets worse (a higher error rate); deeper is not automatically better.
The experiments show that model accuracy keeps improving as layers are added, but once the depth passes a certain point, training and test accuracy drop rapidly, which indicates that very deep networks become much harder to train.
The figure below shows a simple neural network made up of an input layer, hidden layers, and an output layer:
ResNet introduces the residual network structure: with it, networks can be made very deep (reportedly more than 1,000 layers) while still achieving excellent classification results. The basic residual structure, with its distinctive skip connection, is shown in the figure below:
The residual network borrows the cross-layer connection idea from Highway Networks but improves on it: the residual term was originally weighted, and ResNet replaces those weights with an identity mapping.
LeNet
LeNet, also called LeNet-5, has 5 hidden layers (not counting the pooling layers). Its structure is:
AlexNet
Motivation: LeNet performs poorly when recognizing large images; AlexNet was proposed to fix this.
Compared with LeNet, AlexNet has a deeper structure: 8 hidden layers in total, 5 convolutional and 3 fully connected. Its structure is:
It also improved the training process with the following techniques:
VGGNet
Motivation: AlexNet works well but gives no direction for designing deep neural networks, i.e. how to make the network deeper.
The paper compares VGG-11, VGG-13, VGG-16, and VGG-19; VGG-16 performs best, and its structure is given here.
VGGNet builds a deep CNN strictly from small 3×3 convolutions and pooling layers, with good results. Small kernels reduce the parameter count and make it convenient to stack convolutional layers for extra depth (a deeper network with fewer parameters). In short, VGGNet = a deeper AlexNet + 3×3 convolutions.
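The parameter saving from small kernels is easy to verify: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution but use fewer kernel parameters (bias terms ignored; C = 64 channels is an arbitrary example):

```python
def conv_params(k, c_in, c_out):
    """Kernel parameters of a k x k conv with c_in inputs, c_out outputs."""
    return k * k * c_in * c_out

C = 64
two_3x3 = 2 * conv_params(3, C, C)  # 2 * 9 * C^2 = 73728
one_5x5 = conv_params(5, C, C)      # 25 * C^2   = 102400
print(two_3x3, one_5x5)             # 73728 102400
```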
GoogLeNet
Motivation: as with VGGNet, how to make the network deeper.
GoogLeNet designs the Inception structure to reduce the number of channels and hence the computational complexity; the Inception structure comes in the following variants:
GoogLeNet: AlexNet + Inception = conv stage ×1 + Inception modules ×9 + FC ×1
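The channel-reduction idea behind Inception's 1×1 convolutions can be checked with a quick parameter count: squeezing the channels with a 1×1 conv before the 3×3 conv costs far fewer kernel parameters than applying the 3×3 conv directly (the 256 → 64 channel numbers are illustrative, not GoogLeNet's exact configuration):

```python
def conv_params(k, c_in, c_out):
    """Kernel parameters of a k x k conv with c_in inputs, c_out outputs."""
    return k * k * c_in * c_out

# 3x3 conv applied directly on 256 channels
direct = conv_params(3, 256, 256)                            # 589824
# 1x1 reduction to 64 channels, then the 3x3 conv
reduced = conv_params(1, 256, 64) + conv_params(3, 64, 256)  # 163840
print(direct, reduced)  # 589824 163840
```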
ResNet
Motivation: again, how to make the network deeper.
ResNet approaches the problem from the angle of avoiding vanishing and exploding gradients: residual connections allow the network to go much deeper. It comes in 5 versions.