CIFAR-10 is a dataset of 60,000 images. Each one is a 32×32 color image, where every pixel carries three values (R, G, B).
The images are split across 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
50,000 of them form the training set and 10,000 form the test set.
http://www.cs.toronto.edu/~kriz/cifar.html
torchvision.transforms can be understood as defining how the raw data will be preprocessed.
Compose collects several image-processing steps into a single pipeline. The most commonly used transforms include:
center crop: CenterCrop()
convert to tensor: ToTensor()
normalize: Normalize()
and so on.
Note that Compose applies the transforms in the order they are listed.
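To make the in-order behavior concrete, here is a minimal pure-Python sketch of what Compose does (a toy stand-in, not the torchvision implementation itself, though it behaves the same way): each transform is called on the output of the previous one.

```python
class MiniCompose:
    """Toy stand-in for torchvision.transforms.Compose."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        # apply each transform in the listed order
        for t in self.transforms:
            x = t(x)
        return x

# order matters: strip first, then upper-case
pipeline = MiniCompose([lambda s: s.strip(), lambda s: s.upper()])
print(pipeline('  hello '))  # HELLO
```

Swapping the two lambdas would give the same result here, but for image transforms the order is significant (e.g. Normalize() expects a tensor, so it must come after ToTensor()).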
## torchvision.datasets
Loading the training set works much like it did for MNIST (the same goes for CIFAR-100):

```python
import torchvision.datasets as dset
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
```
Since I had already worked with the MNIST dataset, for a first attempt I only changed the dataset-loading part and the number of input channels, leaving everything else alone. The code is as follows:
```python
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision.datasets as datasets
import torchvision.transforms as transforms

num_epoch = 10
BATCH_SIZE = 50

transform = transforms.Compose(
    [transforms.ToTensor()]
)

# load the CIFAR-10 dataset
train_dataset = datasets.CIFAR10(
    root='/home/cxm-irene/PyTorch/data/cifar10',
    train=True,
    transform=transform,
    download=False
)
train_loader = data.DataLoader(
    dataset=train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)
test_dataset = datasets.CIFAR10(
    root='/home/cxm-irene/PyTorch/data/cifar10',
    train=False,
    transform=transform,
    download=False
)
test_loader = data.DataLoader(
    dataset=test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)

# build the network
class Net_CIFAR10(torch.nn.Module):
    def __init__(self):
        super(Net_CIFAR10, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=3,    # CIFAR-10 images have 3 channels (MNIST has 1)
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # 32x32 -> 16x16
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(
                in_channels=16,
                out_channels=32,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # 16x16 -> 8x8
        )
        self.predict = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 32*8*8)
        x = self.predict(x)
        return x

# instantiate the network
net_cifar10 = Net_CIFAR10()
print(net_cifar10)

# optimizer and loss
optimizer = torch.optim.RMSprop(net_cifar10.parameters(), lr=0.005, alpha=0.9)
# optimizer = torch.optim.Adam(net_cifar10.parameters(), lr=0.005, betas=(0.9, 0.99))
loss_function = nn.CrossEntropyLoss()

# training
for epoch in range(num_epoch):
    print('epoch = %d' % epoch)
    for i, (batch_x, batch_y) in enumerate(train_loader):
        x = net_cifar10(batch_x)
        loss = loss_function(x, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print('loss = %.5f' % loss.item())

# evaluation on the test set
total = 0
correct = 0
net_cifar10.eval()
with torch.no_grad():
    for batch_x, batch_y in test_loader:
        x = net_cifar10(batch_x)
        _, prediction = torch.max(x, 1)
        total += batch_y.size(0)
        correct += (prediction == batch_y).sum()
print('There are ' + str(correct.item()) + ' correct numbers.')
print('Accuracy = %.2f' % (100.0 * correct.item() / total))
```
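The `32 * 8 * 8` input size of the final linear layer can be verified with the standard convolution output-size formula. This small sketch (plain Python, no PyTorch needed) traces the spatial size through the two conv/pool stages:

```python
def conv2d_out(size, kernel, stride, padding):
    # standard formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

s = 32                      # CIFAR-10 images are 32x32
s = conv2d_out(s, 5, 1, 2)  # conv1: padding 2 keeps the size at 32
s //= 2                     # 2x2 max pool -> 16
s = conv2d_out(s, 5, 1, 2)  # conv2: still 16
s //= 2                     # 2x2 max pool -> 8
print(32 * s * s)           # 32 channels * 8 * 8 = 2048
```

If you change the kernel size or padding, recompute this number, or the `nn.Linear` input dimension will no longer match the flattened feature map.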
Two conv+pool layers, ten epochs, with RMSprop as the optimizer. The result: 59.19% accuracy.
You can also see that towards the later epochs the loss barely changes any more.
## Step 2
I added two fully connected layers, switched the optimizer to Adam, and put batch normalization between the two fully connected layers (I had tried normalization earlier, but the loss stayed stuck between 1.3 and 1.5, so I gave up on that).
The accuracy improved by roughly ten percentage points. Compared with before, the loss now drops below 1 late in training, but it is quite unstable and keeps fluctuating up and down.
I suspected the learning rate was the problem, causing the loss to bounce around near the minimum, so I lowered it to 0.001 and set both batch_size and epoch to 25; the result was about the same.
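Beyond lowering a fixed learning rate, a standard way to deal with this kind of oscillation is to decay the rate on a schedule during training. A minimal sketch using PyTorch's StepLR (the dummy parameter and the step_size/gamma values are illustrative placeholders, not settings I actually ran):

```python
import torch

# a dummy parameter so the optimizer has something to hold
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=0.001, betas=(0.9, 0.99))
# multiply the learning rate by 0.1 every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(25):
    # ... the per-batch training loop would go here ...
    optimizer.step()   # normally called once per batch
    scheduler.step()   # called once per epoch
print(optimizer.param_groups[0]['lr'])  # decayed twice: 0.001 -> 0.0001 -> 1e-05
```

This way the loss can take large steps early on and settle into the minimum later, instead of overshooting it at a constant rate.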
Reference blog: https://blog.csdn.net/dulingtingzi/article/details/79870486
Sometimes adding more layers to a network actually increases the error rate after training. In principle, though, a deeper model has greater learning capacity and should not produce a higher error rate than a shallower one. This "degradation" problem is attributed to an optimization difficulty: as the model becomes more complex, optimization becomes harder, and the model fails to reach a good solution.
To address this problem, the Residual structure was proposed:
The idea is to add an identity mapping, turning the function H(x) that originally had to be learned into F(x) + x. The two express the same mapping, but the latter is easier to optimize than the former.
A residual block is implemented via a shortcut connection, which adds the block's input to its output. Residual networks borrow the cross-layer connection idea from Highway Networks:
Suppose the input to some segment of the network is x and the desired output is H(x). If the accuracy has already nearly saturated, the next learning target becomes an identity mapping, i.e. making the output H(x) approximately equal to the input x, so that accuracy does not degrade in the later layers. In the structure diagram, the shortcut connection passes the input x directly to the output as an initial result, so the output is H(x) = F(x) + x; when F(x) = 0, H(x) = x. ResNet thus changes the learning target: it learns the difference between the target value H(x) and x, i.e. the residual F(x) := H(x) - x. The training goal then becomes driving this residual towards 0, so that accuracy does not drop as the network gets deeper.
Note: in a shortcut connection, if the number of channels matches, the computation is H(x) = F(x) + x; if not, it is H(x) = F(x) + Wx, where W is a convolution that adjusts the dimensions of x.
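The channel-mismatch case can be seen directly from a couple of tensor shapes. In this sketch (shapes chosen arbitrarily for illustration), a 1×1 convolution plays the role of W:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # block input: 64 channels, 32x32
# stand-in for F(x): changes channels 64 -> 128 and halves the spatial size
f = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
# the projection W: a 1x1 conv with the same stride, so shapes line up
w = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
h = f(x) + w(x)  # H(x) = F(x) + Wx
print(h.shape)   # torch.Size([1, 128, 16, 16])
```

Without the projection, `f(x) + x` would fail, since (1, 128, 16, 16) and (1, 64, 32, 32) cannot be added.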
The authors showed experimentally that this addition introduces essentially no extra parameters or computation, yet greatly speeds up training and improves the results, and it solves the degradation problem as the model gets deeper.
They built ResNets with 18, 34, 50, 101, and 152 layers and ran experiments on each; the error rate dropped substantially as depth increased, while the computational complexity stayed modest.
ResNet-18 is one of the shallower variants; its structure is as follows:
The code for this part:
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, inchannel, outchannel, stride=1):
        super(ResidualBlock, self).__init__()
        # the two 3x3 convs of the block
        self.left = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride, padding=1, bias=False),  # bias=False: no bias term
            nn.BatchNorm2d(outchannel),
            nn.ReLU(),
            nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        self.shortcut = nn.Sequential()  # shortcut connection
        # if input and output shapes differ, project x with a 1x1 conv
        if stride != 1 or inchannel != outchannel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(inchannel, outchannel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(outchannel)
            )
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.left(x)       # F(x)
        out += self.shortcut(x)  # H(x) = F(x) + x (or F(x) + Wx)
        out = self.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, ResidualBlock):
        super(ResNet, self).__init__()
        # stem: the initial 3x3 conv (the white box in the diagram)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        # the residual stages, matching the structure in the diagram
        self.layer1 = self.make_layer(ResidualBlock, 64, 64, 2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 64, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 128, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 256, 512, 2, stride=2)
        self.avg_pool2d = nn.AvgPool2d(4)
        self.fc = nn.Linear(512, 10)

    # num_blocks is how many blocks each stage repeats; ResNet-18 uses two per stage
    def make_layer(self, block, inchannel, outchannel, num_blocks, stride):
        # only the first block of a stage may downsample; the rest use stride 1
        # (note: the original version iterated range(1, num_blocks), which built
        # one block too few per stage)
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(inchannel, outchannel, s))
            inchannel = outchannel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avg_pool2d(out)       # 4x4 -> 1x1
        out = out.view(out.size(0), -1)  # flatten to (batch, 512)
        out = self.fc(out)
        return out

def ResNet_cifar():
    return ResNet(ResidualBlock)
```
The code that trains ResNet-18 is the same as before:
```python
import torch
import torch.nn as nn
from resnet import ResNet_cifar
import torch.utils.data as data
import torchvision.datasets as datasets
import torchvision.transforms as transforms

num_epoch = 10
BATCH_SIZE = 20

transform = transforms.Compose(
    [transforms.ToTensor()]
)

# load the CIFAR-10 dataset
train_dataset = datasets.CIFAR10(
    root='/home/cxm-irene/PyTorch/data/cifar10',
    train=True,
    transform=transform,
    download=False
)
train_loader = data.DataLoader(
    dataset=train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)
test_dataset = datasets.CIFAR10(
    root='/home/cxm-irene/PyTorch/data/cifar10',
    train=False,
    transform=transform,
    download=False
)
test_loader = data.DataLoader(
    dataset=test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)

# instantiate the network
net = ResNet_cifar()
print(net)

# optimizer and loss
# optimizer = torch.optim.RMSprop(net.parameters(), lr=0.005, alpha=0.9)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001, betas=(0.9, 0.99))
loss_function = nn.CrossEntropyLoss()

# training
for epoch in range(num_epoch):
    print('epoch = %d' % epoch)
    for i, (image, label) in enumerate(train_loader):
        x = net(image)
        loss = loss_function(x, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print('loss = %.5f' % loss.item())

# evaluation on the test set
total = 0
correct = 0
net.eval()
with torch.no_grad():
    for image, label in test_loader:
        x = net(image)
        _, prediction = torch.max(x, 1)
        total += label.size(0)
        correct += (prediction == label).sum()
print('There are ' + str(correct.item()) + ' correct pictures.')
print('Accuracy = %.2f' % (100.0 * correct.item() / total))
```
Because I was running on a CPU I could not train for very long; ten epochs with a batch_size of 20 took about eight hours, but the result is decent.
By the end the loss had mostly dropped below 0.1, and the final accuracy reached 84.5%.
Deeper ResNets should do even better, but I did not try them because of my hardware.
Here is the structure diagram of the 34-layer residual network:
I recommend a blog that walks through the ResNet code in some detail: https://blog.csdn.net/jiangpeng59/article/details/79609392