LeNet was one of the earliest neural networks applied to image processing. It was designed to solve the handwritten digit recognition problem, and the famous MNIST dataset appeared alongside the birth of LeNet. Its basic architecture is shown below.
The structure is relatively simple. The pooling layers can use either max pooling or average pooling; the original model uses Sigmoid as the activation function, but it can also be replaced with ReLU, tanh, etc. (a sketch of such a variant follows).
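As a minimal sketch of the substitutions mentioned above (this is an illustrative variant, not the original LeNet configuration), the same architecture can be written with ReLU activations and average pooling; the layer shapes are unchanged:

# Hypothetical LeNet variant: ReLU + AvgPool instead of Sigmoid + MaxPool
import torch
from torch import nn

lenet_relu = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10))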
Summary
Code Implementation
%matplotlib inline
import torch
from torch import nn
import torchvision
from torch.utils import data
from matplotlib import pyplot as plt
import numpy as np
trans = torchvision.transforms.ToTensor()
# the dataset is assumed to already be in ../data/; set download=True to fetch it otherwise
train_data = torchvision.datasets.FashionMNIST('../data/', train=True, download=False, transform=trans)
test_data = torchvision.datasets.FashionMNIST('../data/', train=False, download=False, transform=trans)
train_data.data.shape, test_data.data.shape
(torch.Size([60000, 28, 28]), torch.Size([10000, 28, 28]))
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28, 6 channels
            nn.Sigmoid(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),             # 14x14 -> 10x10, 16 channels
            nn.Sigmoid(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # 10x10 -> 5x5
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, 10))

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        return self.net(x)
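To see where the 16 * 5 * 5 input of the first fully connected layer comes from, a quick sketch (feeding a single dummy 28x28 image through the network defined above) can print the output shape after each layer:

# Trace the shape of a dummy 1x1x28x28 input through each layer of the network
model = LeNet()
x = torch.rand(1, 1, 28, 28)
for layer in model.net:
    x = layer(x)
    print(layer.__class__.__name__, 'output shape:', x.shape)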
def get_optimizer(model, lr):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    return optimizer
# Define the evaluation metric; for classification problems, accuracy is commonly used
def accuracy(y_hat, y):
    preds = torch.argmax(y_hat, -1)
    right = (preds == y).sum().item()
    return right / y.shape[0]
def init_weights(m):
    if type(m) == nn.Linear or type(m) == nn.Conv2d:
        nn.init.xavier_uniform_(m.weight)
def train(epoches, batch_size, lr):
    model = LeNet()
    model.apply(init_weights)
    loss = nn.CrossEntropyLoss()
    optimizer = get_optimizer(model, lr)
    train_loader, test_loader = get_dataloader(batch_size, train_data, test_data)
    loss_lis = []
    train_acc_lis = []
    test_acc_lis = []
    for epoch in range(epoches):
        acc = 0
        l_sum = 0
        model.train()
        for X, y in train_loader:
            y_hat = model(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            acc += accuracy(y_hat, y)
            l_sum += l.item()
        # average over the (approximate) number of batches per epoch
        acc = acc / (train_data.data.shape[0] / batch_size)
        l_sum = l_sum / (train_data.data.shape[0] / batch_size)
        # evaluate on the test set without tracking gradients
        model.eval()
        acc_eval = 0
        with torch.no_grad():
            for x, Y in test_loader:
                Y_hat = model(x)
                acc_eval += accuracy(Y_hat, Y)
        acc_eval /= (test_data.data.shape[0] / batch_size)
        loss_lis.append(l_sum)
        train_acc_lis.append(acc)
        test_acc_lis.append(acc_eval)
        print(f'epoch is {epoch + 1}, the loss is {l_sum} and the accuracy on train data is {acc}, on test data is {acc_eval}')
    # plot the training loss and the train/test accuracy curves
    plt.plot(np.arange(1, epoches + 1), loss_lis, color='blue', label='loss')
    plt.plot(np.arange(1, epoches + 1), train_acc_lis, color='grey', linestyle='--', label='train_acc')
    plt.plot(np.arange(1, epoches + 1), test_acc_lis, color='red', linestyle='--', label='test_acc')
    plt.grid()
    plt.legend(loc='upper right')
    plt.show()
train(10, 128, 0.9)
epoch is 1, the loss is 1.8979717952728272 and the accuracy on train data is 0.2590388888888889, on test data is 0.6222
epoch is 2, the loss is 0.7648378531773885 and the accuracy on train data is 0.699338888888889, on test data is 0.7426
epoch is 3, the loss is 0.5863379270553589 and the accuracy on train data is 0.7725944444444444, on test data is 0.7907
epoch is 4, the loss is 0.4983555362065633 and the accuracy on train data is 0.8092, on test data is 0.8301
epoch is 5, the loss is 0.4429962953249613 and the accuracy on train data is 0.8350277777777777, on test data is 0.8435
epoch is 6, the loss is 0.4026765632947286 and the accuracy on train data is 0.8518888888888889, on test data is 0.8468
epoch is 7, the loss is 0.37838911752700805 and the accuracy on train data is 0.8591833333333333, on test data is 0.8668
epoch is 8, the loss is 0.35928820660909017 and the accuracy on train data is 0.8668888888888889, on test data is 0.8727
epoch is 9, the loss is 0.34128332163492836 and the accuracy on train data is 0.8742888888888889, on test data is 0.8638
epoch is 10, the loss is 0.32974659884770713 and the accuracy on train data is 0.8775555555555556, on test data is 0.8824
AlexNet was born in 2012. Unlike approaches that extract image features with hand-designed descriptors, it holds that the features themselves should be learned: given reasonable model complexity, the features should be produced by multiple jointly learned neural network layers, each with its own learnable parameters. In computer vision, the lowest layers may detect edges, colors, and textures; higher layers build on these lower-level representations to represent larger structures, and can detect entire objects; the final hidden neurons can learn a comprehensive representation of the image that makes data from different categories easy to separate.
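As a rough sketch (not the exact original model, which was split across two GPUs), an AlexNet-style network can be written as a single nn.Sequential. Here the input is assumed to be a 1-channel 224x224 image (e.g. a resized Fashion-MNIST sample) with 10 output classes, rather than the 3-channel ImageNet input and 1000 classes of the original paper:

# AlexNet-style architecture sketch; assumes a 1x224x224 input and 10 classes
alexnet = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 5 * 5, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10))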
Summary
Although AlexNet demonstrated that deep neural networks are effective, its biggest problem is that the model is irregular and its structure is not very clear; it does not provide a general template to guide later researchers in designing new networks. If models are to become larger and deeper, a good design principle is needed to make the overall framework more regular.
How to make the model larger and deeper
The core idea of VGG is to stack a number of VGG blocks, each consisting of several $3\times3$ convolutional layers followed by one max pooling layer, to obtain the final network.
A VGG block consists of two parts: several $3\times3$ convolutional layers with padding 1 (it has two hyperparameters: the number of layers n and the number of channels m) and one $2\times2$ max pooling layer with stride 2, as in the sketch below.
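A minimal sketch of such a block, using the nn module imported earlier (the helper name vgg_block and its signature are an assumption, not fixed by the text):

# Hypothetical helper: one VGG block = num_convs 3x3 conv layers + a 2x2 max pool
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

Stacking several such blocks, halving the spatial size and growing the channel count from block to block, and appending fully connected layers yields the full VGG network.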
VGG Architecture